From e6fb15b659b825f6d229bd30ec2604d51a759135 Mon Sep 17 00:00:00 2001 From: Irv Lustig Date: Sat, 28 Jun 2025 23:36:17 -0400 Subject: [PATCH 1/5] adding pandas.api.typing.aliases and docs --- .../development/contributing_codebase.rst | 3 +- doc/source/reference/aliases.rst | 91 +++++++++++++++++++ doc/source/reference/index.rst | 1 + doc/source/whatsnew/v3.0.0.rst | 1 + pandas/api/typing/__init__.py | 2 + pandas/api/typing/aliases.py | 0 6 files changed, 97 insertions(+), 1 deletion(-) create mode 100644 doc/source/reference/aliases.rst create mode 100644 pandas/api/typing/aliases.py diff --git a/doc/source/development/contributing_codebase.rst b/doc/source/development/contributing_codebase.rst index 73bc756de9302..a2417616cf80a 100644 --- a/doc/source/development/contributing_codebase.rst +++ b/doc/source/development/contributing_codebase.rst @@ -214,7 +214,8 @@ With custom types and inference this is not always possible so exceptions are ma pandas-specific types ~~~~~~~~~~~~~~~~~~~~~ -Commonly used types specific to pandas will appear in `pandas._typing `_ and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas. +Commonly used types specific to pandas will appear in `pandas._typing `__ and you should use these where applicable. This module is private and is meant for pandas development. +Types that are meant for user consumption should be exposed in `pandas.api.typing.aliases `__ and ideally added to the `pandas-stubs `__ project. For example, quite a few functions in pandas accept a ``dtype`` argument. This can be expressed as a string like ``"object"``, a ``numpy.dtype`` like ``np.int64`` or even a pandas ``ExtensionDtype`` like ``pd.CategoricalDtype``. 
Rather than burden the user with having to constantly annotate all of those options, this can simply be imported and reused from the pandas._typing module diff --git a/doc/source/reference/aliases.rst b/doc/source/reference/aliases.rst new file mode 100644 index 0000000000000..1bb081db89d78 --- /dev/null +++ b/doc/source/reference/aliases.rst @@ -0,0 +1,91 @@ +{{ header }} + +.. _api.typing.aliases: + +====================================== +pandas typing aliases +====================================== + +************** +Typing aliases +************** + +.. currentmodule:: pandas.api.atyping.aliases + +The typing declarations in ``pandas/_typing.py`` are considered private, and used +by pandasdevelopers for type checking of the pandascode base. For users, it is +highly recommended to use the ``pandas-stubs`` package that represents the officially +supported type declarations for users of pandas. +Note that the definitions and use cases of these aliases are subject to change. +They are documented here for users who wish to use these declarations in their +own python code that calls pandasor expects certain results. + +Each of these aliases listed in the table below can be found by importing them from :py:mod:`pandas.api.typing.aliases`. 
+ +==================================== ================================================================ +Alias Meaning +==================================== ================================================================ +:py:type:`AggFuncType` Type of functions that can be passed to :meth:`agg` methods +:py:type:`AlignJoin` Argument type for ``join`` in :meth:`DataFrame.join` +:py:type:`AnyAll` Argument type for ``how`` in :meth:`dropna` +:py:type:`AnyArrayLike` Used to represent :class:`ExtensionArray`, ``numpy`` arrays, :class:`Index` and :class:`Series` +:py:type:`ArrayLike` Used to represent :class:`ExtensionArray`, ``numpy`` arrays +:py:type:`AstypeArg` Argument type in :meth:`astype` +:py:type:`Axes` :py:type:`AnyArrayLike` plus sequences (not strings) and ``range`` +:py:type:`Axis` Argument type for ``axis`` in many methods +:py:type:`CSVEngine` Argument type for ``engine`` in :meth:`pandas.read_csv` +:py:type:`ColspaceArgType` Argument type for ``col_space`` in :meth:`DataFrame.to_html` +:py:type:`CompressionOptions` Argument type for ``compression`` in many I/O output methods +:py:type:`CorrelationMethod` Argument type for ``method`` in :meth:`corr` +:py:type:`DropKeep` Argument type for ``keep`` in :meth:`drop_duplicates` +:py:type:`Dtype` Types as objects that can be used to specify dtypes +:py:type:`DtypeArg` Argument type for ``dtype`` in various methods +:py:type:`DtypeBackend` Argument type for ``dtype_backend`` in various methods +:py:type:`DtypeObj` Numpy dtypes and Extension dtypes +:py:type:`ExcelWriterIfSheetExists` Argument type for ``if_sheet_exists`` in :class:`ExcelWriter` +:py:type:`ExcelWriterMergeCells` Argument type for ``merge_cells`` in :meth:`to_excel` +:py:type:`FilePath` Type of paths for files for I/O methods +:py:type:`FillnaOptions` Argument type for ``method`` in various methods where NA values are filled +:py:type:`FloatFormatType` Argument type for ``float_format`` in :meth:`to_string` +:py:type:`FormattersType` 
Argument type for ``formatters`` in :meth:`to_string` +:py:type:`FromDictOrient` Argument type for ``orient`` in :meth:`DataFrame.from_dict` +:py:type:`HTMLFlavors` Argument type for ``flavor`` in :meth:`pandas.read_html` +:py:type:`IgnoreRaise` Argument type for ``errors`` in multiple methods +:py:type:`IndexLabel` Argument type for ``level`` in multiple methods +:py:type:`InterpolateOptions` Argument type for ``method`` in :meth:`interpolate` +:py:type:`JSONEngine` Argument type for ``engine`` in :meth:`read_json` +:py:type:`JSONSerializable` Argument type for the return type of a callable for argument ``default_handler`` in :meth:`to_json` +:py:type:`JoinHow` Argument type for ``how`` in :meth:`pandas.merge_ordered` and for ``join`` in :meth:`Series.align` +:py:type:`JoinValidate` Argument type for ``validate`` in :meth:`DataFrame.join` +:py:type:`MergeHow` Argument type for ``how`` in :meth:`merge` +:py:type:`MergeValidate` Argument type for ``validate`` in :meth:`merge` +:py:type:`NaPosition` Argument type for ``na_position`` in :meth:`sort_index` and :meth:`sort_values` +:py:type:`NsmallestNlargestKeep` Argument type for ``keep`` in :meth:`nlargest` and :meth:`nsmallest` +:py:type:`OpenFileErrors` Argument type for ``errors`` in :meth:`to_hdf` and :meth:`to_csv` +:py:type:`Ordered` Return type for :py:attr:`ordered` in :class:`CategoricalDtype` and :class:`Categorical` +:py:type:`QuantileInterpolation` Argument type for ``interpolation`` in :meth:`quantile` +:py:type:`ReadBuffer` Additional argument type corresponding to buffers for various file reading methods +:py:type:`ReadCsvBuffer` Additional argument type corresponding to buffers for :meth:`pandas.read_csv` +:py:type:`ReadPickleBuffer` Additional argument type corresponding to buffers for :meth:`pandas.read_pickle` +:py:type:`ReindexMethod` Argument type for ``method`` in :meth:`reindex` +:py:type:`Scalar` Basic type that can be stored in :class:`Series` +:py:type:`SequenceNotStr` Used for 
arguments that require sequences, but not plain strings +:py:type:`SortKind` Argument type for ``kind`` in :meth:`sort_index` and :meth:`sort_values` +:py:type:`StorageOptions` Argument type for ``storage_options`` in various file output methods +:py:type:`Suffixes` Argument type for ``suffixes`` in :meth:`merge`, :meth:`compare` and :meth:`merge_ordered` +:py:type:`TakeIndexer` Argument type for ``indexer`` and ``indices`` in :meth:`take` +:py:type:`TimeAmbiguous` Argument type for ``ambiguous`` in time operations +:py:type:`TimeGrouperOrigin` Argument type for ``origin`` in :meth:`resample` and :class:`TimeGrouper` +:py:type:`TimeNonexistent` Argument type for ``nonexistent`` in time operations +:py:type:`TimeUnit` Time unit argument and return type for :py:attr:`unit`, arguments ``unit`` and ``date_unit`` +:py:type:`TimedeltaConvertibleTypes` Argument type for ``offset`` in :meth:`resample`, ``halflife`` in :meth:`ewm` and ``start`` and ``end`` in :meth:`pandas.timedelta_range` +:py:type:`TimestampConvertibleTypes` Argument type for ``origin`` in :meth:`resample` and :meth:`pandas.to_datetime` +:py:type:`ToStataByteorder` Argument type for ``byteorder`` in :meth:`DataFrame.to_stata` +:py:type:`ToTimestampHow` Argument type for ``how`` in :meth:`to_timestamp` and ``convention`` in :meth:`resample` +:py:type:`UpdateJoin` Argument type for ``join`` in :meth:`DataFrame.update` +:py:type:`UsecolsArgType` Argument type for ``usecols`` in :meth:`pandas.read_clipboard`, :meth:`pandas.read_csv` and :meth:`pandas.read_excel` +:py:type:`WindowingRankType` Argument type for ``method`` in :meth:`rank` in rolling and expanding window operations +:py:type:`WriteBuffer` Additional argument type corresponding to buffers for various file output methods +:py:type:`WriteExcelBuffer` Additional argument type corresponding to buffers for :meth:`to_excel` +:py:type:`XMLParsers` Argument type for ``parser`` in :meth:`DataFrame.to_xml` and :meth:`pandas.read_xml` 
+==================================== ================================================================ diff --git a/doc/source/reference/index.rst b/doc/source/reference/index.rst index 639bac4d40b70..ec9e3c1bad476 100644 --- a/doc/source/reference/index.rst +++ b/doc/source/reference/index.rst @@ -55,6 +55,7 @@ to be stable. extensions testing missing_value + aliases .. This is to prevent warnings in the doc build. We don't want to encourage .. these methods. diff --git a/doc/source/whatsnew/v3.0.0.rst b/doc/source/whatsnew/v3.0.0.rst index 8d3ac0e396430..996a649d4f772 100644 --- a/doc/source/whatsnew/v3.0.0.rst +++ b/doc/source/whatsnew/v3.0.0.rst @@ -83,6 +83,7 @@ Other enhancements - Add ``"delete_rows"`` option to ``if_exists`` argument in :meth:`DataFrame.to_sql` deleting all records of the table before inserting data (:issue:`37210`). - Added half-year offset classes :class:`HalfYearBegin`, :class:`HalfYearEnd`, :class:`BHalfYearBegin` and :class:`BHalfYearEnd` (:issue:`60928`) - Added support to read and write from and to Apache Iceberg tables with the new :func:`read_iceberg` and :meth:`DataFrame.to_iceberg` functions (:issue:`61383`) +- Certain aliases from :py:mode:`pandas._typing` are now exposed in :py:mod:`pandas.api.typing.aliases` (:issue:`55231`) - Errors occurring during SQL I/O will now throw a generic :class:`.DatabaseError` instead of the raw Exception type from the underlying driver manager library (:issue:`60748`) - Implemented :meth:`Series.str.isascii` and :meth:`Series.str.isascii` (:issue:`59091`) - Improved deprecation message for offset aliases (:issue:`60820`) diff --git a/pandas/api/typing/__init__.py b/pandas/api/typing/__init__.py index c1178c72f3edc..f329ee1fe9930 100644 --- a/pandas/api/typing/__init__.py +++ b/pandas/api/typing/__init__.py @@ -6,6 +6,7 @@ from pandas._libs.lib import NoDefault from pandas._libs.missing import NAType +from pandas.api.typing import aliases from pandas.core.groupby import ( DataFrameGroupBy, 
SeriesGroupBy, @@ -56,4 +57,5 @@ "TimeGrouper", "TimedeltaIndexResamplerGroupby", "Window", + "aliases", ] diff --git a/pandas/api/typing/aliases.py b/pandas/api/typing/aliases.py new file mode 100644 index 0000000000000..e69de29bb2d1d From 1cdef767a4698c8d94b5960cd2653595b07cc6f0 Mon Sep 17 00:00:00 2001 From: Irv Lustig Date: Sun, 29 Jun 2025 10:05:57 -0400 Subject: [PATCH 2/5] create aliases file and update tests --- pandas/api/typing/__init__.py | 2 - pandas/api/typing/aliases.py | 131 ++++++++++++++++++++++++++++++++++ pandas/tests/api/test_api.py | 71 +++++++++++++++++- 3 files changed, 201 insertions(+), 3 deletions(-) diff --git a/pandas/api/typing/__init__.py b/pandas/api/typing/__init__.py index f329ee1fe9930..c1178c72f3edc 100644 --- a/pandas/api/typing/__init__.py +++ b/pandas/api/typing/__init__.py @@ -6,7 +6,6 @@ from pandas._libs.lib import NoDefault from pandas._libs.missing import NAType -from pandas.api.typing import aliases from pandas.core.groupby import ( DataFrameGroupBy, SeriesGroupBy, @@ -57,5 +56,4 @@ "TimeGrouper", "TimedeltaIndexResamplerGroupby", "Window", - "aliases", ] diff --git a/pandas/api/typing/aliases.py b/pandas/api/typing/aliases.py index e69de29bb2d1d..77e3cccb51251 100644 --- a/pandas/api/typing/aliases.py +++ b/pandas/api/typing/aliases.py @@ -0,0 +1,131 @@ +from pandas._typing import ( + AggFuncType, + AlignJoin, + AnyAll, + AnyArrayLike, + ArrayLike, + AstypeArg, + Axes, + Axis, + ColspaceArgType, + CompressionOptions, + CorrelationMethod, + CSVEngine, + DropKeep, + Dtype, + DtypeArg, + DtypeBackend, + DtypeObj, + ExcelWriterIfSheetExists, + ExcelWriterMergeCells, + FilePath, + FillnaOptions, + FloatFormatType, + FormattersType, + FromDictOrient, + HTMLFlavors, + IgnoreRaise, + IndexLabel, + InterpolateOptions, + JoinHow, + JoinValidate, + JSONEngine, + JSONSerializable, + MergeHow, + MergeValidate, + NaPosition, + NsmallestNlargestKeep, + OpenFileErrors, + Ordered, + QuantileInterpolation, + ReadBuffer, + ReadCsvBuffer, + 
ReadPickleBuffer, + ReindexMethod, + Scalar, + SequenceNotStr, + SortKind, + StorageOptions, + Suffixes, + TakeIndexer, + TimeAmbiguous, + TimedeltaConvertibleTypes, + TimeGrouperOrigin, + TimeNonexistent, + TimestampConvertibleTypes, + TimeUnit, + ToStataByteorder, + ToTimestampHow, + UpdateJoin, + UsecolsArgType, + WindowingRankType, + WriteBuffer, + WriteExcelBuffer, + XMLParsers, +) + +__all__ = [ + "AggFuncType", + "AlignJoin", + "AnyAll", + "AnyArrayLike", + "ArrayLike", + "AstypeArg", + "Axes", + "Axis", + "CSVEngine", + "ColspaceArgType", + "CompressionOptions", + "CorrelationMethod", + "DropKeep", + "Dtype", + "DtypeArg", + "DtypeBackend", + "DtypeObj", + "ExcelWriterIfSheetExists", + "ExcelWriterMergeCells", + "FilePath", + "FillnaOptions", + "FloatFormatType", + "FormattersType", + "FromDictOrient", + "HTMLFlavors", + "IgnoreRaise", + "IndexLabel", + "InterpolateOptions", + "JSONEngine", + "JSONSerializable", + "JoinHow", + "JoinValidate", + "MergeHow", + "MergeValidate", + "NaPosition", + "NsmallestNlargestKeep", + "OpenFileErrors", + "Ordered", + "QuantileInterpolation", + "ReadBuffer", + "ReadCsvBuffer", + "ReadPickleBuffer", + "ReindexMethod", + "Scalar", + "SequenceNotStr", + "SortKind", + "StorageOptions", + "Suffixes", + "TakeIndexer", + "TimeAmbiguous", + "TimeGrouperOrigin", + "TimeNonexistent", + "TimeUnit", + "TimedeltaConvertibleTypes", + "TimestampConvertibleTypes", + "ToStataByteorder", + "ToTimestampHow", + "UpdateJoin", + "UsecolsArgType", + "WindowingRankType", + "WriteBuffer", + "WriteExcelBuffer", + "XMLParsers", +] diff --git a/pandas/tests/api/test_api.py b/pandas/tests/api/test_api.py index 871e977cbe2f8..453b4b4fffffd 100644 --- a/pandas/tests/api/test_api.py +++ b/pandas/tests/api/test_api.py @@ -13,6 +13,7 @@ types as api_types, typing as api_typing, ) +from pandas.api.typing import aliases as api_aliases class Base: @@ -251,7 +252,6 @@ class TestApi(Base): "indexers", "interchange", "typing", - "internals", ] allowed_typing = [ 
"DataFrameGroupBy", @@ -275,6 +275,7 @@ class TestApi(Base): "TimedeltaIndexResamplerGroupby", "TimeGrouper", "Window", + "aliases", ] allowed_api_types = [ "is_any_real_numeric_dtype", @@ -342,6 +343,71 @@ class TestApi(Base): "ExtensionScalarOpsMixin", ] allowed_api_executors = ["BaseExecutionEngine"] + allowed_api_aliases = [ + "AggFuncType", + "AlignJoin", + "AnyAll", + "AnyArrayLike", + "ArrayLike", + "AstypeArg", + "Axes", + "Axis", + "CSVEngine", + "ColspaceArgType", + "CompressionOptions", + "CorrelationMethod", + "DropKeep", + "Dtype", + "DtypeArg", + "DtypeBackend", + "DtypeObj", + "ExcelWriterIfSheetExists", + "ExcelWriterMergeCells", + "FilePath", + "FillnaOptions", + "FloatFormatType", + "FormattersType", + "FromDictOrient", + "HTMLFlavors", + "IgnoreRaise", + "IndexLabel", + "InterpolateOptions", + "JSONEngine", + "JSONSerializable", + "JoinHow", + "JoinValidate", + "MergeHow", + "MergeValidate", + "NaPosition", + "NsmallestNlargestKeep", + "OpenFileErrors", + "Ordered", + "QuantileInterpolation", + "ReadBuffer", + "ReadCsvBuffer", + "ReadPickleBuffer", + "ReindexMethod", + "Scalar", + "SequenceNotStr", + "SortKind", + "StorageOptions", + "Suffixes", + "TakeIndexer", + "TimeAmbiguous", + "TimeGrouperOrigin", + "TimeNonexistent", + "TimeUnit", + "TimedeltaConvertibleTypes", + "TimestampConvertibleTypes", + "ToStataByteorder", + "ToTimestampHow", + "UpdateJoin", + "UsecolsArgType", + "WindowingRankType", + "WriteBuffer", + "WriteExcelBuffer", + "XMLParsers", + ] def test_api(self): self.check(api, self.allowed_api_dirs) @@ -364,6 +430,9 @@ def test_api_extensions(self): def test_api_executors(self): self.check(api_executors, self.allowed_api_executors) + def test_api_typing_aliases(self): + self.check(api_aliases, self.allowed_api_aliases) + class TestErrors(Base): def test_errors(self): From c45b291c7d3af1d8fd3749cf29eb1a0746b498d6 Mon Sep 17 00:00:00 2001 From: Irv Lustig Date: Sun, 29 Jun 2025 12:22:11 -0400 Subject: [PATCH 3/5] put internals back in 
test even though it fails on --- pandas/tests/api/test_api.py | 1 + 1 file changed, 1 insertion(+) diff --git a/pandas/tests/api/test_api.py b/pandas/tests/api/test_api.py index 453b4b4fffffd..cb4559ad248a3 100644 --- a/pandas/tests/api/test_api.py +++ b/pandas/tests/api/test_api.py @@ -252,6 +252,7 @@ class TestApi(Base): "indexers", "interchange", "typing", + "internals", ] allowed_typing = [ "DataFrameGroupBy", From cfafd5550d00ae6d05ec8127082e4324dd2fdb49 Mon Sep 17 00:00:00 2001 From: Irv Lustig Date: Sun, 29 Jun 2025 12:57:32 -0400 Subject: [PATCH 4/5] fix spelling issue in whatsnew --- doc/source/whatsnew/v3.0.0.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/whatsnew/v3.0.0.rst b/doc/source/whatsnew/v3.0.0.rst index 996a649d4f772..5affd808bfb30 100644 --- a/doc/source/whatsnew/v3.0.0.rst +++ b/doc/source/whatsnew/v3.0.0.rst @@ -83,7 +83,7 @@ Other enhancements - Add ``"delete_rows"`` option to ``if_exists`` argument in :meth:`DataFrame.to_sql` deleting all records of the table before inserting data (:issue:`37210`). 
- Added half-year offset classes :class:`HalfYearBegin`, :class:`HalfYearEnd`, :class:`BHalfYearBegin` and :class:`BHalfYearEnd` (:issue:`60928`) - Added support to read and write from and to Apache Iceberg tables with the new :func:`read_iceberg` and :meth:`DataFrame.to_iceberg` functions (:issue:`61383`) -- Certain aliases from :py:mode:`pandas._typing` are now exposed in :py:mod:`pandas.api.typing.aliases` (:issue:`55231`) +- Certain aliases from :py:mod:`pandas._typing` are now exposed in :py:mod:`pandas.api.typing.aliases` (:issue:`55231`) - Errors occurring during SQL I/O will now throw a generic :class:`.DatabaseError` instead of the raw Exception type from the underlying driver manager library (:issue:`60748`) - Implemented :meth:`Series.str.isascii` and :meth:`Series.str.isascii` (:issue:`59091`) - Improved deprecation message for offset aliases (:issue:`60820`) From f4804d39f2a50cdb392bb577b9ece14efd0dbfe6 Mon Sep 17 00:00:00 2001 From: Irv Lustig Date: Tue, 1 Jul 2025 11:57:59 -0400 Subject: [PATCH 5/5] implemented doc fixes --- doc/source/reference/aliases.rst | 4 ++-- doc/source/whatsnew/v3.0.0.rst | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/source/reference/aliases.rst b/doc/source/reference/aliases.rst index 1bb081db89d78..b596c97bc741d 100644 --- a/doc/source/reference/aliases.rst +++ b/doc/source/reference/aliases.rst @@ -13,12 +13,12 @@ Typing aliases .. currentmodule:: pandas.api.atyping.aliases The typing declarations in ``pandas/_typing.py`` are considered private, and used -by pandasdevelopers for type checking of the pandascode base. For users, it is +by pandas developers for type checking of the pandas code base. For users, it is highly recommended to use the ``pandas-stubs`` package that represents the officially supported type declarations for users of pandas. Note that the definitions and use cases of these aliases are subject to change. 
They are documented here for users who wish to use these declarations in their -own python code that calls pandasor expects certain results. +own python code that calls pandas or expects certain results. Each of these aliases listed in the table below can be found by importing them from :py:mod:`pandas.api.typing.aliases`. diff --git a/doc/source/whatsnew/v3.0.0.rst b/doc/source/whatsnew/v3.0.0.rst index 65ee4d20a32b3..6925e13b4a9b6 100644 --- a/doc/source/whatsnew/v3.0.0.rst +++ b/doc/source/whatsnew/v3.0.0.rst @@ -86,10 +86,10 @@ Other enhancements - Add ``"delete_rows"`` option to ``if_exists`` argument in :meth:`DataFrame.to_sql` deleting all records of the table before inserting data (:issue:`37210`). - Added half-year offset classes :class:`HalfYearBegin`, :class:`HalfYearEnd`, :class:`BHalfYearBegin` and :class:`BHalfYearEnd` (:issue:`60928`) - Added support to read and write from and to Apache Iceberg tables with the new :func:`read_iceberg` and :meth:`DataFrame.to_iceberg` functions (:issue:`61383`) -- Certain aliases from :py:mod:`pandas._typing` are now exposed in :py:mod:`pandas.api.typing.aliases` (:issue:`55231`) - Errors occurring during SQL I/O will now throw a generic :class:`.DatabaseError` instead of the raw Exception type from the underlying driver manager library (:issue:`60748`) - Implemented :meth:`Series.str.isascii` and :meth:`Series.str.isascii` (:issue:`59091`) - Improved deprecation message for offset aliases (:issue:`60820`) +- Many type aliases are now exposed in the new submodule :py:mod:`pandas.api.typing.aliases` (:issue:`55231`) - Multiplying two :class:`DateOffset` objects will now raise a ``TypeError`` instead of a ``RecursionError`` (:issue:`59442`) - Restore support for reading Stata 104-format and enable reading 103-format dta files (:issue:`58554`) - Support passing a :class:`Iterable[Hashable]` input to :meth:`DataFrame.drop_duplicates` (:issue:`59237`)
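
The ``test_api_typing_aliases`` test added in patch 2 pins the public surface of ``pandas.api.typing.aliases`` to an explicit list of names. The sketch below illustrates that kind of check without importing pandas. It assumes that ``Base.check`` (whose body is not shown in this diff) compares the sorted public names of a namespace against the expected list; ``public_names``, ``check`` and ``fake_aliases`` are hypothetical names used only for illustration, and the two placeholder aliases are not the real pandas definitions.

```python
import types
from typing import Union


def public_names(namespace):
    """Return the sorted names in a namespace that do not start with a dunder."""
    return sorted(name for name in dir(namespace) if not name.startswith("__"))


def check(namespace, expected):
    """Raise AssertionError unless the namespace exposes exactly `expected`."""
    result = public_names(namespace)
    assert result == sorted(expected), f"unexpected API surface: {result}"


# Hypothetical stand-in for a pure re-export module such as
# pandas.api.typing.aliases; these placeholder aliases are assumptions,
# NOT the definitions used by pandas.
fake_aliases = types.ModuleType("fake_aliases")
fake_aliases.Axis = Union[int, str]
fake_aliases.FilePath = str
fake_aliases.__all__ = ["Axis", "FilePath"]

# The comparison is order-insensitive because both sides are sorted.
check(fake_aliases, ["FilePath", "Axis"])
```

Under this assumption, a name re-exported from the module but missing from the allowed list (or the reverse) would make the check fail, which would explain why the series keeps ``__all__`` in ``aliases.py`` and ``allowed_api_aliases`` in ``test_api.py`` in step.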