Commit 0c2c529

Merge branch 'main' into fix/#61636

2 parents 2b0d4ea + 7817cb2


57 files changed (+531, -529 lines)

.github/workflows/unit-tests.yml

Lines changed: 0 additions & 3 deletions
@@ -140,9 +140,6 @@ jobs:
 
       moto:
         image: motoserver/moto:5.0.27
-        env:
-          AWS_ACCESS_KEY_ID: foobar_key
-          AWS_SECRET_ACCESS_KEY: foobar_secret
         ports:
           - 5000:5000

AUTHORS.md

Lines changed: 6 additions & 6 deletions
@@ -7,12 +7,12 @@ About the Copyright Holders
   led by Wes McKinney. AQR released the source under this license in 2009.
 * Copyright (c) 2011-2012, Lambda Foundry, Inc.
 
-  Wes is now an employee of Lambda Foundry, and remains the pandas project
+  Wes became an employee of Lambda Foundry, and remained the pandas project
   lead.
 * Copyright (c) 2011-2012, PyData Development Team
 
   The PyData Development Team is the collection of developers of the PyData
-  project. This includes all of the PyData sub-projects, including pandas. The
+  project. This includes all of the PyData sub-projects, such as pandas. The
   core team that coordinates development on GitHub can be found here:
   https://github.com/pydata.
 
@@ -23,11 +23,11 @@ Our Copyright Policy
 
 PyData uses a shared copyright model. Each contributor maintains copyright
 over their contributions to PyData. However, it is important to note that
-these contributions are typically only changes to the repositories. Thus,
+these contributions are typically limited to changes to the repositories. Thus,
 the PyData source code, in its entirety, is not the copyright of any single
 person or institution. Instead, it is the collective copyright of the
 entire PyData Development Team. If individual contributors want to maintain
-a record of what changes/contributions they have specific copyright on,
+a record of the specific changes or contributions they hold copyright to,
 they should indicate their copyright in the commit message of the change
 when they commit the change to one of the PyData repositories.
 
@@ -50,7 +50,7 @@ Other licenses can be found in the LICENSES directory.
 License
 =======
 
-pandas is distributed under a 3-clause ("Simplified" or "New") BSD
+pandas is distributed under the 3-clause ("Simplified" or "New") BSD
 license. Parts of NumPy, SciPy, numpydoc, bottleneck, which all have
-BSD-compatible licenses, are included. Their licenses follow the pandas
+BSD-compatible licenses, are included. Their licenses are compatible with the pandas
 license.

ci/deps/actions-311-downstream_compat.yaml

Lines changed: 2 additions & 1 deletion
@@ -50,7 +50,8 @@ dependencies:
   - pytz>=2023.4
   - pyxlsb>=1.0.10
   - s3fs>=2023.12.2
-  - scipy>=1.12.0
+  # TEMP upper pin for scipy (https://github.com/statsmodels/statsmodels/issues/9584)
+  - scipy>=1.12.0,<1.16
   - sqlalchemy>=2.0.0
   - tabulate>=0.9.0
   - xarray>=2024.1.1

doc/source/reference/indexing.rst

Lines changed: 1 addition & 0 deletions
@@ -98,6 +98,7 @@ Conversion
    :toctree: api/
 
    Index.astype
+   Index.infer_objects
    Index.item
    Index.map
    Index.ravel

doc/source/whatsnew/v2.3.0.rst

Lines changed: 0 additions & 35 deletions
@@ -31,39 +31,6 @@ Other enhancements
 - The :meth:`~Series.cumsum`, :meth:`~Series.cummin`, and :meth:`~Series.cummax` reductions are now implemented for :class:`StringDtype` columns (:issue:`60633`)
 - The :meth:`~Series.sum` reduction is now implemented for :class:`StringDtype` columns (:issue:`59853`)
 
-.. ---------------------------------------------------------------------------
-.. _whatsnew_230.notable_bug_fixes:
-
-Notable bug fixes
-~~~~~~~~~~~~~~~~~
-
-These are bug fixes that might have notable behavior changes.
-
-.. _whatsnew_230.notable_bug_fixes.string_comparisons:
-
-Comparisons between different string dtypes
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-In previous versions, comparing :class:`Series` of different string dtypes (e.g. ``pd.StringDtype("pyarrow", na_value=pd.NA)`` against ``pd.StringDtype("python", na_value=np.nan)``) would result in inconsistent resulting dtype or incorrectly raise. pandas will now use the hierarchy
-
-    object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)
-
-in determining the result dtype when there are different string dtypes compared. Some examples:
-
-- When ``pd.StringDtype("pyarrow", na_value=pd.NA)`` is compared against any other string dtype, the result will always be ``boolean[pyarrow]``.
-- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("pyarrow", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
-- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("python", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
-
-.. _whatsnew_230.api_changes:
-
-API changes
-~~~~~~~~~~~
-
-- When enabling the ``future.infer_string`` option, :class:`Index` set operations (like
-  union or intersection) will now ignore the dtype of an empty :class:`RangeIndex` or
-  empty :class:`Index` with ``object`` dtype when determining the dtype of the resulting
-  Index (:issue:`60797`)
-
 .. ---------------------------------------------------------------------------
 .. _whatsnew_230.deprecations:
 
@@ -85,8 +52,6 @@ Numeric
 
 Strings
 ^^^^^^^
-- Bug in :meth:`.DataFrameGroupBy.min`, :meth:`.DataFrameGroupBy.max`, :meth:`.Resampler.min`, :meth:`.Resampler.max` where all NA values of string dtype would return float instead of string dtype (:issue:`60810`)
-- Bug in :meth:`DataFrame.sum` with ``axis=1``, :meth:`.DataFrameGroupBy.sum` or :meth:`.SeriesGroupBy.sum` with ``skipna=True``, and :meth:`.Resampler.sum` with all NA values of :class:`StringDtype` resulted in ``0`` instead of the empty string ``""`` (:issue:`60229`)
 - Bug in :meth:`Series.__pos__` and :meth:`DataFrame.__pos__` where an ``Exception`` was not raised for :class:`StringDtype` with ``storage="pyarrow"`` (:issue:`60710`)
 - Bug in :meth:`Series.rank` for :class:`StringDtype` with ``storage="pyarrow"`` that incorrectly returned integer results with ``method="average"`` and raised an error if it would truncate results (:issue:`59768`)
 - Bug in :meth:`Series.replace` with :class:`StringDtype` when replacing with a non-string value was not upcasting to ``object`` dtype (:issue:`60282`)

doc/source/whatsnew/v2.3.1.rst

Lines changed: 51 additions & 5 deletions
@@ -9,11 +9,57 @@ including other versions of pandas.
 {{ header }}
 
 .. ---------------------------------------------------------------------------
-.. _whatsnew_231.enhancements:
+.. _whatsnew_231.string_fixes:
+
+Improvements and fixes for the StringDtype
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. _whatsnew_231.string_fixes.string_comparisons:
+
+Comparisons between different string dtypes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In previous versions, comparing :class:`Series` of different string dtypes (e.g. ``pd.StringDtype("pyarrow", na_value=pd.NA)`` against ``pd.StringDtype("python", na_value=np.nan)``) would result in inconsistent resulting dtype or incorrectly raise. pandas will now use the hierarchy
+
+    object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)
+
+in determining the result dtype when there are different string dtypes compared. Some examples:
+
+- When ``pd.StringDtype("pyarrow", na_value=pd.NA)`` is compared against any other string dtype, the result will always be ``boolean[pyarrow]``.
+- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("pyarrow", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
+- When ``pd.StringDtype("python", na_value=pd.NA)`` is compared against ``pd.StringDtype("python", na_value=np.nan)``, the result will be ``boolean``, the NumPy-backed nullable extension array.
+
+.. _whatsnew_231.string_fixes.ignore_empty:
+
+Index set operations ignore empty RangeIndex and object dtype Index
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When enabling the ``future.infer_string`` option, :class:`Index` set operations (like
+union or intersection) will now ignore the dtype of an empty :class:`RangeIndex` or
+empty :class:`Index` with ``object`` dtype when determining the dtype of the resulting
+Index (:issue:`60797`).
+
+This ensures that combining such empty Index with strings will infer the string dtype
+correctly, rather than defaulting to ``object`` dtype. For example:
+
+.. code-block:: python
+
+    >>> pd.options.mode.infer_string = True
+    >>> df = pd.DataFrame()
+    >>> df.columns.dtype
+    dtype('int64')  # default RangeIndex for empty columns
+    >>> df["a"] = [1, 2, 3]
+    >>> df.columns.dtype
+    <StringDtype(na_value=nan)>  # new columns use string dtype instead of object dtype
+
+.. _whatsnew_231.string_fixes.bugs:
+
+Bug fixes
+^^^^^^^^^
+- Bug in :meth:`.DataFrameGroupBy.min`, :meth:`.DataFrameGroupBy.max`, :meth:`.Resampler.min`, :meth:`.Resampler.max` where all NA values of string dtype would return float instead of string dtype (:issue:`60810`)
+- Bug in :meth:`DataFrame.sum` with ``axis=1``, :meth:`.DataFrameGroupBy.sum` or :meth:`.SeriesGroupBy.sum` with ``skipna=True``, and :meth:`.Resampler.sum` with all NA values of :class:`StringDtype` resulted in ``0`` instead of the empty string ``""`` (:issue:`60229`)
+- Fixed bug in :meth:`DataFrame.explode` and :meth:`Series.explode` where methods would fail with ``dtype="str"`` (:issue:`61623`)
 
-Enhancements
-~~~~~~~~~~~~
--
 
 .. _whatsnew_231.regressions:
 
@@ -26,7 +72,7 @@ Fixed regressions
 
 Bug fixes
 ~~~~~~~~~
-- Fixed bug in :meth:`DataFrame.explode` and :meth:`Series.explode` where methods would fail with ``dtype="str"`` (:issue:`61623`)
+-
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_231.other:
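The comparison-dtype hierarchy described in the whatsnew note above can be sketched as a simple precedence lookup. This is a hypothetical illustration of the stated rule, not pandas' internal implementation; `result_backing` and the tuple encoding of dtypes are invented for the example:

```python
# Hypothetical sketch of the precedence rule from the whatsnew note:
#   object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)
# Dtypes are encoded as "object" or (storage, na_value) tuples.
HIERARCHY = [
    "object",
    ("python", "NaN"),
    ("pyarrow", "NaN"),
    ("python", "NA"),
    ("pyarrow", "NA"),
]

def result_backing(left, right):
    """Pick the higher-ranked of two dtypes; it determines the result dtype."""
    return max(left, right, key=HIERARCHY.index)
```

So, for instance, comparing a pyarrow/NA-backed column against any other string dtype selects the pyarrow/NA backing, which matches the note's `boolean[pyarrow]` result.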

doc/source/whatsnew/v3.0.0.rst

Lines changed: 6 additions & 0 deletions
@@ -28,6 +28,9 @@ Enhancement2
 
 Other enhancements
 ^^^^^^^^^^^^^^^^^^
+- :func:`pandas.merge` propagates the ``attrs`` attribute to the result if all
+  inputs have identical ``attrs``, as has so far already been the case for
+  :func:`pandas.concat`.
 - :class:`pandas.api.typing.FrozenList` is available for typing the outputs of :attr:`MultiIndex.names`, :attr:`MultiIndex.codes` and :attr:`MultiIndex.levels` (:issue:`58237`)
 - :class:`pandas.api.typing.SASReader` is available for typing the output of :func:`read_sas` (:issue:`55689`)
 - Added :meth:`.Styler.to_typst` to write Styler objects to file, buffer or string in Typst format (:issue:`57617`)
@@ -745,9 +748,11 @@ Indexing
 - Bug in :meth:`DataFrame.__getitem__` returning modified columns when called with ``slice`` in Python 3.12 (:issue:`57500`)
 - Bug in :meth:`DataFrame.__getitem__` when slicing a :class:`DataFrame` with many rows raised an ``OverflowError`` (:issue:`59531`)
 - Bug in :meth:`DataFrame.from_records` throwing a ``ValueError`` when passed an empty list in ``index`` (:issue:`58594`)
+- Bug in :meth:`DataFrame.loc` and :meth:`DataFrame.iloc` returning incorrect dtype when selecting from a :class:`DataFrame` with mixed data types (:issue:`60600`)
 - Bug in :meth:`DataFrame.loc` with inconsistent behavior of loc-set with 2 given indexes to Series (:issue:`59933`)
 - Bug in :meth:`Index.get_indexer` and similar methods when ``NaN`` is located at or after position 128 (:issue:`58924`)
 - Bug in :meth:`MultiIndex.insert` when a new value inserted to a datetime-like level gets cast to ``NaT`` and fails indexing (:issue:`60388`)
+- Bug in :meth:`Series.__setitem__` when assigning boolean series with boolean indexer will raise ``LossySetitemError`` (:issue:`57338`)
 - Bug in printing :attr:`Index.names` and :attr:`MultiIndex.levels` would not escape single quotes (:issue:`60190`)
 - Bug in reindexing of :class:`DataFrame` with :class:`PeriodDtype` columns in case of consolidated block (:issue:`60980`, :issue:`60273`)
@@ -777,6 +782,7 @@ I/O
 - Bug in :meth:`DataFrame.to_excel` when writing empty :class:`DataFrame` with :class:`MultiIndex` on both axes (:issue:`57696`)
 - Bug in :meth:`DataFrame.to_excel` where the :class:`MultiIndex` index with a period level was not a date (:issue:`60099`)
 - Bug in :meth:`DataFrame.to_stata` when exporting a column containing both long strings (Stata strL) and :class:`pd.NA` values (:issue:`23633`)
+- Bug in :meth:`DataFrame.to_stata` when input encoded length and normal length are mismatched (:issue:`61583`)
 - Bug in :meth:`DataFrame.to_stata` when writing :class:`DataFrame` and ``byteorder=`big```. (:issue:`58969`)
 - Bug in :meth:`DataFrame.to_stata` when writing more than 32,000 value labels. (:issue:`60107`)
 - Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)

environment.yml

Lines changed: 1 addition & 2 deletions
@@ -64,9 +64,8 @@ dependencies:
   - dask-core
   - seaborn-base
 
-  # local testing dependencies
+  # Mocking s3 tests
   - moto
-  - flask
 
   # benchmarks
   - asv>=0.6.1

pandas/_libs/src/datetime/pd_datetime.c

Lines changed: 4 additions & 0 deletions
@@ -192,6 +192,10 @@ static npy_datetime PyDateTimeToEpoch(PyObject *dt, NPY_DATETIMEUNIT base) {
   return npy_dt;
 }
 
+/* Initializes and exposes a custom datetime C-API from the pandas library
+ * by creating a PyCapsule that stores function pointers, which can be accessed
+ * later by other C code or Cython code that imports the capsule.
+ */
 static int pandas_datetime_exec(PyObject *Py_UNUSED(module)) {
   PyDateTime_IMPORT;
   PandasDateTime_CAPI *capi = PyMem_Malloc(sizeof(PandasDateTime_CAPI));

pandas/compat/_optional.py

Lines changed: 2 additions & 2 deletions
@@ -152,8 +152,8 @@ def import_optional_dependency(
     install_name = package_name if package_name is not None else name
 
     msg = (
-        f"Missing optional dependency '{install_name}'. {extra} "
-        f"Use pip or conda to install {install_name}."
+        f"`Import {install_name}` failed. {extra} "
+        f"Use pip or conda to install the {install_name} package."
     )
     try:
         module = importlib.import_module(name)
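The effect of the reworded message can be seen in a stripped-down sketch of the helper. This is a simplified stand-in for pandas' `import_optional_dependency`, not the real function; the version checks and `errors=` modes are omitted, and the name `import_optional` is invented:

```python
import importlib


def import_optional(name: str, extra: str = ""):
    # Simplified stand-in mirroring the new error message above; the real
    # pandas helper also validates minimum versions and supports errors=.
    msg = (
        f"`Import {name}` failed. {extra} "
        f"Use pip or conda to install the {name} package."
    )
    try:
        return importlib.import_module(name)
    except ImportError as err:
        raise ImportError(msg) from err
```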

pandas/conftest.py

Lines changed: 6 additions & 0 deletions
@@ -2116,3 +2116,9 @@ def temp_file(tmp_path):
     file_path = tmp_path / str(uuid.uuid4())
     file_path.touch()
     return file_path
+
+
+@pytest.fixture(scope="session")
+def monkeysession():
+    with pytest.MonkeyPatch.context() as mp:
+        yield mp
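The new `monkeysession` fixture wraps `pytest.MonkeyPatch.context()` because pytest's built-in `monkeypatch` fixture is function-scoped and cannot be requested from session-scoped fixtures. A hypothetical usage sketch (the `aws_credentials` fixture and variable names are only illustrative, echoing the env vars dropped from the workflow file above):

```python
import pytest


@pytest.fixture(scope="session")
def monkeysession():
    # Session-scoped MonkeyPatch; everything set through it is undone
    # when the context exits at the end of the test session.
    with pytest.MonkeyPatch.context() as mp:
        yield mp


@pytest.fixture(scope="session")
def aws_credentials(monkeysession):
    # Hypothetical consumer: fake credentials for the whole session.
    monkeysession.setenv("AWS_ACCESS_KEY_ID", "foobar_key")
    monkeysession.setenv("AWS_SECRET_ACCESS_KEY", "foobar_secret")
```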

pandas/core/apply.py

Lines changed: 2 additions & 1 deletion
@@ -10,6 +10,7 @@
     TYPE_CHECKING,
     Any,
     Literal,
+    TypeAlias,
     cast,
 )
 
@@ -71,7 +72,7 @@
     from pandas.core.resample import Resampler
     from pandas.core.window.rolling import BaseWindow
 
-ResType = dict[int, Any]
+ResType: TypeAlias = dict[int, Any]
 
 
 class BaseExecutionEngine(abc.ABC):

pandas/core/arrays/datetimelike.py

Lines changed: 3 additions & 2 deletions
@@ -10,6 +10,7 @@
     TYPE_CHECKING,
     Any,
     Literal,
+    TypeAlias,
     Union,
     cast,
     final,
@@ -161,7 +162,7 @@
         TimedeltaArray,
     )
 
-DTScalarOrNaT = Union[DatetimeLikeScalar, NaTType]
+DTScalarOrNaT: TypeAlias = DatetimeLikeScalar | NaTType
 
 
 def _make_unpacked_invalid_op(op_name: str):
@@ -386,7 +387,7 @@ def __getitem__(self, key: PositionalIndexer2D) -> Self | DTScalarOrNaT:
         # Use cast as we know we will get back a DatetimeLikeArray or DTScalar,
         # but skip evaluating the Union at runtime for performance
         # (see https://github.com/pandas-dev/pandas/pull/44624)
-        result = cast("Union[Self, DTScalarOrNaT]", super().__getitem__(key))
+        result = cast(Union[Self, DTScalarOrNaT], super().__getitem__(key))
         if lib.is_scalar(result):
             return result
         else:

pandas/core/arrays/interval.py

Lines changed: 3 additions & 3 deletions
@@ -9,7 +9,7 @@
 from typing import (
     TYPE_CHECKING,
     Literal,
-    Union,
+    TypeAlias,
     overload,
 )
 
@@ -109,8 +109,8 @@
 )
 
 
-IntervalSide = Union[TimeArrayLike, np.ndarray]
-IntervalOrNA = Union[Interval, float]
+IntervalSide: TypeAlias = TimeArrayLike | np.ndarray
+IntervalOrNA: TypeAlias = Interval | float
 
 _interval_shared_docs: dict[str, str] = {}

pandas/core/arrays/string_arrow.py

Lines changed: 0 additions & 4 deletions
@@ -4,7 +4,6 @@
 import re
 from typing import (
     TYPE_CHECKING,
-    Union,
 )
 import warnings
 
@@ -64,9 +63,6 @@
     from pandas import Series
 
 
-ArrowStringScalarOrNAT = Union[str, libmissing.NAType]
-
-
 def _chk_pyarrow_available() -> None:
     if pa_version_under10p1:
         msg = "pyarrow>=10.0.1 is required for PyArrow backed ArrowExtensionArray."

pandas/core/dtypes/cast.py

Lines changed: 4 additions & 0 deletions
@@ -1926,6 +1926,10 @@ def np_can_hold_element(dtype: np.dtype, element: Any) -> Any:
                 # i.e. there are pd.NA elements
                 raise LossySetitemError
             return element
+        # GH 57338 check boolean array set as object type
+        if tipo.kind == "O" and isinstance(element, np.ndarray):
+            if lib.is_bool_array(element):
+                return element.astype("bool")
         raise LossySetitemError
     if lib.is_bool(element):
         return element
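The new branch in `np_can_hold_element` accepts an object-dtype ndarray whose elements are all booleans, casting it to `bool` instead of raising `LossySetitemError` (GH 57338). A standalone sketch of the check, where a plain `isinstance` scan stands in for pandas' C-level `lib.is_bool_array` and `coerce_bool_object_array` is an invented name:

```python
import numpy as np


def coerce_bool_object_array(arr: np.ndarray) -> np.ndarray:
    # Sketch of the GH 57338 check: an object-dtype array holding only
    # booleans can be stored losslessly in a bool array, so cast it.
    if arr.dtype.kind == "O" and all(isinstance(x, (bool, np.bool_)) for x in arr):
        return arr.astype("bool")
    # pandas raises LossySetitemError here; TypeError keeps the sketch simple.
    raise TypeError("lossy setitem")
```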

pandas/core/generic.py

Lines changed: 6 additions & 6 deletions
@@ -330,8 +330,8 @@ def attrs(self) -> dict[Hashable, Any]:
     -----
     Many operations that create new datasets will copy ``attrs``. Copies
     are always deep so that changing ``attrs`` will only affect the
-    present dataset. ``pandas.concat`` copies ``attrs`` only if all input
-    datasets have the same ``attrs``.
+    present dataset. :func:`pandas.concat` and :func:`pandas.merge` will
+    only copy ``attrs`` if all input datasets have the same ``attrs``.
 
     Examples
     --------
@@ -6090,11 +6090,11 @@ def __finalize__(self, other, method: str | None = None, **kwargs) -> Self:
             assert isinstance(name, str)
             object.__setattr__(self, name, getattr(other, name, None))
 
-        if method == "concat":
-            objs = other.objs
-            # propagate attrs only if all concat arguments have the same attrs
+        elif hasattr(other, "input_objs"):
+            objs = other.input_objs
+            # propagate attrs only if all inputs have the same attrs
             if all(bool(obj.attrs) for obj in objs):
-                # all concatenate arguments have non-empty attrs
+                # all inputs have non-empty attrs
                 attrs = objs[0].attrs
                 have_same_attrs = all(obj.attrs == attrs for obj in objs[1:])
                 if have_same_attrs:
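The generalized `__finalize__` branch above propagates `attrs` from any operation whose context object exposes `input_objs` (previously only `concat`), which is what lets `pandas.merge` share the behavior. The rule itself can be sketched independently of pandas; `Carrier` and `propagate_attrs` are made-up stand-ins for illustration:

```python
class Carrier:
    # Made-up stand-in for a pandas object carrying an ``attrs`` dict.
    def __init__(self, attrs):
        self.attrs = attrs


def propagate_attrs(objs):
    # Mirror of the rule in __finalize__: copy attrs to the result only
    # when every input has non-empty attrs and they are all identical.
    if objs and all(bool(o.attrs) for o in objs):
        attrs = objs[0].attrs
        if all(o.attrs == attrs for o in objs[1:]):
            return dict(attrs)
    return {}
```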

pandas/core/groupby/generic.py

Lines changed: 2 additions & 2 deletions
@@ -17,8 +17,8 @@
     Any,
     Literal,
     NamedTuple,
+    TypeAlias,
     TypeVar,
-    Union,
     cast,
 )
 import warnings
@@ -102,7 +102,7 @@
     from pandas.core.generic import NDFrame
 
 # TODO(typing) the return value on this callable should be any *scalar*.
-AggScalar = Union[str, Callable[..., Any]]
+AggScalar: TypeAlias = str | Callable[..., Any]
 # TODO: validate types on ScalarResult and move to _typing
 # Blocked from using by https://github.com/python/mypy/issues/1484
 # See note at _mangle_lambda_list
