
Commit 0faaf5c

DOC: add section about upcoming pandas 3.0 changes (string dtype, CoW) to 2.3 whatsnew notes (#61795)
Co-authored-by: Simon Hawkins <[email protected]>
1 parent 2b471c8 commit 0faaf5c


2 files changed

+99
-1
lines changed


doc/source/whatsnew/v2.3.0.rst

Lines changed: 98 additions & 0 deletions
@@ -10,6 +10,104 @@ including other versions of pandas.
1010

1111
.. ---------------------------------------------------------------------------
1212
13+
.. _whatsnew_230.upcoming_changes:
14+
15+
Upcoming changes in pandas 3.0
16+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
17+
18+
pandas 3.0 will bring two bigger changes to the default behavior of pandas.
19+
20+
Dedicated string data type by default
21+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
22+
23+
Historically, pandas represented string columns with NumPy ``object`` data type.
24+
This representation has numerous problems: it is not specific to strings (any
25+
Python object can be stored in an ``object``-dtype array, not just strings) and
26+
it is often not very efficient (in terms of both performance and memory usage).
27+
28+
Starting with the upcoming pandas 3.0 release, a dedicated string data type will
29+
be enabled by default (backed by PyArrow under the hood, if installed, otherwise
30+
falling back to NumPy). This means that pandas will start inferring columns
31+
containing string data as the new ``str`` data type when creating pandas
32+
objects, such as in constructors or IO functions.
33+
34+
Old behavior:
35+
36+
.. code-block:: python
37+
38+
>>> pd.Series(["a", "b"])
39+
0 a
40+
1 b
41+
dtype: object
42+
43+
New behavior:
44+
45+
.. code-block:: python
46+
47+
>>> pd.Series(["a", "b"])
48+
0 a
49+
1 b
50+
dtype: str
51+
52+
The string data type that is used in these scenarios will mostly behave as NumPy
53+
object would, including missing value semantics and general operations on these
54+
columns.
55+
56+
However, the introduction of a new default dtype will also have some breaking
57+
consequences for your code (for example, when checking whether the ``.dtype`` is
58+
object dtype). To allow testing it in advance of the pandas 3.0 release, this
59+
future dtype inference logic can be enabled in pandas 2.3 with:
60+
61+
.. code-block:: python
62+
63+
pd.options.future.infer_string = True
64+
65+
See the :ref:`string_migration_guide` for more details on the behaviour changes
66+
and how to adapt your code to the new default.
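As an illustration of the ``.dtype`` check mentioned above, the following is a minimal sketch (not taken from the migration guide) of a check that should give the same answer under both the old ``object`` default and the new ``str`` default. It relies on ``pandas.api.types.is_string_dtype``; the variable names are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

ser = pd.Series(["a", "b"])
numbers = pd.Series([1, 2])

# Fragile: True only while strings are still inferred as object dtype,
# so the result flips once the new string dtype becomes the default.
fragile = ser.dtype == object

# More robust: is_string_dtype accepts object-dtype Series that hold only
# strings as well as Series using a dedicated string dtype.
robust = is_string_dtype(ser)
not_text = is_string_dtype(numbers)
```

Here ``robust`` is expected to be true whether or not ``pd.options.future.infer_string`` is enabled, while ``fragile`` depends on the active default.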
67+
68+
Copy-on-Write
69+
^^^^^^^^^^^^^
70+
71+
The currently optional mode Copy-on-Write will be enabled by default in pandas 3.0. There
72+
won't be an option to retain the legacy behavior.
73+
74+
In summary, the new Copy-on-Write behavior changes
75+
how pandas operates with respect to copies and views.
76+
77+
1. The result of *any* indexing operation (subsetting a DataFrame or Series in any way,
78+
i.e. including accessing a DataFrame column as a Series) or any method returning a
79+
new DataFrame or Series, always *behaves as if* it were a copy in terms of user
80+
API.
81+
2. As a consequence, if you want to modify an object (DataFrame or Series), the only way
82+
to do this is to directly modify that object itself.
83+
84+
Because every single indexing step now behaves as a copy, this also means that
85+
"chained assignment" (updating a DataFrame with multiple setitem steps) will
86+
stop working. Because this now consistently never works, the
87+
``SettingWithCopyWarning`` will be removed.
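The two rules above can be sketched as follows. This is an illustrative example only, assuming pandas 2.x, where Copy-on-Write is opt-in through the ``mode.copy_on_write`` option; the column names and values are made up.

```python
import pandas as pd

with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

    # Rule 1: the result of an indexing operation behaves as a copy,
    # so modifying it does not propagate back to the parent DataFrame.
    subset = df["a"]
    subset.iloc[0] = 100

    # Rule 2: to change df, modify df itself in a single step,
    # instead of a chained pattern such as df["a"][df["b"] > 4] = 0.
    df.loc[df["b"] > 4, "a"] = 0

    parent_values = df["a"].tolist()
    subset_values = subset.tolist()
```

Under these assumptions, ``parent_values`` reflects only the ``.loc`` assignment, and ``subset_values`` reflects only the change made to ``subset``.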
88+
89+
The new behavioral semantics are explained in more detail in the
90+
:ref:`user guide about Copy-on-Write <copy_on_write>`.
91+
92+
The new behavior can be enabled since pandas 2.0 with the following option:
93+
94+
.. code-block:: python
95+
96+
pd.options.mode.copy_on_write = True
97+
98+
Some of the behaviour changes allow a clear deprecation, like the changes in
99+
chained assignment. Other changes are more subtle and thus, the warnings are
100+
hidden behind an option that can be enabled since pandas 2.2:
101+
102+
.. code-block:: python
103+
104+
pd.options.mode.copy_on_write = "warn"
105+
106+
This mode will warn in many different scenarios that aren't actually relevant to
107+
most queries. We recommend exploring this mode, but it is not necessary to get rid
108+
of all of these warnings. The :ref:`migration guide <copy_on_write.migration_guide>`
109+
explains the upgrade process in more detail.
110+
13111
.. _whatsnew_230.enhancements:
14112

15113
Enhancements

doc/source/whatsnew/v2.3.1.rst

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ correctly, rather than defaulting to ``object`` dtype. For example:
4444

4545
.. code-block:: python
4646
47-
>>> pd.options.mode.infer_string = True
47+
>>> pd.options.future.infer_string = True
4848
>>> df = pd.DataFrame()
4949
>>> df.columns.dtype
5050
dtype('int64') # default RangeIndex for empty columns
