Commit a0692be

Tom Augspurger authored and jreback committed
API: Add return_type kwarg to boxplot
update docs
API: Let 'by' and groupby follow return_type
Write all the docs
1 parent 48729e2 commit a0692be

7 files changed

+272
-33
lines changed

doc/source/groupby.rst

Lines changed: 31 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -909,6 +909,37 @@ To see the order in which each row appears within its group, use the
909909
910910
df.groupby('A').cumcount(ascending=False) # kwarg only
911911
912+
Plotting
913+
~~~~~~~~
914+
915+
Groupby also works with some plotting methods. For example, suppose we
916+
suspect that some features in a DataFrame may differ by group. In this case,
917+
the values in column 1 where the group is "B" are 3 higher on average.
918+
919+
.. ipython:: python
920+
921+
np.random.seed(1234)
922+
df = DataFrame(np.random.randn(50, 2))
923+
df['g'] = np.random.choice(['A', 'B'], size=50)
924+
df.loc[df['g'] == 'B', 1] += 3
925+
926+
We can easily visualize this with a boxplot:
927+
928+
.. ipython:: python
929+
930+
@savefig groupby_boxplot.png
931+
bp = df.groupby('g').boxplot()
932+
933+
The result of calling ``boxplot`` is a dictionary whose keys are the values
934+
of our grouping column ``g`` ("A" and "B"). The values of the resulting dictionary
935+
can be controlled by the ``return_type`` keyword of ``boxplot``.
936+
See the :ref:`visualization documentation<visualization.box>` for more.
937+
938+
.. warning::
939+
940+
For historical reasons, ``df.groupby("g").boxplot()`` is not equivalent
941+
to ``df.boxplot(by="g")``. See :ref:`here<visualization.box.return>`.
942+
912943
Examples
913944
--------
914945
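
The new groupby.rst text says the grouped call returns a dictionary keyed by the values of the grouping column. The shape of that result can be sketched without pandas or matplotlib. This is only an illustration under stated assumptions: `summarize_by_group` and the sample `rows` are hypothetical, and the per-group summary dict stands in for whatever `boxplot` would return for each group.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample data: (group key, value) pairs, like column "g" and column 1.
rows = [("A", 0.5), ("B", 3.2), ("A", -0.1), ("B", 2.9), ("A", 0.3)]


def summarize_by_group(rows):
    """Return a dict keyed by group value, one entry per group.

    Each value is a small summary standing in for the per-group
    object that a grouped boxplot call would hand back.
    """
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return {
        key: {"median": median(values), "n": len(values)}
        for key, values in sorted(grouped.items())
    }


result = summarize_by_group(rows)
# The keys are the distinct group values, mirroring the "A" and "B" keys in the docs.
print(sorted(result))
```

The point is only the keying: one entry per group, looked up by the group's value.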

doc/source/release.rst

Lines changed: 7 additions & 0 deletions
@@ -208,6 +208,9 @@ API Changes
208208
returns a different Index (:issue:`7088`). Previously the index was unintentionally sorted.
209209
- arithmetic operations with **only** ``bool`` dtypes now raise an error
210210
(:issue:`7011`, :issue:`6762`, :issue:`7015`)
211+
- :meth:`DataFrame.boxplot` has a new keyword argument, ``return_type``. It accepts ``'dict'``,
212+
``'axes'``, or ``'both'``; with ``'both'``, a namedtuple with the matplotlib
213+
axes and a dict of matplotlib Lines is returned.
211214

212215
Deprecations
213216
~~~~~~~~~~~~
@@ -258,6 +261,10 @@ Deprecations
258261
Use the `percentiles` keyword instead, which takes a list of percentiles to display. The
259262
default output is unchanged.
260263

264+
- The default return type of :func:`boxplot` will change from a dict to a matplotlib Axes
265+
in a future release. You can use the future behavior now by passing ``return_type='axes'``
266+
to boxplot.
267+
261268
Prior Version Deprecations/Changes
262269
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
263270

doc/source/v0.14.0.txt

Lines changed: 8 additions & 0 deletions
@@ -212,6 +212,10 @@ API changes
212212
# this now raises for arith ops like ``+``, ``*``, etc.
213213
NotImplementedError: operator '*' not implemented for bool dtypes
214214

215+
- :meth:`DataFrame.boxplot` has a new keyword argument, ``return_type``. It accepts ``'dict'``,
216+
``'axes'``, or ``'both'``; with ``'both'``, a namedtuple with the matplotlib
217+
axes and a dict of matplotlib Lines is returned.
218+
215219

216220
.. _whatsnew_0140.display:
217221

@@ -574,6 +578,10 @@ Deprecations
574578
Use the `percentiles` keyword instead, which takes a list of percentiles to display. The
575579
default output is unchanged.
576580

581+
- The default return type of :func:`boxplot` will change from a dict to a matplotlib Axes
582+
in a future release. You can use the future behavior now by passing ``return_type='axes'``
583+
to boxplot.
584+
577585
.. _whatsnew_0140.enhancements:
578586

579587
Enhancements

doc/source/visualization.rst

Lines changed: 36 additions & 0 deletions
@@ -304,6 +304,42 @@ columns:
304304
305305
plt.close('all')
306306
307+
.. _visualization.box.return:
308+
309+
The return type of ``boxplot`` depends on two keyword arguments: ``by`` and ``return_type``.
310+
When ``by`` is ``None``:
311+
312+
* if ``return_type`` is ``'dict'``, a dictionary containing the :class:`matplotlib Lines <matplotlib.lines.Line2D>` is returned. The keys are "boxes", "caps", "fliers", "medians", and "whiskers".
313+
This is the default.
314+
* if ``return_type`` is ``'axes'``, a :class:`matplotlib Axes <matplotlib.axes.Axes>` containing the boxplot is returned.
315+
* if ``return_type`` is ``'both'``, a namedtuple containing the :class:`matplotlib Axes <matplotlib.axes.Axes>`
316+
and :class:`matplotlib Lines <matplotlib.lines.Line2D>` is returned.
317+
318+
When ``by`` is some column of the DataFrame, a dict of ``return_type`` is returned, where
319+
the keys are the columns of the DataFrame. The plot has a facet for each column of
320+
the DataFrame, with a separate box for each value of ``by``.
321+
322+
Finally, when calling boxplot on a :class:`Groupby` object, a dict of ``return_type``
323+
is returned, where the keys are the same as the Groupby object. The plot has a
324+
facet for each key, with each facet containing a box for each column of the
325+
DataFrame.
326+
327+
.. ipython:: python
328+
329+
np.random.seed(1234)
330+
df_box = DataFrame(np.random.randn(50, 2))
331+
df_box['g'] = np.random.choice(['A', 'B'], size=50)
332+
df_box.loc[df_box['g'] == 'B', 1] += 3
333+
334+
.. ipython:: python
335+
336+
@savefig boxplot_groupby.png
337+
df_box.boxplot(by='g')
338+
339+
@savefig groupby_boxplot_vis.png
340+
df_box.groupby('g').boxplot()
341+
342+
307343
.. _visualization.area_plot:
308344

309345
Area Plot
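
The `return_type` rules that the visualization.rst hunk documents for the `by=None` case can be sketched in plain Python. This is not the pandas implementation: `BoxplotResult`, `select_return`, and the stand-in objects are hypothetical names chosen for illustration. Only the three accepted values and the five dict keys come from the documentation above.

```python
from collections import namedtuple

# Hypothetical stand-in for the namedtuple returned when return_type='both'.
BoxplotResult = namedtuple("BoxplotResult", ["ax", "lines"])

VALID_RETURN_TYPES = ("dict", "axes", "both")


def select_return(ax, lines, return_type):
    """Pick the object to hand back for the by=None case."""
    if return_type not in VALID_RETURN_TYPES:
        raise ValueError(
            "return_type must be one of %r" % (VALID_RETURN_TYPES,)
        )
    if return_type == "dict":
        return lines  # dict of line artists
    if return_type == "axes":
        return ax  # the axes the boxplot was drawn on
    return BoxplotResult(ax=ax, lines=lines)  # 'both'


# Plain objects stand in for a matplotlib Axes and its Line2D artists.
fake_ax = object()
fake_lines = {
    key: [] for key in ("boxes", "caps", "fliers", "medians", "whiskers")
}

both = select_return(fake_ax, fake_lines, "both")
```

With `by` set or on a groupby object, the documentation describes the same per-item choice wrapped in an outer dict, keyed by column or by group.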

pandas/core/frame.py

Lines changed: 18 additions & 2 deletions
@@ -4857,7 +4857,8 @@ def _put_str(s, space):
48574857

48584858

48594859
def boxplot(self, column=None, by=None, ax=None, fontsize=None,
4860-
rot=0, grid=True, **kwds):
4860+
rot=0, grid=True, figsize=None, layout=None, return_type=None,
4861+
**kwds):
48614862
"""
48624863
Make a box plot from DataFrame column/columns optionally grouped
48634864
(stratified) by one or more columns
@@ -4875,17 +4876,32 @@ def boxplot(self, column=None, by=None, ax=None, fontsize=None,
48754876
Rotation for ticks
48764877
grid : boolean, default None (matlab style default)
48774878
Axis grid lines
4879+
layout : tuple (optional)
4880+
(rows, columns) for the layout of the plot
4881+
return_type : {'dict', 'axes', 'both'}, optional
4882+
The kind of object to return: a dict whose values are the lines of the boxplot, the matplotlib Axes, or a namedtuple of both
4883+
kwds : other plotting keyword arguments to be passed to matplotlib boxplot
4884+
function
48784885
48794886
Returns
48804887
-------
48814888
ax : matplotlib.axes.AxesSubplot
4889+
lines : dict (optional)
4890+
4891+
Notes
4892+
-----
4893+
Use ``return_type='dict'`` when you want to modify the appearance
4894+
of the lines. In this case a dict containing the lines is returned.
48824895
"""
48834896
import pandas.tools.plotting as plots
48844897
import matplotlib.pyplot as plt
48854898
ax = plots.boxplot(self, column=column, by=by, ax=ax,
4886-
fontsize=fontsize, grid=grid, rot=rot, **kwds)
4899+
fontsize=fontsize, grid=grid, rot=rot,
4900+
figsize=figsize, layout=layout, return_type=return_type,
4901+
**kwds)
48874902
plt.draw_if_interactive()
48884903
return ax
4904+
48894905
DataFrame.boxplot = boxplot
48904906

48914907
ops.add_flex_arithmetic_methods(DataFrame, **ops.frame_flex_funcs)

pandas/tests/test_graphics.py

Lines changed: 101 additions & 20 deletions
@@ -1309,14 +1309,14 @@ def test_boxplot(self):
13091309
df['indic'] = ['foo', 'bar'] * 3
13101310
df['indic2'] = ['foo', 'bar', 'foo'] * 2
13111311

1312-
_check_plot_works(df.boxplot)
1313-
_check_plot_works(df.boxplot, column=['one', 'two'])
1312+
_check_plot_works(df.boxplot, return_type='dict')
1313+
_check_plot_works(df.boxplot, column=['one', 'two'], return_type='dict')
13141314
_check_plot_works(df.boxplot, column=['one', 'two'], by='indic')
13151315
_check_plot_works(df.boxplot, column='one', by=['indic', 'indic2'])
13161316
_check_plot_works(df.boxplot, by='indic')
13171317
_check_plot_works(df.boxplot, by=['indic', 'indic2'])
1318-
_check_plot_works(plotting.boxplot, df['one'])
1319-
_check_plot_works(df.boxplot, notch=1)
1318+
_check_plot_works(plotting.boxplot, df['one'], return_type='dict')
1319+
_check_plot_works(df.boxplot, notch=1, return_type='dict')
13201320
_check_plot_works(df.boxplot, by='indic', notch=1)
13211321

13221322
df = DataFrame(np.random.rand(10, 2), columns=['Col1', 'Col2'])
@@ -1337,10 +1337,83 @@ def test_boxplot(self):
13371337

13381338
# When by is None, check that all relevant lines are present in the dict
13391339
fig, ax = self.plt.subplots()
1340-
d = df.boxplot(ax=ax)
1340+
d = df.boxplot(ax=ax, return_type='dict')
13411341
lines = list(itertools.chain.from_iterable(d.values()))
13421342
self.assertEqual(len(ax.get_lines()), len(lines))
13431343

1344+
@slow
1345+
def test_boxplot_return_type(self):
1346+
# API change in https://github.com/pydata/pandas/pull/7096
1347+
import matplotlib as mpl
1348+
1349+
df = DataFrame(randn(6, 4),
1350+
index=list(string.ascii_letters[:6]),
1351+
columns=['one', 'two', 'three', 'four'])
1352+
with tm.assertRaises(ValueError):
1353+
df.boxplot(return_type='NOTATYPE')
1354+
1355+
with tm.assert_produces_warning(FutureWarning):
1356+
result = df.boxplot()
1357+
self.assertIsInstance(result, dict) # change to Axes in future
1358+
1359+
with tm.assert_produces_warning(False):
1360+
result = df.boxplot(return_type='dict')
1361+
self.assertIsInstance(result, dict)
1362+
1363+
with tm.assert_produces_warning(False):
1364+
result = df.boxplot(return_type='axes')
1365+
self.assertIsInstance(result, mpl.axes.Axes)
1366+
1367+
with tm.assert_produces_warning(False):
1368+
result = df.boxplot(return_type='both')
1369+
self.assertIsInstance(result, tuple)
1370+
1371+
@slow
1372+
def test_boxplot_return_type_by(self):
1373+
import matplotlib as mpl
1374+
1375+
df = DataFrame(np.random.randn(10, 2))
1376+
df['g'] = ['a'] * 5 + ['b'] * 5
1377+
1378+
# old style: return_type=None
1379+
result = df.boxplot(by='g')
1380+
self.assertIsInstance(result, np.ndarray)
1381+
self.assertIsInstance(result[0], mpl.axes.Axes)
1382+
1383+
result = df.boxplot(by='g', return_type='dict')
1384+
self.assertIsInstance(result, dict)
1385+
self.assertIsInstance(result[0], dict)
1386+
1387+
result = df.boxplot(by='g', return_type='axes')
1388+
self.assertIsInstance(result, dict)
1389+
self.assertIsInstance(result[0], mpl.axes.Axes)
1390+
1391+
result = df.boxplot(by='g', return_type='both')
1392+
self.assertIsInstance(result, dict)
1393+
self.assertIsInstance(result[0], tuple)
1394+
self.assertIsInstance(result[0][0], mpl.axes.Axes)
1395+
self.assertIsInstance(result[0][1], dict)
1396+
1397+
# now for groupby
1398+
with tm.assert_produces_warning(FutureWarning):
1399+
result = df.groupby('g').boxplot()
1400+
self.assertIsInstance(result, dict)
1401+
self.assertIsInstance(result['a'], dict)
1402+
1403+
result = df.groupby('g').boxplot(return_type='dict')
1404+
self.assertIsInstance(result, dict)
1405+
self.assertIsInstance(result['a'], dict)
1406+
1407+
result = df.groupby('g').boxplot(return_type='axes')
1408+
self.assertIsInstance(result, dict)
1409+
self.assertIsInstance(result['a'], mpl.axes.Axes)
1410+
1411+
result = df.groupby('g').boxplot(return_type='both')
1412+
self.assertIsInstance(result, dict)
1413+
self.assertIsInstance(result['a'], tuple)
1414+
self.assertIsInstance(result['a'][0], mpl.axes.Axes)
1415+
self.assertIsInstance(result['a'][1], dict)
1416+
13441417
@slow
13451418
def test_kde(self):
13461419
_skip_if_no_scipy()
@@ -2044,31 +2117,32 @@ class TestDataFrameGroupByPlots(TestPlotBase):
20442117

20452118
@slow
20462119
def test_boxplot(self):
2047-
# unable to check layout because boxplot doesn't return ndarray
2048-
# axes_num can be checked using gcf().axes
20492120
grouped = self.hist_df.groupby(by='gender')
2050-
box = _check_plot_works(grouped.boxplot)
2121+
box = _check_plot_works(grouped.boxplot, return_type='dict')
20512122
self._check_axes_shape(self.plt.gcf().axes, axes_num=2)
20522123

2053-
box = _check_plot_works(grouped.boxplot, subplots=False)
2124+
box = _check_plot_works(grouped.boxplot, subplots=False,
2125+
return_type='dict')
20542126
self._check_axes_shape(self.plt.gcf().axes, axes_num=2)
20552127

20562128
tuples = lzip(string.ascii_letters[:10], range(10))
20572129
df = DataFrame(np.random.rand(10, 3),
20582130
index=MultiIndex.from_tuples(tuples))
20592131

20602132
grouped = df.groupby(level=1)
2061-
box = _check_plot_works(grouped.boxplot)
2133+
box = _check_plot_works(grouped.boxplot, return_type='dict')
20622134
self._check_axes_shape(self.plt.gcf().axes, axes_num=10)
20632135

2064-
box = _check_plot_works(grouped.boxplot, subplots=False)
2136+
box = _check_plot_works(grouped.boxplot, subplots=False,
2137+
return_type='dict')
20652138
self._check_axes_shape(self.plt.gcf().axes, axes_num=10)
20662139

20672140
grouped = df.unstack(level=1).groupby(level=0, axis=1)
2068-
box = _check_plot_works(grouped.boxplot)
2141+
box = _check_plot_works(grouped.boxplot, return_type='dict')
20692142
self._check_axes_shape(self.plt.gcf().axes, axes_num=3)
20702143

2071-
box = _check_plot_works(grouped.boxplot, subplots=False)
2144+
box = _check_plot_works(grouped.boxplot, subplots=False,
2145+
return_type='dict')
20722146
self._check_axes_shape(self.plt.gcf().axes, axes_num=3)
20732147

20742148
def test_series_plot_color_kwargs(self):
@@ -2133,31 +2207,38 @@ def test_grouped_box_layout(self):
21332207
self.assertRaises(ValueError, df.boxplot, column=['weight', 'height'],
21342208
by=df.gender, layout=(1, 1))
21352209
self.assertRaises(ValueError, df.boxplot, column=['height', 'weight', 'category'],
2136-
layout=(2, 1))
2210+
layout=(2, 1), return_type='dict')
21372211

2138-
box = _check_plot_works(df.groupby('gender').boxplot, column='height')
2212+
box = _check_plot_works(df.groupby('gender').boxplot, column='height',
2213+
return_type='dict')
21392214
self._check_axes_shape(self.plt.gcf().axes, axes_num=2)
21402215

2141-
box = _check_plot_works(df.groupby('category').boxplot, column='height')
2216+
box = _check_plot_works(df.groupby('category').boxplot, column='height',
2217+
return_type='dict')
21422218
self._check_axes_shape(self.plt.gcf().axes, axes_num=4)
21432219

21442220
# GH 6769
2145-
box = _check_plot_works(df.groupby('classroom').boxplot, column='height')
2221+
box = _check_plot_works(df.groupby('classroom').boxplot,
2222+
column='height', return_type='dict')
21462223
self._check_axes_shape(self.plt.gcf().axes, axes_num=3)
21472224

21482225
box = df.boxplot(column=['height', 'weight', 'category'], by='gender')
21492226
self._check_axes_shape(self.plt.gcf().axes, axes_num=3)
21502227

2151-
box = df.groupby('classroom').boxplot(column=['height', 'weight', 'category'])
2228+
box = df.groupby('classroom').boxplot(
2229+
column=['height', 'weight', 'category'], return_type='dict')
21522230
self._check_axes_shape(self.plt.gcf().axes, axes_num=3)
21532231

2154-
box = _check_plot_works(df.groupby('category').boxplot, column='height', layout=(3, 2))
2232+
box = _check_plot_works(df.groupby('category').boxplot, column='height',
2233+
layout=(3, 2), return_type='dict')
21552234
self._check_axes_shape(self.plt.gcf().axes, axes_num=4)
21562235

21572236
box = df.boxplot(column=['height', 'weight', 'category'], by='gender', layout=(4, 1))
21582237
self._check_axes_shape(self.plt.gcf().axes, axes_num=3)
21592238

2160-
box = df.groupby('classroom').boxplot(column=['height', 'weight', 'category'], layout=(1, 4))
2239+
box = df.groupby('classroom').boxplot(
2240+
column=['height', 'weight', 'category'], layout=(1, 4),
2241+
return_type='dict')
21612242
self._check_axes_shape(self.plt.gcf().axes, axes_num=3)
21622243

21632244
@slow
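
The new `test_boxplot_return_type` test pins down three behaviors for the plain, ungrouped call: an unknown `return_type` raises `ValueError`, omitting it emits a `FutureWarning` while still returning the dict form, and an explicit value emits no warning. A minimal sketch of that deprecation path, using only the standard library, could look like the following. `resolve_return_type` and the warning text are hypothetical and are not taken from `pandas/tools/plotting.py`, which this page does not show.

```python
import warnings

VALID_RETURN_TYPES = ("dict", "axes", "both")


def resolve_return_type(return_type=None):
    """Map the user's return_type to an effective value (hypothetical helper)."""
    if return_type is None:
        # Default path: keep the old behavior but announce the upcoming change.
        warnings.warn(
            "The default value for 'return_type' will change to 'axes' "
            "in a future release.",
            FutureWarning,
            stacklevel=2,
        )
        return "dict"
    if return_type not in VALID_RETURN_TYPES:
        raise ValueError(
            "return_type must be one of %r" % (VALID_RETURN_TYPES,)
        )
    return return_type


# Record warnings so the default path can be inspected, as the test does.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    default = resolve_return_type()
```

Passing `return_type` explicitly, as the updated tests do throughout, keeps the call on the non-warning path.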
