|
8 | 8 | "# GroupBy for NestedPandas\n", |
9 | 9 | "\n", |
10 | 10 | "This notebook explores how Pandas' built-in `groupby` interacts with `NestedPandas` structures.\n", |
11 | | - "<!-- highlight what works, what doesn’t, and why — with clear examples and explanations. -->\n", |
12 | 11 | "\n", |
13 | 12 | "Because Nested-Pandas extends the Pandas library, native ``pandas.DataFrame.groupby`` works with nested-pandas out of the box in some ways. " |
14 | 13 | ] |
|
59 | 58 | "\n", |
60 | 59 | "- Some built-in methods like `count` work, but not as expected: they treat each nested column as a single object.\n", |
61 | 60 | "- Others (`min`, `max`, `mean`) fail on nested columns.\n", |
62 | | - "- Interestingly, `describe` will work as expcted with the automatic flattened nested column." |
| 61 | + "- Interestingly, `describe` will work as expected with the automatically flattened nested column." |
63 | 62 | ] |
64 | 63 | }, |
65 | 64 | { |
|
69 | 68 | "metadata": {}, |
70 | 69 | "outputs": [], |
71 | 70 | "source": [ |
72 | | - "# count is viewing nested columns as signle objects\n", |
| 71 | + "# count is viewing nested columns as single objects\n", |
73 | 72 | "nf.groupby(\"c\").count()" |
74 | 73 | ] |
75 | 74 | }, |
|
81 | 80 | "outputs": [], |
82 | 81 | "source": [ |
83 | 82 | "# min/max/mean fail on nested columns\n", |
84 | | - "nf.groupby(\"c\").min() # will produce error" |
| 83 | + "try:\n", |
| 84 | + " grouped_min = nf.groupby(\"c\").min()\n", |
| 85 | + " print(grouped_min)\n", |
| 86 | + "except TypeError as e:\n", |
| 87 | + " print(f\"Cannot compute min on nested columns: {e}\")" |
85 | 88 | ] |
86 | 89 | }, |
87 | 90 | { |
|
101 | 104 | "metadata": {}, |
102 | 105 | "source": [ |
103 | 106 | "## Type Preservation\n", |
104 | | - "Within each group, the object remains accessible as ``NestedFrame`` object and the nested columns remain ``NestedSeries``.\n", |
| 107 | + "Within each group, the object remains accessible as a ``NestedFrame`` object and the nested columns remain ``NestedSeries``.\n", |
105 | 108 | "\n", |
106 | 109 | "We can check this by applying a custom function on our 2-group `groupby` object:" |
107 | 110 | ] |
|
208 | 211 | "\n", |
209 | 212 | "`.apply()` for nested operations is supported natively. It generally works if the function flattens or uses index slicing to ensure matching types for operations. \n", |
210 | 213 | "\n", |
211 | | - "Some potential exmaples:" |
| 214 | + "Some potential examples:" |
212 | 215 | ] |
213 | 216 | }, |
214 | 217 | { |
|
255 | 258 | "- Use **slice-based indexing** (.iloc[0:1]) to preserve nested types.\n", |
256 | 259 | "- Use **.nest.to_flat()** to flatten a nested column when needed for numerical or aggregating operations.\n", |
257 | 260 | "\n", |
258 | | - "- Nested structures are designed to reduce the need for expensive groupby operations by allowing data to stay organized hierarchically. However, when grouping is necessary, pandas’ groupby still works with nested-pandas and maintains type consistency." |
| 261 | + "- Nested structures are designed to reduce the need for expensive groupby operations by allowing data to stay organized hierarchically. However, when grouping is necessary, pandas’ groupby still works with nested-pandas and maintains type consistency.\n", |
| 262 | + "\n", |
| 263 | + "- Some use cases may behave unexpectedly because of the nested structures. We encourage users to open an issue if they run into unexpected behavior or edge cases.\n" |
259 | 264 | ] |
260 | 265 | } |
261 | 266 | ], |
262 | 267 | "metadata": { |
263 | 268 | "kernelspec": { |
264 | | - "display_name": "Python 3 (ipykernel)", |
| 269 | + "display_name": ".venv", |
265 | 270 | "language": "python", |
266 | 271 | "name": "python3" |
267 | 272 | }, |
|