Merge pull request #339 from lincc-frameworks/nestedseries_docs

dougbrn · web-flow · commit e21f35776e51 · 2025-08-26T13:46:16.000-07:00
`NestedSeries` documentation
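
The `@wraps(func)` line added to `nested_only` in `nestedseries.py` appears to be what lets the new `to_flat` and `to_lists` docstrings survive decoration, so that documentation tools reading `__doc__` see them. The sketch below illustrates that effect with the standard library only. `FakeNestedDtype` and `FakeSeries` are hypothetical stand-ins invented for this illustration and are not part of nested-pandas.

```python
from functools import wraps


class FakeNestedDtype:
    """Hypothetical stand-in for NestedDtype, used only in this sketch."""


def nested_only(func):
    """Allow func only on objects whose dtype is a FakeNestedDtype."""

    @wraps(func)  # copies __name__, __doc__, etc. from func onto wrapper
    def wrapper(*args, **kwargs):
        if not isinstance(args[0].dtype, FakeNestedDtype):
            raise TypeError(f"'{func.__name__}' can only be used with a nested dtype.")
        return func(*args, **kwargs)

    return wrapper


class FakeSeries:
    """Hypothetical minimal series that holds only a dtype."""

    def __init__(self, dtype):
        self.dtype = dtype

    @nested_only
    def to_flat(self):
        """Convert nested series into dataframe of flat arrays."""
        return "flat"


# With @wraps, the decorated method keeps its own name and docstring.
print(FakeSeries.to_flat.__name__)
print(FakeSeries.to_flat.__doc__)
```

Without `@wraps`, the two attributes printed above would be those of the inner `wrapper` function (the name `wrapper` and an empty docstring), which is why the decorator change and the docstring additions belong in the same commit.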
diff --git a/docs/gettingstarted/quickstart.ipynb b/docs/gettingstarted/quickstart.ipynb
@@ -33,9 +33,9 @@
    "source": [
     "## Overview\n",
     "\n",
-    "Nested-Pandas is tailored towards efficient analysis of nested data sets. This includes data that would normally be represented in a Pandas DataFrames with multiple rows needed to represent a single \"thing\" and therefor columns whose values will be identical for that item.\n",
+    "Nested-pandas is tailored towards efficient analysis of nested data sets. This includes data that would normally be represented in a pandas DataFrame with multiple rows needed to represent a single \"thing\" and therefore columns whose values will be identical for that item.\n",
     "\n",
-    "As a concrete example, consider an astronomical data set storing information about observations of physical objects, such as stars and galaxies. One way to represent this in Pandas is to create one row per observation with an ID column indicating to which physical object the observation corresponds. However this approach ends up repeating a lot of data over each observation of the same object such as its location on the sky (RA, dec), its classification, etc. Further, any operations processing the data as time series requires the user to first perform a (potentially expensive) group-by operation to aggregate all of the data for each object.\n",
+    "As a concrete example, consider an astronomical data set storing information about observations of physical objects, such as stars and galaxies. One way to represent this in pandas is to create one row per observation with an ID column indicating to which physical object the observation corresponds. However, this approach ends up repeating a lot of data over each observation of the same object, such as its location on the sky (RA, dec), its classification, etc. Further, any operation processing the data as time series requires the user to first perform a (potentially expensive) group-by operation to aggregate all of the data for each object.\n",
     "\n",
     "Let's create a flat pandas dataframe with three objects: object 0 has three observations, object 1 has three observations, and object 2 has 4 observations."
    ]
@@ -56,6 +56,7 @@
     "        \"dec\": [0.0, 0.0, 0.0, -1.0, -1.0, -1.0, 0.5, 0.5, 0.5, 0.5],\n",
     "        \"time\": [60676.0, 60677.0, 60678.0, 60675.0, 60676.5, 60677.0, 60676.6, 60676.7, 60676.8, 60676.9],\n",
     "        \"brightness\": [100.0, 101.0, 99.8, 5.0, 5.01, 4.98, 20.1, 20.5, 20.3, 20.2],\n",
+    "        \"band\": [\"g\", \"r\", \"g\", \"r\", \"g\", \"r\", \"g\", \"g\", \"r\", \"r\"],\n",
     "    }\n",
     ")\n",
     "my_data_frame"
@@ -86,7 +87,7 @@
     "nf = NestedFrame.from_flat(\n",
     "    my_data_frame,\n",
     "    base_columns=[\"ra\", \"dec\"],  # the columns not to nest\n",
-    "    nested_columns=[\"time\", \"brightness\"],  # the columns to nest\n",
+    "    nested_columns=[\"time\", \"brightness\", \"band\"],  # the columns to nest\n",
     "    on=\"id\",  # column used to associate rows\n",
     "    name=\"lightcurve\",  # name of the nested column\n",
     ")\n",
@@ -239,7 +240,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The above query is native Pandas, however with nested-pandas we can use hierarchical column names to extend `query` to nested layers."
+    "The above query is native pandas; however, with nested-pandas we can use hierarchical column names to extend `query` to nested layers."
    ]
   },
   {
@@ -283,7 +284,7 @@
    "source": [
     "## Reduce Function\n",
     "\n",
-    "Finally, we'll end with the flexible `reduce` function. `reduce` functions similarly to Pandas' `apply` but flattens (reduces) the inputs from nested layers into array inputs to the given apply function. For example, let's find the mean flux for each dataframe in \"nested\":"
+    "Finally, we'll end with the flexible `reduce` function. `reduce` functions similarly to pandas' `apply` but flattens (reduces) the inputs from nested layers into array inputs to the given apply function. For example, let's find the mean flux for each dataframe in \"nested\":"
    ]
   },
   {
@@ -341,11 +342,91 @@
    "source": [
     "nf_inputs.loc[0]"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Extended Series Operations with `NestedSeries`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In addition to the extended API that `NestedFrame` offers for DataFrame operations, nested-pandas provides `NestedSeries`, which extends Series operations to nested data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Single columns containing nested data are represented as NestedSeries\n",
+    "type(nf[\"lightcurve\"])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# It behaves just like a pandas Series\n",
+    "nf[\"lightcurve\"]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`NestedSeries` offers some unique access patterns for getting data:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Accessing sub-columns\n",
+    "nf[\"lightcurve\"][\"time\"]  # Alternative to nf[\"lightcurve.time\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Multi-selecting sub-columns\n",
+    "nf[\"lightcurve\"][[\"time\", \"brightness\"]]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### `NestedSeries` Masking"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Using masks to filter nested data\n",
+    "g_mask = nf[\"lightcurve\"][\"band\"] == \"g\"\n",
+    "nf[\"lightcurve\"] = nf[\"lightcurve\"][g_mask]\n",
+    "nf"
+   ]
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "lsdb",
    "language": "python",
    "name": "python3"
   },
@@ -359,7 +440,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.13.3"
+   "version": "3.12.8"
   }
  },
  "nbformat": 4,
diff --git a/docs/reference.rst b/docs/reference.rst
@@ -5,6 +5,7 @@ API Reference
     :maxdepth: 2
 
     NestedFrame <reference/nestedframe>
+    NestedSeries <reference/nestedseries>
     .nest Accessor <reference/accessor>
     Utility Functions <reference/utils>
     NestedDtype <reference/nesteddtype>
diff --git a/docs/reference/nestedseries.rst b/docs/reference/nestedseries.rst
@@ -0,0 +1,19 @@
+============
+NestedSeries
+============
+.. currentmodule:: nested_pandas
+
+Constructor
+~~~~~~~~~~~
+.. autosummary::
+   :toctree: api/
+
+   NestedSeries
+
+Functions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. autosummary::
+    :toctree: api/
+
+    NestedSeries.to_lists
+    NestedSeries.to_flat
diff --git a/docs/tutorials/data_manipulation.ipynb b/docs/tutorials/data_manipulation.ipynb
@@ -243,7 +243,7 @@
    "outputs": [],
    "source": [
     "# Create a flat dataframe from our existing nested dataframe\n",
-    "flat_df = ndf[\"nested\"].nest.to_flat()\n",
+    "flat_df = ndf[\"nested\"].to_flat()\n",
     "\n",
     "# Nest our flat dataframe back into our original dataframe\n",
     "ndf[\"example\"] = flat_df\n",
diff --git a/docs/tutorials/low_level.ipynb b/docs/tutorials/low_level.ipynb
@@ -67,13 +67,15 @@
    "id": "767e8105fcafca0d",
    "metadata": {},
    "source": [
-    "## Get access to different data views using `.nest` accessor\n",
+    "## Get access to different data views using the `.nest` accessor\n",
     "\n",
     "`pandas` provides an interface to access series with custom \"accessors\" - special attributes acting like a different view on the data.\n",
     "You may already know [`.str` accessor](https://pandas.pydata.org/pandas-docs/stable/reference/series.html#api-series-str) for strings or [`.dt` for datetime-like](https://pandas.pydata.org/pandas-docs/stable/reference/series.html#timedelta-methods) data.\n",
     "Since v2.0, pandas also supports few accessors for `ArrowDtype`d Series, `.list` for list-arrays and `.struct` for struct-arrays.\n",
     "\n",
-    "`nested-pandas` extends this concept and provides `.nest` accessor for `NestedDtype`d Series, which gives user an object to work with nested data more efficiently and flexibly."
+    "`nested-pandas` extends this concept and provides the `.nest` accessor for `NestedDtype`d Series, which gives the user an object to work with nested data more efficiently and flexibly.\n",
+    "\n",
+    "> **Note**: The `.nest` accessor shares much of its API with `NestedSeries`, as `NestedSeries` uses the `.nest` accessor under the hood. As a result, many `.nest` operations can be called directly on a `NestedSeries` without going through `.nest`, but some lower-level functionality remains unique to the `.nest` accessor."
    ]
   },
   {
diff --git a/src/nested_pandas/series/nestedseries.py b/src/nested_pandas/series/nestedseries.py
@@ -1,11 +1,16 @@
+from functools import wraps
+
 import pandas as pd
 
 from nested_pandas.series.dtype import NestedDtype
 
+__all__ = ["NestedSeries"]
+
 
 def nested_only(func):
     """Decorator to designate certain functions can only be used with NestedDtype."""
 
+    @wraps(func)  # This ensures the original function's metadata is preserved
     def wrapper(*args, **kwargs):
         if not isinstance(args[0].dtype, NestedDtype):
             raise TypeError(f"'{func.__name__}' can only be used with a NestedDtype, not '{args[0].dtype}'.")
@@ -79,11 +84,67 @@ def __setitem__(self, key, value):
         return super().__setitem__(key, value)
 
     @nested_only
-    def to_flat(self):
-        """Convert to a flat dataframe representation of the nested series."""
-        return self.nest.to_flat()
+    def to_flat(self, fields: list[str] | None = None) -> pd.DataFrame:
+        """Convert nested series into dataframe of flat arrays.
+
+        Parameters
+        ----------
+        fields : list[str] or None, optional
+            Names of the fields to include. Default is None, which means all fields.
+
+        Returns
+        -------
+        pd.DataFrame
+            Dataframe of flat arrays.
+
+        Examples
+        --------
+
+        >>> from nested_pandas.datasets.generation import generate_data
+        >>> nf = generate_data(5, 2, seed=1)
+
+        >>> nf["nested"].to_flat()
+                   t       flux band
+        0    8.38389  80.074457    r
+        0   13.40935  89.460666    g
+        1   13.70439  96.826158    g
+        1   8.346096   8.504421    g
+        2   4.089045  31.342418    g
+        2  11.173797   3.905478    g
+        3  17.562349  69.232262    r
+        3   2.807739  16.983042    r
+        4   0.547752  87.638915    g
+        4    3.96203   87.81425    r
+
+        """
+        return self.nest.to_flat(fields=fields)
 
     @nested_only
-    def to_lists(self):
-        """Convert to a list representation of the nested series."""
-        return self.nest.to_lists()
+    def to_lists(self, fields: list[str] | None = None) -> pd.DataFrame:
+        """Convert nested series into dataframe of list-array columns.
+
+        Parameters
+        ----------
+        fields : list[str] or None, optional
+            Names of the fields to include. Default is None, which means all fields.
+
+        Returns
+        -------
+        pd.DataFrame
+            Dataframe of list-arrays.
+
+        Examples
+        --------
+
+        >>> from nested_pandas.datasets.generation import generate_data
+        >>> nf = generate_data(5, 2, seed=1)
+
+        >>> nf["nested"].to_lists()
+                                   t                       flux       band
+        0  [ 8.38389029 13.4093502 ]  [80.07445687 89.46066635]  ['r' 'g']
+        1  [13.70439001  8.34609605]  [96.82615757  8.50442114]  ['g' 'g']
+        2  [ 4.08904499 11.17379657]  [31.34241782  3.90547832]  ['g' 'g']
+        3  [17.56234873  2.80773877]  [69.23226157 16.98304196]  ['r' 'r']
+        4    [0.54775186 3.96202978]  [87.63891523 87.81425034]  ['g' 'r']
+        """
+        return self.nest.to_lists(fields=fields)