Commit dc06874

docs: clarify automatic registration of pandas and pyarrow objects in SessionContext
1 parent 15b5cec commit dc06874

1 file changed, +14 -4 lines changed

docs/source/user-guide/dataframe/index.rst

Lines changed: 14 additions & 4 deletions
@@ -233,20 +233,30 @@ Core Classes
 such as ``ctx.sql("SELECT * FROM pdf")`` will register a pandas or
 PyArrow object named ``pdf`` without calling
 :py:meth:`~datafusion.SessionContext.from_pandas` or
-:py:meth:`~datafusion.SessionContext.from_arrow` explicitly. This requires
-the corresponding library (``pandas`` for pandas objects, ``pyarrow`` for
-Arrow objects) to be installed.
+:py:meth:`~datafusion.SessionContext.from_arrow` explicitly. This uses
+the Arrow PyCapsule Interface, so the corresponding library (``pandas``
+for pandas objects, ``pyarrow`` for Arrow objects) must be installed.
 
 .. code-block:: python
 
     import pandas as pd
+    import pyarrow as pa
     from datafusion import SessionContext
 
     ctx = SessionContext(auto_register_python_objects=True)
+
+    # pandas dataframe - requires pandas to be installed
     pdf = pd.DataFrame({"value": [1, 2, 3]})
+
+    # or pyarrow object - requires pyarrow to be installed
+    arrow_table = pa.table({"value": [1, 2, 3]})
 
+    # If automatic registration is enabled, then we can query these objects directly
     df = ctx.sql("SELECT SUM(value) AS total FROM pdf")
-    print(df.to_pandas())  # automatically registers `pdf`
+    # or
+    df = ctx.sql("SELECT SUM(value) AS total FROM arrow_table")
+
+    # without calling ctx.from_pandas() or ctx.from_arrow() explicitly
 
 Automatic lookup is disabled by default. Enable it by passing
 ``auto_register_python_objects=True`` when constructing the session or by
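
The name-based lookup this hunk documents can be illustrated with a small stand-alone sketch. It does not use DataFusion at all: ``find_python_object`` is a hypothetical helper, introduced here only for illustration, that walks the caller's stack frames looking for a variable whose name matches a table name. This is one plausible way such a lookup could work, assuming CPython frame introspection is available; it is not necessarily how ``SessionContext`` implements it.

```python
import inspect


def find_python_object(name):
    """Hypothetical helper (not part of DataFusion): return the first
    variable called `name` found in the caller's frames, checking each
    frame's locals before its globals. Returns None if nothing matches."""
    frame = inspect.currentframe()
    try:
        # Skip this helper's own frame; currentframe() may be None on
        # interpreters without frame support.
        frame = frame.f_back if frame is not None else None
        while frame is not None:
            if name in frame.f_locals:
                return frame.f_locals[name]
            if name in frame.f_globals:
                return frame.f_globals[name]
            frame = frame.f_back
        return None
    finally:
        del frame  # drop the frame reference to avoid reference cycles


def demo():
    # Stand-in for a pandas DataFrame or PyArrow table named `pdf`.
    pdf = {"value": [1, 2, 3]}
    return find_python_object("pdf")


result = demo()
missing = find_python_object("no_such_table_name")
```

Here ``result`` is the dictionary bound to ``pdf`` inside ``demo``, and ``missing`` is ``None`` because no variable with that name exists in any enclosing frame. A real implementation would additionally need to check that the object found is of a supported type before registering it.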
