Add DataFrame usage guide with HTML rendering customization options #1108
This is excellent! Thank you very much for it. My only comments are to fix the rst parsing.
docs/source/user-guide/dataframe.rst
Outdated
and Arrow.

A DataFrame represents a logical plan that can be composed through operations like filtering, projection, and aggregation.
The actual execution happens when terminal operations like `collect()` or `show()` are called.
These need double back ticks to render properly.
``collect()`` or ``show()``
docs/source/user-guide/dataframe.rst
Outdated
When working in Jupyter notebooks or other environments that support HTML rendering, DataFrames will
automatically display as formatted HTML tables, making it easier to visualize your data.

The `_repr_html_` method is called automatically by Jupyter to render a DataFrame. This method
double back ticks
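For readers unfamiliar with the hook the quoted paragraph mentions: Jupyter looks for a `_repr_html_` method on the object it is about to display and renders the returned string as HTML. The following is only a minimal sketch of that protocol. `TinyTable` is a hypothetical stand-in class, not part of DataFusion, and it says nothing about how DataFusion builds its own HTML.

```python
from html import escape


class TinyTable:
    """Hypothetical stand-in for a DataFrame; not DataFusion code."""

    def __init__(self, columns, rows):
        self.columns = columns
        self.rows = rows

    def __repr__(self):
        # Plain-text fallback for environments without HTML rendering.
        return f"TinyTable({len(self.rows)} rows x {len(self.columns)} columns)"

    def _repr_html_(self):
        # Jupyter calls this method and displays the returned HTML string.
        header = "".join(f"<th>{escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"


table = TinyTable(["a", "b"], [(1, "<x>"), (2, "y")])
print(table._repr_html_())
```

Cell values are escaped so that data containing `<` or `>` cannot break the generated markup.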
docs/source/user-guide/dataframe.rst
Outdated
The actual execution happens when terminal operations like `collect()` or `show()` are called.

Basic Usage
----------
The ---- needs to be the same length as the title above it. It's one - too short
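Since the same underline problem recurs in several headings below, a small check can catch all of them at once. This is a rough sketch, not part of the PR: the function name `find_short_underlines` and the set of adornment characters are assumptions chosen for illustration. It only looks at title/underline pairs and ignores overlined titles.

```python
# Characters commonly used for reStructuredText section underlines (assumed set).
ADORNMENT_CHARS = set("=-~^\"'`#*+:.")


def is_adornment(line):
    """True if the line is a run of one repeated adornment character."""
    return len(line) >= 2 and len(set(line)) == 1 and line[0] in ADORNMENT_CHARS


def find_short_underlines(text):
    """Return (line_number, title, underline) for underlines shorter than their title."""
    lines = [line.rstrip() for line in text.splitlines()]
    problems = []
    for i in range(1, len(lines)):
        title, underline = lines[i - 1], lines[i]
        if (
            title
            and not is_adornment(title)
            and is_adornment(underline)
            and len(underline) < len(title)
        ):
            # Report the 1-based line number of the underline itself.
            problems.append((i + 1, title, underline))
    return problems


# "Basic Usage" is 11 characters, so a 10-character underline is flagged;
# "HTML Rendering" is 14 characters, so a 14-character underline passes.
sample = "\n".join(["Basic Usage", "-" * 10, "", "HTML Rendering", "-" * 14, ""])
for line_no, title, underline in find_short_underlines(sample):
    missing = len(title) - len(underline)
    print(f"line {line_no}: {title!r} needs {missing} more character(s)")
```

Running it over `docs/source/user-guide/dataframe.rst` (for example by passing the result of `Path(...).read_text()`) would list each heading that still needs a longer underline.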
docs/source/user-guide/dataframe.rst
Outdated
df.show()

HTML Rendering
-------------
Needs one extra -
docs/source/user-guide/dataframe.rst
Outdated
plain text output.

Customizing HTML Rendering
-------------------------
Needs one more -
docs/source/user-guide/dataframe.rst
Outdated
The formatter settings affect all DataFrames displayed after configuration.

Custom Style Providers
---------------------
Needs one more -
docs/source/user-guide/dataframe.rst
Outdated
configure_formatter(style_provider=MyStyleProvider())

Creating a Custom Formatter
--------------------------
Needs one more -
docs/source/user-guide/dataframe.rst
Outdated
custom_html = formatter.format_html(batches, schema)

Managing Formatters
------------------
Needs one more -
docs/source/user-guide/dataframe.rst
Outdated
print(formatter.theme)

Contextual Formatting
--------------------
Needs one more -
Thank you @timsaucer for the detailed review.
Thank you again!
…pache#1108)

* docs: enhance user guide with detailed DataFrame operations and examples
* move /docs/source/api/dataframe.rst into user-guide
* docs: remove DataFrame API documentation
* docs: fix formatting inconsistencies in DataFrame user guide
* Two minor corrections to documentation rendering

---------

Co-authored-by: Tim Saucer <[email protected]>
…Memory and Display Controls (#1119)

* feat: add configurable max table bytes and min table rows for DataFrame display
* Revert "feat: add configurable max table bytes and min table rows for DataFrame display" This reverts commit f9b78fa.
* feat: add FormatterConfig for configurable DataFrame display options
* refactor: simplify attribute extraction in get_formatter_config function
* refactor: remove hardcoded constants and use FormatterConfig for display options
* refactor: simplify record batch collection by using FormatterConfig for display options
* feat: add max_memory_bytes, min_rows_display, and repr_rows parameters to DataFrameHtmlFormatter
* feat: add tests for HTML formatter row display settings and memory limit
* refactor: extract Python formatter retrieval into a separate function
* Revert "feat: add tests for HTML formatter row display settings and memory limit" This reverts commit e089d7b.
* feat: add tests for HTML formatter row and memory limit configurations
* Revert "feat: add tests for HTML formatter row and memory limit configurations" This reverts commit 4090fd2.
* feat: add tests for new parameters and validation in DataFrameHtmlFormatter
* Reorganize tests
* refactor: rename and restructure formatter functions for clarity and maintainability
* feat: implement PythonFormatter struct and refactor formatter retrieval for improved clarity
* refactor: improve comments and restructure FormatterConfig usage in PyDataFrame
* Add DataFrame usage guide with HTML rendering customization options (#1108)
* docs: enhance user guide with detailed DataFrame operations and examples
* move /docs/source/api/dataframe.rst into user-guide
* docs: remove DataFrame API documentation
* docs: fix formatting inconsistencies in DataFrame user guide
* Two minor corrections to documentation rendering

---------

Co-authored-by: Tim Saucer <[email protected]>

* Update documentation
* refactor: streamline HTML rendering documentation
* refactor: extract validation logic into separate functions for clarity
* Implement feature X to enhance user experience and optimize performance
* feat: add validation method for FormatterConfig to ensure positive integer values
* add comment - ensure minimum rows are collected even if memory or row limits are hit
* Update html_formatter documentation
* update tests
* remove unused type hints from imports in html_formatter.py
* remove redundant tests for DataFrameHtmlFormatter and clean up assertions
* refactor get_attr function to support generic default values
* build_formatter_config_from_python return PyResult
* fix ruff errors
* trigger ci
* fix: remove redundant newline in test_custom_style_provider_html_formatter
* add more tests
* trigger ci
* Fix ruff errors
* fix clippy error
* feat: add validation for parameters in configure_formatter
* test: add tests for invalid parameters in configure_formatter
* Fix ruff errors

---------

Co-authored-by: Tim Saucer <[email protected]>
Which issue does this PR close?
Closes #1100
Rationale for this change
This change provides users with a dedicated and detailed guide for working with DataFrames in DataFusion. It introduces essential concepts, usage examples, and advanced features like HTML rendering customization, making it easier for both new and experienced users to take full advantage of the DataFrame API. This documentation enhancement will improve developer experience and usability.
What changes are included in this PR?
docs/source/user-guide/dataframe.rst
docs/source/user-guide/basics.rst with a reference to the new DataFrame guide
Are there any user-facing changes?
✅ Yes — this PR adds new user-facing documentation:
There are no breaking changes to the public API — only enhancements to documentation.