Data Explorer: Add preliminary customizable float formatting options for data values, summary stats #3310

Merged
wesm merged 6 commits into main from feature/de-number-formatting on May 30, 2024

Conversation

wesm
Contributor

@wesm wesm commented May 29, 2024

Partly addresses #2339, #3210, and #3314. Adds a format_options parameter to the get_data_values and get_column_profiles backend methods to allow customized formatting of large, small, and medium-sized numbers. If a value's magnitude falls above or below the limits implied by the parameters, scientific notation is used. We've talked about dynamically trimming zero padding from the end of numbers (e.g. if all the floats displayed are integral, then we can trim the trailing zeros off all of them), but that will be a frontend-only change that can be done afterwards.

e.g. with

import pandas as pd

df2 = pd.DataFrame(
    {
        "a": [
            0,
            1.0,
            1.01,
            1.012,
            0.0123,
            0.01234,
            0.0001,
            0.00000001,
            9999.123,
            9999.999,
            9999999,
            10000000,
            -10000000,
        ]
    }
)

image
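
To make the threshold behavior described above concrete, here is a minimal sketch of how such formatting rules might work. It is not the code from this PR: the `FormatOptions` class, its field names, and `format_value` are hypothetical names chosen for illustration, and the default digit counts are assumptions.

```python
from dataclasses import dataclass


@dataclass
class FormatOptions:
    # Hypothetical option names and defaults, for illustration only.
    large_num_digits: int = 2     # decimals shown when |x| >= 1
    small_num_digits: int = 4     # decimals shown when |x| < 1
    max_integral_digits: int = 7  # at or above 10**7, switch to scientific notation
    sci_digits: int = 2           # decimals shown in scientific notation


def format_value(x: float, opts: FormatOptions) -> str:
    """Format one number, falling back to scientific notation outside the limits."""
    if x == 0:
        return f"{0:.{opts.large_num_digits}f}"
    mag = abs(x)
    upper = 10.0 ** opts.max_integral_digits
    lower = 10.0 ** (-opts.small_num_digits)
    # Too large or too small to show usefully in fixed-point notation.
    if mag >= upper or mag < lower:
        return f"{x:.{opts.sci_digits}e}"
    if mag >= 1:
        return f"{x:.{opts.large_num_digits}f}"
    return f"{x:.{opts.small_num_digits}f}"


opts = FormatOptions()
for value in [0, 1.012, 0.01234, 0.00000001, 9999.123, 9999999, -10000000]:
    print(value, "->", format_value(value, opts))
```

With these assumed defaults, 9999999 stays in fixed-point form while 10000000 and 0.00000001 switch to scientific notation.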

@jthomasmock
Contributor

I think this is a good first step! Do you see a path forward for decimal alignment?

For example here is what RStudio does, still not necessarily ideal, but aligns everything:
image
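
One possible approach to decimal alignment, sketched here as an assumption rather than anything implemented in this PR, is to pad already-formatted strings so the decimal points line up in a monospaced font. The helper name `align_decimals` is hypothetical, and the sketch only handles plain fixed-point strings (not scientific notation).

```python
def align_decimals(formatted: list[str]) -> list[str]:
    """Pad fixed-point number strings so their decimal points line up."""
    if not formatted:
        return []
    parts = [s.partition(".") for s in formatted]
    int_width = max(len(whole) for whole, _, _ in parts)
    frac_width = max(len(frac) for _, _, frac in parts)
    # Reserve one extra column for the decimal point if any value has a fraction.
    right_width = frac_width + 1 if frac_width else 0
    aligned = []
    for whole, dot, frac in parts:
        right = "." + frac if dot else ""
        # Right-justify the integer part, left-justify the fractional part.
        aligned.append(whole.rjust(int_width) + right.ljust(right_width))
    return aligned


for line in align_decimals(["1.5", "10.25", "100"]):
    print(repr(line))
```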

@wesm
Contributor Author

wesm commented May 30, 2024

Here's with slightly different settings in the front end:

image

@jthomasmock
Contributor

And just for comparison, here's a nice display in Sheets:
image

Ideally, I think we want to be careful to not over display sigfigs on decimal when there are very large numbers

@wesm
Contributor Author

wesm commented May 30, 2024

> Ideally, I think we want to be careful to not over display sigfigs on decimal when there are very large numbers

Right. I think there are two things to do as next steps:

  • Decide on the default number of digits to use for formatting out of the box. If we want to make this "smart" on a per-column basis, we'll have to add a formatting "sniffer" method that can make a guess based on a sample of the data (e.g. the first 1000 rows), but that's extra development work.

  • Trim zeros dynamically on the front end (I'll create an issue).
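
As a rough illustration of the two next steps above, the sketch below shows one way a per-column "sniffer" and zero trimming could work. Neither function exists in this PR: `sniff_digits` and `trim_trailing_zeros` are hypothetical names, the cap of 4 digits is an assumption, and the trimming logic is written in Python only for consistency even though it is proposed as a front-end change.

```python
import math


def sniff_digits(sample: list[float], max_digits: int = 4) -> int:
    """Guess the fewest decimals (capped at max_digits) that represent the sample exactly."""
    finite = [x for x in sample if math.isfinite(x)]
    for digits in range(max_digits + 1):
        if all(round(x, digits) == x for x in finite):
            return digits
    return max_digits


def trim_trailing_zeros(formatted: list[str]) -> list[str]:
    """Drop the fractional part only when every displayed value is integral."""
    def is_integral(s: str) -> bool:
        _, dot, frac = s.partition(".")
        return dot == "" or set(frac) <= {"0"}

    if all(is_integral(s) for s in formatted):
        return [s.partition(".")[0] for s in formatted]
    return formatted


values = [1.5, 2.25, 3.0]
digits = sniff_digits(values[:1000])  # sample only the first 1000 rows
print(digits)
print(trim_trailing_zeros(["1.00", "2.00"]))
print(trim_trailing_zeros(["1.00", "2.50"]))
```

Sampling keeps the guess cheap on large columns, at the cost of possibly missing values further down that need more digits.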

I'll go ahead and merge this since nothing here is set in stone.

@wesm wesm changed the title from "Data Explorer: Add customizable float formatting options for data values, summary stats" to "Data Explorer: Add preliminary customizable float formatting options for data values, summary stats" May 30, 2024
@wesm wesm merged commit 3660fa8 into main May 30, 2024
21 checks passed
@wesm wesm deleted the feature/de-number-formatting branch May 30, 2024 16:13