Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Uniformize bigint construction #3375

Draft
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

gwhitney
Copy link
Collaborator

@gwhitney gwhitney commented Jan 31, 2025

Adds options to the bigint constructor function math.bigint() to
control whether/how non-integer inputs are rounded or errors thrown,
and whether "unsafe" values outside the range where integers can be
uniquely represented are converted. Changes math.numeric(x, 'bigint')
to call the bigint constructor configured to round and to allow unsafe
values.

Also restores documentation generation for constructor functions.
and initiates History sections in function doc pages

Resolves #3366.
Resolves #3368.
Resolves #3341.

   Adds options to the bigint constructor function `math.bigint()` to
   control whether/how non-integer inputs are rounded or errors thrown,
   and whether "unsafe" values outside the range where integers can be
   uniquely represented are converted. Changes `math.numeric(x, 'bigint')`
   to call the bigint constructor configured to allow unsafe values but
   throw on non-integers (closest approximation to its previous buggy
   behavior).

   Also restores documentation generation for constructor functions,
   and initiates History sections in function doc pages.

   Resolves josdejong#3366.
   Resolves josdejong#3368.
   Resolves josdejong#3341.
@gwhitney gwhitney marked this pull request as draft February 17, 2025 20:18
@gwhitney
Copy link
Collaborator Author

gwhitney commented Feb 17, 2025

OK, I have marked this as draft for two reasons:

  1. In light of Inconsistencies in definitions of bigint operations and Fraction operations #3374, there is no need to for the behavior of math.bigint to depend on the predictable configuration, since it always returns a bigint (or throws) in any case. However, it would be most in line with the philosophy adopted in where we ended up on Inconsistencies in definitions of bigint operations and Fraction operations #3374 to not have the round conversion option, but always to throw if the input is not an integer; if you want to make a bigint from a non-integer, call math.round (or floor or ceil or fix) yourself. However, not having the round configuration would make this a breaking change, as the current behavior of math.bigint is to round, whereas of math.numeric(x, 'bigint') is to throw. I think either way we should have the safe conversion option, since without it there is currently no way for the conversion to check whether the conversion might be creating spurious accuracy, which is something mathjs generally tries to do. In fact, if we are going to go for a breaking change anyway, I would suggest making {safe: true} the default conversion option, as more in the spirit of mathjs in general. Please let me know whether you would like to keep the round conversion option to make this a non-breaking change, or for me to eliminate the round conversion option to align better with Inconsistencies in definitions of bigint operations and Fraction operations #3374 but make this a breaking change, and if the latter, what default value you want for the safe option.
  2. I need to add support for Complex arguments, since it certainly might be the case that a Complex number could represent an integer (if its imaginary part is zero). But I am waiting on your decision on the previous point to implement this, because if we don't have round options then this can throw whenever the imaginary part is different from 0 to current epsilon configuration, whereas if we do have round options it needs to have logic to see if the imaginary part rounds to 0 under the supplied round option.

Looking forward to your call here, then I can complete this PR to be ready for review.

@josdejong
Copy link
Owner

Thanks for putting together this draft PR.

I think in general each factory function (number, complex, bignumber, fraction, bigint, ...) is a way to explicitely change the type of a numeric value. You can use that to convert strings, or when the auto conversion of mathjs does complain (like with bignumber(2) + 2/3). I think the bigint factory function should not throw when the input is a non-integer and should always "work" except like when passing a string with invalid contents or an unknown input data type. So I think we should not implement an option safe.

Right now, the rounding behavior of bigint is basically using function round. Making the rounding configurable makes sense I think. How about the option "round": "round" | "floor" | "ceil" | "fix"? What do you think should be the default? Thinking about it, most programming languages use "fix", maybe that is a better default?

About the History description in the comments: I think it will be hard to maintain this by hand. Can we automate this? I think we can expand the script that extracts the documentation from the code comments to also collect all relevant git commits and add that to the generated web page of each function.

The improvements in the examples are nice. I'm not sure though why you have added indentation to some of the inline comments // ...?

@gwhitney
Copy link
Collaborator Author

So I think we should not implement an option safe.

What facilities should we then provide to support a use that needs a bignumber but needs to avoid the creation of illusory precision where there is none? That is exactly our use case, and mathjs is generally supportive of avoiding spurious precision. So I am fine to change this particular signature, but we need a safety mechanism, and then I need to put it in this PR, since it was created and filed precisely to allow us to adopt v14 of mathjs, and our adoption is waiting on such a mechanism. Thanks for your advice/design here.

@gwhitney
Copy link
Collaborator Author

I am fine of course with keeping the rounding config. The current behavior is round, any other default will be breaking. If we are going to break, floor would definitely be the mathematicians' choice.

@gwhitney
Copy link
Collaborator Author

gwhitney commented Feb 20, 2025

I think History in the source file is much more maintainable than central history: you must edit the function's source file to make the change anyway, so you add a line to the History. Once every function has such a section, I think it will become very natural, and the reviewer can easily check it was done just by glancing at the diffs. However, if I need to augment this with an autoparser to get it merged, let me know and I will do it -- but I would recommend inserting the results in the source file in the History section so you can see them there and possibly tweak them; otherwise I think they are not going to come out very readable.

@gwhitney
Copy link
Collaborator Author

The improvements in the examples are nice. I'm not sure though why you have added indentation to some of the inline comments // ...?

It is because and only because the doc test ignores indented comments, and so a cheap way to fix a doc test that is failing because of free form comments that confuse its attempts to grab the expected return value of an expression is to just indent the comment. This behavior came from a number of pre-existing doc sections that used indented comments as sort of section headers. So the "ignoring indented comments" exception was already there, and I was using it to get more tests to pass. The goal of course is to get to all doc tests passing, at which point they can finally switch to being enforced.

@josdejong
Copy link
Owner

  1. About safe conversion: interesting idea. We can create a separate function like numberIfSafe(value), or extend the existing function number(value, { safe: boolean }), so you can explicitely choose to "blindly" convert bigint to number, or only do so if it doesn't lead to loss of information. What do you think?

  2. About history: I really have to think through the various options that we have. Can we discuss this in a separate issue/PR please? I think this has less priority than improving bigint support, at at least I would like to get that finished asap.

  3. About the example indentation: would it be possible to solve this in a more neat by having the docgenerator ignore example lines that only contain a comment?

@gwhitney
Copy link
Collaborator Author

gwhitney commented Mar 20, 2025

  1. About safe conversion: interesting idea. We can create a separate function like numberIfSafe(value), or extend the existing function number(value, { safe: boolean }), so you can explicitely choose to "blindly" convert bigint to number, or only do so if it doesn't lead to loss of information. What do you think?

I personally would be happier with options on the conversion functions than new functions -- there are a lot of functions, which gets a bit unwieldy. This PR currently makes a {safe: boolean} option on the bigint(something) function, which you asked me to remove. Would it actually be OK to keep it as is?

Note there is a need for some safety mechanism for number => bigint as well as for bigint => number, because if someone tries to convert a number bigger than 2^54 to a bigint, it creates something that purports to be an exact large integer, but that number most likely did not really represent an exact integer, it likely had roundoff error. Un unchecked conversion of this type creates a false sense of more precision than there really was. It is exactly these kinds of number => bigint conversion that are messing up our modular arithmetic calculations, so we need a mechanism for stopping them.

  1. About history: I really have to think through the various options that we have. Can we discuss this in a separate issue/PR please? I think this has less priority than improving bigint support, at at least I would like to get that finished asap.

Yes, I get it. Happy to discuss in the history issue #3341. Does that mean you would like me to remove the History sections I added and the code that supported them from this PR, or should I leave them in as not doing any harm, and to keep around the information I generated, and they can always be massaged to the decided-on scheme later? Fine either way, just want to know what to do to get this PR in shape in the most direct way possible.

  1. About the example indentation: would it be possible to solve this in a more neat by having the docgenerator ignore example lines that only contain a comment?

I will give it a try and let you know.

…throws"

  Plus knock-on changes. This "simple" change unsurprisingly altered what
  doc tests occurred, leading to shaking out a bug in doc testing,
  which in turn shook out a bug in logical `and` (!).
  All should be well now.

  Also prevent `numeric(blah, 'bigint')` from throwing when it
  needs to round.
@gwhitney
Copy link
Collaborator Author

gwhitney commented Mar 21, 2025

5. About the example indentation: would it be possible to solve this in a more neat by having the docgenerator ignore example lines that only contain a comment?

I will give it a try and let you know.

OK, got through this. Unsurprisingly (with the free form nature of doc comments) it was a bit more convoluted than
expected, but now the convention is that any line with only a comment is ignored, unless that comment starts with 'returns' or 'throws'. This change actually shook out a few bugs, including one in logical "and" (and([1,1], 0) was returning [0, 0] rather than [false, false] as documented). All should be well now, and I the only things left I know of before this PR is ready for review are:

  • Decision on safe conversion options and the api for them
  • Should I remove the History work done so far?
  • Need to implement bigint(complex(3, 0)) etc, now that rounding options are set.

I will work on the last one ASAP and look forward to your feedback on the first two.

@josdejong
Copy link
Owner

(!) Ok yes let's go for a solution with {safe: boolean} option then, so keep it as it is in the PR. I think the default value should be {safe: false} when explictly converting a numeric type. Sorry for the back and forth. I see you implemented this for the bigint() function. I think we have indeed more similar cases for with a {safe: boolean} option is interesting, like you say, not only bigint => number but also number => BigNumber for example. What do you think?

(2) Thanks. Can you please remove the History sections from this PR? (maybe you can create a new temporary feature branch containing the current History sections for later reuse, so we don't throw away your work)

(5) Nice that you managed to get the issue with example lines with comments solved 👍

@gwhitney
Copy link
Collaborator Author

Sure, I can add other safe options. Bigint => number safety is clear, as is BigNumber => number. But number => bignumber is much murkier. Nominally, every number can be exactly represented as a bignumber. But I think the "safety" you have in mind is not wanting 1.20000000001 to become a bignumber because it will make what seems to be roundoff error seem precise. I think we have discussed this before -- there is an algorithm for guessing whether a number comes from a precise rational number, based on continued fractions. But we didn't seem to reach consensus on this before. So my preference would be not to address that type of safety in this pr lest it becomes bogged down.

@gwhitney
Copy link
Collaborator Author

gwhitney commented Mar 21, 2025

OK I have moved the History work to a branch feat/function_history in the main repo.

@josdejong
Copy link
Owner

Thanks!

So my preference would be not to address that type of safety in this pr lest it becomes bogged down.

Yes agree, let's adress those ideas in an other PR 👍

@gwhitney
Copy link
Collaborator Author

So my preference would be not to address that type of safety in this pr lest it becomes bogged down.

Yes agree, let's adress those ideas in an other PR 👍

The relevant discussion is in #1485.

@gwhitney
Copy link
Collaborator Author

Slightly confused: am implementing the {safe: true} option as uniformly as I can across different types, and I have gotten to the string => number case. The current code throws various errors if the string is not of the proper format or has too many significant digits or etc. But since this is an explicit request for a conversion to number, shouldn't the default string conversion just call parseFloat (i.e. do the best it can to return a number)? And should the existing logic actually be the logic when {safe: true} is specified? Please advise, thanks. Will hold off committing until I hear back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants