Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Timestamp.normalize() can overflow silently #60583

Open
3 tasks done
wjandrea opened this issue Dec 16, 2024 · 6 comments
Open
3 tasks done

BUG: Timestamp.normalize() can overflow silently #60583

wjandrea opened this issue Dec 16, 2024 · 6 comments
Labels
Bug Timestamp pd.Timestamp and associated methods

Comments

@wjandrea
Copy link
Contributor

wjandrea commented Dec 16, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.
    • Query: is:issue normalize overflow
  • I have confirmed this bug exists on the latest version of pandas.
  • I have confirmed this bug exists on the main branch of pandas.

Issue description

I was trying to get the minimum date, so I tried pd.Timestamp.min.normalize() before realizing that would be out of range, but what's weird is that it overflows without warning:

>>> pd.Timestamp.min
Timestamp('1677-09-21 00:12:43.145224193')
>>> pd.Timestamp.min.normalize()
Timestamp('2262-04-11 23:34:33.709551616')

For reference:

>>> pd.Timestamp.max
Timestamp('2262-04-11 23:47:16.854775807')

Compare that to .floor('D'), which I thought would be equivalent, but it raises:

>>> pd.Timestamp.min.floor('D')
OverflowError: value too large


The above exception was the direct cause of the following exception:

OutOfBoundsDatetime: Cannot round 1677-09-21 00:12:43.145224193 to freq=<Day> without overflow

as well as .round('D') (same error).

Sidenote: What I really wanted was .ceil('D'), which gets the first whole day:

>>> pd.Timestamp.min.ceil('D')
Timestamp('1677-09-22 00:00:00')

Installed Versions

INSTALLED VERSIONS
------------------
commit                : d41884b2dd0823dc6288ab65d06650302e903c6b
python                : 3.12.8
python-bits           : 64
OS                    : Linux
OS-release            : 5.4.0-202-generic
Version               : #222-Ubuntu SMP Fri Nov 8 14:45:04 UTC 2024
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : fr_CA.UTF-8
LOCALE                : fr_CA.UTF-8

pandas                : 3.0.0.dev0+1777.gd41884b2dd
numpy                 : 2.1.0.dev0+git20240403.e59c074
dateutil              : 2.9.0.post0
pip                   : 24.3.1
Cython                : None
sphinx                : None
IPython               : 8.22.2
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
blosc                 : None
bottleneck            : None
fastparquet           : None
fsspec                : None
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : None
lxml.etree            : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
psycopg2              : None
pymysql               : None
pyarrow               : None
pyreadstat            : None
pytest                : None
python-calamine       : None
pytz                  : 2024.1
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2024.1
qtpy                  : None
pyqt5                 : None
@wjandrea wjandrea added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 16, 2024
@wjandrea wjandrea changed the title Is this a bug? Timestamp.normalize() Is this a bug? Timestamp.normalize() can overflow silently (vs .floor() which raises) Dec 16, 2024
@shashankrushiya
Copy link

I don't think the above mention is a bug. The reason is because-
Here, when we use normalize() method, it sets the time part of the timestamp to 00:00:00. However, pd.Timestamp.min is already extremely close to the lower boundary of supported date times. And talking about floor() and round() methods, both are more strict in their handling. They basically check whether the operation would cause an overflow and raise an OverFlowError if it would.
If you consider ceil('D'), it will work as this method rounds up to the next day, which is still within the valid range for pd.Timestamp.

@wjandrea
Copy link
Contributor Author

wjandrea commented Dec 19, 2024

@shashankrushiya I'm not following. How does that mean it's not a bug?

talking about floor() and round() methods, both are more strict in their handling. They basically check whether the operation would cause an overflow and raise an OverFlowError if it would.

So then why doesn't .normalize() check? Is there a good reason or is it just an oversight? (i.e. a bug, or maybe not a bug per se but a potential enhancement)

BTW, note that pd.Timestamp.min is the lower boundary, not "extremely close" to the boundary.

@rhshadrach rhshadrach added Timestamp pd.Timestamp and associated methods and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 20, 2024
@rhshadrach rhshadrach changed the title Is this a bug? Timestamp.normalize() can overflow silently (vs .floor() which raises) BUG: Timestamp.normalize() can overflow silently Dec 20, 2024
@rhshadrach
Copy link
Member

Thanks for the report! Agreed normalize should raise an OverflowError here. PRs to fix are welcome!

@shashankrushiya Sorry, is this text AI-generated? It's not very helpful.

I view this as a violation of pandas' Code of conduct and ask you to refrain from making such statements in the future. We'd like to make pandas a welcome place for all contributors, and this type of comment works against this goal.

@wjandrea
Copy link
Contributor Author

@rhshadrach My bad. I've edited my above comment to be respectful and more constructive. Sorry @shashankrushiya!

@anishkarki
Copy link
Contributor

take

@johnasiano
Copy link

Proposing a change to fix this issue: #60624. My understanding is that the bug is due to the code logic described in timestamps.pyx.

The first option that I considered was modifying normalize in timestamps.pyx. The second option that I considered was modifying normalize_i8_stamp() in timestamps.pyx, and also updating normalize_i8_stamp() in timestamps.pxd to update the normalize_i8_stamp() declaration to match the new signature.

datetimes.py would not be modified because my understanding is that this overflow issue is not present in datetimes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Timestamp pd.Timestamp and associated methods
Projects
None yet
Development

No branches or pull requests

5 participants