Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deprecation warning for np array #360

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

brk21
Copy link

@brk21 brk21 commented Aug 16, 2021

/databricks/python/lib/python3.8/site-packages/tslearn/barycenters/softdtw.py:103: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
X_ = numpy.array([to_time_series(d, remove_nans=True) for d in X_])

Because this method often uses numpy arrays with different shapes, I think the approach is justified here.

/databricks/python/lib/python3.8/site-packages/tslearn/barycenters/softdtw.py:103: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  X_ = numpy.array([to_time_series(d, remove_nans=True) for d in X_])

Because this method often uses numpy arrays with different shapes, I think the approach is justified here.
@codecov-commenter
Copy link

Codecov Report

Merging #360 (500a2fb) into main (42a56cc) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #360      +/-   ##
==========================================
+ Coverage   94.71%   94.74%   +0.02%     
==========================================
  Files          59       59              
  Lines        4525     4525              
==========================================
+ Hits         4286     4287       +1     
+ Misses        239      238       -1     
Impacted Files Coverage Δ
tslearn/barycenters/softdtw.py 100.00% <100.00%> (ø)
tslearn/clustering/kshape.py 98.29% <0.00%> (+0.85%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 42a56cc...500a2fb. Read the comment docs.

@rtavenar
Copy link
Member

Hi @brk21

Thanks for pointing out this deprecation warning! It seems to me that it is not even necessary to cast the list into a numpy array, or is it?

Couldn't we do something like:

X_ = [to_time_series(Xi, remove_nans=True) for Xi in X_]

?

@brk21
Copy link
Author

brk21 commented Aug 17, 2021

Hi @rtavenar,

I think you are right, but I also think we will generate the same warning in the to_time_series function here:

https://github.com/tslearn-team/tslearn/blob/main/tslearn/utils/utils.py#L146

So you may end up adding the object=true parameter there to avoid the deprecation.

Please let me know if I can support in any way.

Ross

@rtavenar
Copy link
Member

I would like to avoid the object=True since our algorithms in tslearn are supposed to operate on datasets represented as 3d arrays, which is no longer the case when arrays of objects are used.
Regarding the to_time_series function, it is definitely unexpected behaviour to provide a series that does not have the same number of features for all timestamps, so maybe we should throw a ValueError in this case, what do you think?

@brk21
Copy link
Author

brk21 commented Aug 17, 2021

@rtavenar I would be strongly opposed to that because one of the main values of softdtw is that it can operate on time series of different lengths! This ValueError would effectively eliminate our ability to make use of softdtw from tslearn.

If you wanted to port to_time_series into a separate version for time series of varying lengths and just use object=true in the contexts or algorithms that are designed to operate on time series of different lengths, then that might be a good middle ground that preserves integrity for the to_time_series function but makes allowance for the algorithms that do operate on time series of different lengths.

@rtavenar
Copy link
Member

@rtavenar I would be strongly opposed to that because one of the main values of softdtw is that it can operate on time series of different lengths! This ValueError would effectively eliminate our ability to make use of softdtw from tslearn.

I meant raising a ValueError for the case of the to_time_series function only, since this function only deals with one time series at a time, not a dataset of time series (for which to_time_series_dataset exists). Your workaround or the one I suggested earlier are fine for softDTW.

@brk21
Copy link
Author

brk21 commented Aug 17, 2021

Phew @rtavenar Apologies for the misunderstanding.

Yes, I would be fine with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants