Using Int64 type for features in BetaGeoModel causes expected_purchases() to fail with TypeError #1471
Comments
@ColtAllen, I wonder if this is something you've seen.
Hey @stochastic1, float64 types are required for the Also, training data is saved as a model attribute, so unless you need to run predictions on out-of-sample customers, you can run
Thanks for the explanation. I'll use clv.rfm_summary() in my next pass. I did try executing with frequency as a whole number (int64) but recency and frequency as float64 type, and the same TypeError about PandasExtensionArray objects emerged. Thanks for the tip on training data being stored as an attribute for BetaGeoModel.expected_purchases(). My use case is actually to collapse the probabilistic estimates into point estimates at various steps, so I can compare forecasts against observed results in a holdout set. Run without restriction, the output matrix is enormous and I run out of memory. Are there point-estimate functions built in for the model outputs? Otherwise I'd expect to use methods from xarray.
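For anyone landing here with the same question, below is a minimal sketch of collapsing posterior draws into point estimates with plain xarray reductions. It uses a synthetic DataArray rather than real model output, and the dimension names `chain`, `draw` and `customer_id` are an assumption about how the predictions are laid out, so check the dims of your own result first.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Synthetic stand-in for posterior predictions (NOT real model output).
# Assumed layout: 2 chains x 100 draws x 5 customers.
purchases = xr.DataArray(
    rng.gamma(shape=2.0, scale=0.5, size=(2, 100, 5)),
    dims=("chain", "draw", "customer_id"),
    coords={"customer_id": [101, 102, 103, 104, 105]},
)

# Point estimate: average over every posterior sample, one value per customer.
point_estimate = purchases.mean(dim=("chain", "draw"))

# A 90% interval per customer, if a spread is wanted alongside the mean.
interval = purchases.quantile([0.05, 0.95], dim=("chain", "draw"))

print(point_estimate.to_series())
```

Reducing one batch of customers at a time and keeping only the reduced result may also help with memory, since the full chain-by-draw array then never has to exist for all customers at once.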
For daily raw data spanning many years, summarizing to weekly or monthly can help with model convergence, but ultimately it depends on your specific use case. If this model is to be used in a monthly business report for example, monthly predictions might make more sense. Summarizing to weekly would also make sense if your data has strong seasonality trends for days of the week.
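As a rough illustration of the weekly summarization described above, here is a pandas-only sketch that builds frequency, recency and T in week units from daily transactions. The column names, the sample data and the choice to anchor weeks at the first transaction date are all assumptions made for the example; this is not the library's own summary routine.

```python
import pandas as pd

# Hypothetical daily transactions; values are made up for illustration.
transactions = pd.DataFrame(
    {
        "customer_id": [1, 1, 1, 2, 2],
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-03", "2024-01-20", "2024-01-10", "2024-02-14"]
        ),
    }
)
observation_end = pd.Timestamp("2024-02-29")

# Assumption: weeks are 7-day bins counted from the first transaction date.
origin = transactions["date"].min()
transactions["week"] = (transactions["date"] - origin).dt.days // 7
end_week = (observation_end - origin).days // 7

weeks = transactions.groupby("customer_id")["week"]
first_week, last_week = weeks.min(), weeks.max()

summary = pd.DataFrame(
    {
        # Repeat purchase periods: distinct active weeks minus the first one.
        "frequency": weeks.nunique() - 1,
        # Weeks between first and last purchase.
        "recency": last_week - first_week,
        # Weeks between first purchase and end of observation.
        "T": end_week - first_week,
    }
).astype("float64")  # float64, as required in this thread

print(summary.reset_index())
```

Note that two purchases in the same week collapse into a single active period, which is the intended effect of coarsening the time unit.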
I used the term "whole number" as an ambiguous case (be it 1.0 or 1). All variables require Float64 datatypes regardless.
Be mindful of any seasonal/holiday events that may bias results in the train/test periods. I'm planning to add high/low seasonality support, but not until Q4 this year:
@ColtAllen, thank you for the guidance. I am working through these and will follow up by 2/12.
Environment: jupyter notebook on GCP VertexAI instance, 16 vCPUs, 104GB RAM
Python version: 3.10.15
pymc-marketing version: <module 'pymc._version' from '/opt/conda/lib/python3.10/site-packages/pymc/_version.py'>
pandas version: 2.2.3
numpy version: 1.26.4
Expectation: Passing integer types for frequency, T and recency into the BetaGeo model, as in the example of daily activity and daily expectation steps, will allow the expected_purchases() method to execute successfully.
Observed result: Passing integer types for frequency, T and recency to the BetaGeo model causes the expected_purchases() method to fail with TypeError: 'PandasExtensionArray' object is not callable. However, passing float64 types for these three features allows expected_purchases() to succeed.
Hypothesis: The _extract_predictive_variables() method called by expected_purchases() in the BetaGeoModel class only returns a usable result if float values are passed for T, frequency and recency. Passing any of these as Int64 forces a conversion of that feature to an xarray.core.extension_array object, which causes the function to fail.
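Since float64 inputs are reported to work, a simple workaround is to cast the three feature columns before passing the frame to the model. The sketch below shows only the pandas side; the sample values are made up, and `data_small` mirrors the name used later in this report.

```python
import pandas as pd

# Hypothetical five-customer sample with nullable Int64 features.
data_small = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "frequency": pd.array([2, 0, 5, 1, 3], dtype="Int64"),
        "recency": pd.array([30, 0, 120, 14, 60], dtype="Int64"),
        "T": pd.array([90, 45, 200, 30, 75], dtype="Int64"),
    }
)

feature_cols = ["frequency", "recency", "T"]

# Cast only the model features; customer_id keeps its original dtype.
data_float = data_small.astype({col: "float64" for col in feature_cols})

print(data_float.dtypes)
```

`astype` returns a new frame, so the original `data_small` is left unchanged for comparison.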
Context: I read in a dataframe with frequency, T and recency as Int64 values, intending to execute the BetaGeo model for daily observations.
The BetaGeo model converged on 100k unique customers in about 20 minutes. After model fit, I created a sample dataframe of 5 customers following the tutorial. Note the frequency, recency and T columns are Int64 type.
I called the expected_purchases() method, again following tutorial, for these five customers and received the error:
TypeError: 'PandasExtensionArray' object is not callable
StackTrace below, but I traced it back to the _extract_predictive_variables method treating Int64 features differently from float64 features:
When data_small has Int64 columns, _extract_predictive_variables converts them to an xarray extension-array type.
When data_small is float64, _extract_predictive_variables persists them as float64:
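The difference between the two cases can be seen on the pandas side alone. As a small sketch: capital-I `Int64` is pandas' nullable extension dtype, backed by an extension array, whereas `float64` is a plain NumPy dtype. The helper name `extension_columns` is hypothetical, and whether this distinction is the exact trigger inside the library is the hypothesis of this report, not something the snippet proves.

```python
import numpy as np
import pandas as pd

nullable = pd.Series([3, 0, 7], dtype="Int64")          # pandas extension dtype
plain = pd.Series([3.0, 0.0, 7.0], dtype="float64")     # NumPy dtype

# Extension dtypes subclass ExtensionDtype; NumPy dtypes do not.
is_ext_nullable = isinstance(nullable.dtype, pd.api.extensions.ExtensionDtype)
is_ext_plain = isinstance(plain.dtype, pd.api.extensions.ExtensionDtype)


def extension_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: list columns backed by pandas extension arrays."""
    return [c for c in df.columns if pd.api.types.is_extension_array_dtype(df[c])]


frame = pd.DataFrame({"frequency": nullable, "recency": plain})
print(is_ext_nullable, is_ext_plain, extension_columns(frame))
```

Running a check like this before prediction flags any column that would need casting to float64.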
StackTrace below:
