
to_fit data in NHPP/NHPPAgeReplacementPolicy #28

Open · 21ch216 opened this issue on Feb 18, 2025 · 4 comments
Labels: bug, enhancement, v2.0.0/nhpp

Comments

21ch216 (Collaborator) commented on Feb 18, 2025

This only concerns the v2.0.0/nhpp branch.

Both NHPP and NHPPAgeReplacementPolicy can be used to sample data.

However, I think that NHPP should not be used to sample data, as it leads to two problems:

  • Passing an end_time to NHPP.sample is not straightforward and can quickly lead to errors when end_time (the time horizon) is too large. The horizon does not even need to be very large for this issue to arise: the generated durations can be very small, so their number can grow very quickly. Leaving this responsibility in NHPP.sample invites ambiguous behavior and errors that are hard to diagnose.
  • Additionally, using to_fit to convert generated data into fit-like arguments is problematic, because choosing the t0 and tf observation window is difficult when the data may be contained in a very small interval. I don’t want a new user discovering NHPP to be confused by unintuitive warnings or behaviors, specific to NHPP, that are hard to explain.

That's why I want to propose this solution:

  • Classes in the process module, which model stochastic processes, should not have the sample function as part of their interface. As a consequence, I would remove these methods from RenewalProcess and RenewalRewardProcess.
  • Only classes in the policies module should include a sample method.

This approach makes more sense to me, as it would help enforce the separation of concerns between stochastic processes and policies. It would also solve the issues described above, as the end_time could be set without any problem within NHPPAgeReplacementPolicy.
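
To make this split concrete, here is a minimal, self-contained sketch of what I have in mind (these are not the actual relife classes: the power-law intensity, the thinning scheme and the constructor arguments are illustrative assumptions only). The process object exposes model quantities, while the policy object owns sample and therefore the end_time responsibility:

import numpy as np


class NHPP:
    """Process object: exposes model quantities only; no sample() in its interface."""

    def __init__(self, rate: float, shape: float):
        # Power-law (Weibull-type) intensity, used here purely as an example.
        self.rate = rate
        self.shape = shape

    def intensity(self, age: float) -> float:
        return self.rate * self.shape * age ** (self.shape - 1)


class NHPPAgeReplacementPolicy:
    """Policy object: owns the sampling logic, so end_time lives here."""

    def __init__(self, process: NHPP, ar: float):
        self.process = process
        self.ar = ar  # preventive replacement age

    def sample(self, nb_samples: int, end_time: float, seed=None):
        # Thinning per replacement cycle; assumes an increasing intensity,
        # so that intensity(ar) bounds it on [0, ar].
        rng = np.random.default_rng(seed)
        lam_max = self.process.intensity(self.ar)
        samples = []
        for _ in range(nb_samples):
            failures, cycle_start, age = [], 0.0, 0.0
            while cycle_start + age < end_time:
                age += rng.exponential(1.0 / lam_max)
                if age >= self.ar:  # preventive replacement: age resets, calendar time moves on
                    cycle_start += self.ar
                    age = 0.0
                elif rng.random() < self.process.intensity(age) / lam_max:
                    if cycle_start + age < end_time:
                        failures.append(cycle_start + age)  # minimal-repair failure time
            samples.append(np.asarray(failures))
        return samples

For instance, NHPPAgeReplacementPolicy(NHPP(rate=0.5, shape=2.0), ar=4.0).sample(nb_samples=10, end_time=20.0) returns one array of failure times per sample path, and NHPP itself never has to deal with end_time.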

What do you think?

21ch216 added the bug, enhancement and v2.0.0/nhpp labels on Feb 18, 2025
21ch216 (Collaborator, Author) commented on Feb 18, 2025

If we dedicate sample to policy objects only, the generator.py module will become part of the policies package. I think it will be cleaner.

21ch216 (Collaborator, Author) commented on Feb 20, 2025

I want to highlight another issue with the current sample methods. The to_fit method, specifically with the t0 and tf arguments, behaves like a subsampling process for data generated by sample(end_time). I understand that the main motivation behind using to_fit with t0 and tf to clip the observation window is to introduce truncations and censorings. However, due to the particular nature of the sampling process (stacking 1D arrays until end_time), the to_fit code with t0 and tf is complex to explain, challenging to implement properly, and difficult to maintain when fully written in NumPy.

For NHPP, this issue becomes even more pronounced. At the moment, I don’t see a clear way to avoid writing an inefficient and inelegant for-loop to process these resulting 1D arrays generated by sample.

Thus, my suggestion is to rethink the relationship between sample and to_fit. Specifically, I propose that only the sample method should handle t0 and tf, while the to_fit method should expect no arguments. My intuition is that delegating the responsibility of clipping data at t0 and tf to the sample method (within its internal generator logic) would make the implementation simpler, easier to understand, and more maintainable compared to reprocessing and transforming the resulting 1D arrays after sampling. In this approach, the tf argument of sample would also naturally take the role of the original end_time.

Here’s how I envision this working: while the structure of the generators (which yield arrays) in sample would remain unchanged, a new layer of data-selection logic (based on the t0 and tf arguments) would be added within the for-loop of each sample method. Additionally, the CountData object could include extra fields to store the corresponding truncation and/or censoring values.
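
Below is a rough sketch of this idea under assumed names (the CountData fields, lifetime_rvs and the function signatures are illustrative, not the current relife code): the generator keeps yielding 1D arrays as today, the selection at [t0, tf] happens inside the for-loop of sample, and to_fit no longer needs any argument.

import numpy as np
from dataclasses import dataclass


@dataclass
class CountData:
    time: np.ndarray   # cycle durations observed in the window (censored ones clipped at tf)
    event: np.ndarray  # 1 = complete lifetime, 0 = right-censored at tf
    entry: np.ndarray  # left-truncation ages induced by t0

    def to_fit(self):
        # No t0/tf here: the clipping already happened during sampling.
        return self.time, self.event, self.entry


def cycles(lifetime_rvs, nb_samples, tf, rng):
    # Generator structure unchanged: yields 1D arrays (cycle starts and durations)
    # until every sample path has reached tf (assumes strictly positive lifetimes).
    start = np.zeros(nb_samples)
    while np.any(start < tf):
        durations = lifetime_rvs(rng, nb_samples)
        yield start.copy(), durations
        start += durations


def sample(lifetime_rvs, nb_samples, t0, tf, seed=None):
    # tf plays the role of the former end_time; the t0/tf selection logic lives here.
    rng = np.random.default_rng(seed)
    time, event, entry = [], [], []
    for start, durations in cycles(lifetime_rvs, nb_samples, tf, rng):
        end = start + durations
        observed = (end > t0) & (start < tf)   # cycles intersecting [t0, tf]
        censored = end > tf
        time.append(np.where(censored, tf - start, durations)[observed])
        event.append((~censored).astype(int)[observed])
        entry.append(np.clip(t0 - start, 0.0, None)[observed])
    return CountData(np.concatenate(time), np.concatenate(event), np.concatenate(entry))

With this layout, sample(lambda rng, n: rng.weibull(2.0, n), nb_samples=5, t0=1.0, tf=10.0).to_fit() directly returns the (time, event, entry) triplet, and all of the t0/tf bookkeeping stays confined to the sampling loop.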

21ch216 (Collaborator, Author) commented on Feb 21, 2025

After several attempts, I realized that the problem might be more complex than it initially seemed. Here's why:
Imagine I want to create a policy that uses one underlying model but a different one, model1, for the first cycle only. It could be written as follows:

model = Weibull(...)
model1 = Gompertz(...)
policy = AgeReplacementPolicy(model, ..., model1=model1, ...)

Once defined, I can use the sample method to generate data and compute costs:

sample_data = policy.sample(nb_samples=10, end_time=...)
sample_data.mean_total_reward()

However, calling sample_data.to_fit(t0, tf) with something like:

time, event, entry = sample_data.to_fit(t0, tf)

makes no sense in this context: the resulting lifetimes are ambiguous, since it is unclear whether they were drawn from model or from model1.
This leads to the fundamental idea:

We cannot expect a CountData object to have a to_fit method. This functionality should instead belong to the underlying model.

Proposal

A better and more intuitive approach can be structured as follows:

# Extract fit data directly from the base model
time, event, entry = model.get_fit_data(size=..., t0=t0, tf=tf)

# With age replacement (ar) censoring applied
time, event, entry = AgeReplacementModel(model).get_fit_data(ar, size=..., t0=t0, tf=tf)

# When t0 != 0, left truncations are applied
time, event, entry = model.get_fit_data(size=..., t0=t0, tf=tf)  # where t0 != 0

With this setup:

  • Policies (or stochastic process objects) would retain the sample method, but it should only be used to compute empirical measures, not expectations.
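
For a plain LifetimeModel, the semantics I have in mind for get_fit_data could look roughly like this (the signature and the rejection step for left truncation are assumptions, not existing code):

import numpy as np


def get_fit_data(lifetime_rvs, size, t0=0.0, tf=np.inf, seed=None):
    # Draw `size` lifetimes left-truncated at t0 and right-censored at tf.
    rng = np.random.default_rng(seed)
    time = np.empty(size)
    for i in range(size):
        x = lifetime_rvs(rng)
        while x <= t0:  # simple rejection: condition on survival beyond t0 (fine for a sketch)
            x = lifetime_rvs(rng)
        time[i] = x
    event = (time <= tf).astype(int)  # 0 means right-censored at tf
    time = np.minimum(time, tf)
    entry = np.full(size, float(t0))  # left-truncation (entry) ages
    return time, event, entry


# e.g. time, event, entry = get_fit_data(lambda rng: rng.weibull(2.0), size=100, t0=0.5, tf=3.0)

The AgeReplacementModel(model).get_fit_data(ar, ...) variant would additionally censor at ar, i.e. set event = 0 and clip the drawn lifetime to ar whenever it exceeds ar.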

What About NHPP?

The same logic can be extended to non-homogeneous Poisson processes (NHPP). A get_fit_data method could be added for consistency. However, calling something like AgeReplacementModel(nhpp) does not make sense, as AgeReplacementModel is specifically intended for use with LifetimeModel.
Instead, a unified approach could be defined for creating left-truncated or age-replacement-modified versions of objects. These would be accessed via dedicated methods:

nhpp.replace_at_age(...)  # returns a slightly modified NHPP that accounts for age replacement
nhpp.get_fit_data(size, t0, tf)

This strategy preserves clarity, aligns method functionalities with object responsibilities, and avoids confusion. This solution also ties into issue #27 and offers a unified, extensible way to handle left truncations and age-replacement modifications across different object types.
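
As a purely illustrative sketch of the replace_at_age idea (a design assumption, not existing code, written here as a free function over the intensity for brevity): under age replacement at ar with minimal repair in between, the calendar-time intensity is the original intensity restarted at every replacement.

def replace_at_age(intensity, ar):
    # Intensity of the age-replaced process in calendar time: the original intensity
    # restarted at each preventive replacement (minimal repair in between,
    # replacements assumed instantaneous).
    return lambda t: intensity(t % ar)


# e.g. replaced_intensity = replace_at_age(lambda age: 0.5 * 2.0 * age, ar=4.0)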

21ch216 (Collaborator, Author) commented on Feb 24, 2025

duplicated in #32
