Time-series forecasting for yet-to-arrive arrival rates (notes on investigation)

## Context

The current `IncomingAdmissionPredictor` classes in `patientflow` decompose the yet-to-arrive prediction into two parts:

1. **Arrival rates** (λ_t): how many patients will arrive in each time interval?
2. **Admission probability** (θ_t): given arrival, what is the probability of admission within the prediction window?

The arrival rates are currently computed as simple historical means per time-of-day interval, averaged across the training period via `time_varying_arrival_rates`. The admission probability is handled by aspirational parametric curves (in production) or empirical survival curves.
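
As a point of reference, the historical-mean calculation described above can be sketched roughly as follows. This is an illustration only, not the actual `time_varying_arrival_rates` code: the function name `mean_arrivals_per_slot` and its arguments are hypothetical, and it assumes arrivals are supplied as a plain list of timestamps.

```python
from collections import Counter
from datetime import date, datetime


def mean_arrivals_per_slot(arrivals, start_date, end_date, slot_minutes=15):
    """Mean arrivals per time-of-day slot, averaged over every day in [start_date, end_date)."""
    n_days = (end_date - start_date).days
    slots_per_day = 24 * 60 // slot_minutes
    counts = Counter()
    for ts in arrivals:
        if start_date <= ts.date() < end_date:
            # Index of the time-of-day slot this arrival falls into
            counts[(ts.hour * 60 + ts.minute) // slot_minutes] += 1
    # Days with no arrivals in a slot still count in the denominator
    return [counts[slot] / n_days for slot in range(slots_per_day)]


# Toy data: three arrivals in the 09:00-09:15 slot and one at 13:00, over two days
arrivals = [
    datetime(2024, 1, 1, 9, 5),
    datetime(2024, 1, 1, 9, 10),
    datetime(2024, 1, 2, 9, 14),
    datetime(2024, 1, 2, 13, 0),
]
rates = mean_arrivals_per_slot(arrivals, date(2024, 1, 1), date(2024, 1, 3))
```

Every day in the window contributes equally to each slot's mean, which is why trend, day-of-week and holiday effects are averaged out.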

This issue concerns **part 1 only** — whether we can improve the arrival rate estimates using time-series forecasting, while keeping the aspirational approach for admission probability unchanged.

## Reference implementation

The [ED-Forecast-Short](https://github.com/Countess-of-Chester-Hospital-NHS-FT/ED-Forecast-Short) repository from Countess of Chester Hospital NHS FT, shared by @helenajr, implements a Prophet-based hourly forecasting model for ED arrivals. Key design features:

- **Facebook Prophet** with yearly + weekly seasonality and holiday regressors
- **Day-of-week-specific hourly seasonalities**: seven separate conditional Fourier seasonalities (period=1, fourier.order=4), so Monday's hourly profile is learned independently from Saturday's
- **Holiday handling**: UK bank holidays grouped into categories (Christmas ±2 days, New Year's ±1, Easter −1 to +4, generic bank holidays +1), plus a Christmas interim period (28–30 Dec)
- **Pandemic exclusion**: 2020–2022 data nullified to prevent distorted seasonality estimates
- **Daily retraining**: the full model retrains each day and produces a rolling 168-hour (7-day) forecast
- **Baseline comparison**: a rolling 12-week same-day-and-hour average is computed alongside Prophet forecasts to benchmark added value
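
The day-of-week-specific hourly seasonalities could be expressed with Prophet's Python interface roughly as below. This is a sketch under assumptions rather than a port of the Chester code: the parameter spelling `fourier.order` suggests the reference implementation uses Prophet's R interface, and the seasonality names, the `is_<day>` column names and the helper functions here are hypothetical. The `prophet` import is deferred so that the helpers can be used without the package installed.

```python
from datetime import datetime

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def conditional_seasonality_specs(fourier_order=4):
    """One daily-period seasonality per weekday, each active only on that weekday."""
    return [
        {
            "name": f"hourly_{day}",
            "period": 1,  # period is in days, so 1 means a within-day profile
            "fourier_order": fourier_order,
            "condition_name": f"is_{day}",
        }
        for day in DAYS
    ]


def day_flags(ts):
    """Boolean condition columns that Prophet expects in both the fit and predict frames."""
    return {f"is_{day}": ts.weekday() == i for i, day in enumerate(DAYS)}


def build_model(holidays=None):
    """Assemble an unfitted model; requires the `prophet` package."""
    from prophet import Prophet

    model = Prophet(
        yearly_seasonality=True,
        weekly_seasonality=True,
        daily_seasonality=False,  # replaced by the seven conditional seasonalities
        holidays=holidays,  # optional DataFrame with `holiday` and `ds` columns
    )
    for spec in conditional_seasonality_specs():
        model.add_seasonality(**spec)
    return model
```

The data frame passed to `fit` and `predict` would need the seven boolean columns produced by `day_flags` alongside the usual `ds` and `y` columns.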

The model forecasts all ED arrivals at hourly resolution, with walk-in and ambulance arrivals forecast separately. It does not forecast admitted patients specifically.

## Potential advantages over current approach

The current `time_varying_arrival_rates` implementation computes a flat average rate per time-of-day slot across the training set. A time-series forecasting approach could add:

1. **Trend capture**: if admission volumes are gradually rising or falling, a forecasting model will reflect this. Historical means will not unless the training window is manually shortened.
2. **Holiday and special-day effects**: the current arrival rate calculation does not handle bank holidays, the Christmas period, or other calendar effects. These cause large deviations in arrival patterns.
3. **Day-of-week-specific hourly profiles**: the current approach averages across all days of the week for a given time slot. Friday evenings look different from Tuesday evenings.
4. **Recency weighting**: a forecasting model can weight recent observations more heavily than older ones, adapting to changing patterns.
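
To make the recency-weighting point concrete, here is a minimal sketch of an exponentially weighted mean for a single time slot. It is not how Prophet weights observations internally; it only illustrates the general idea, and the function name and the `half_life_weeks` parameter are hypothetical.

```python
def recency_weighted_mean(weekly_counts, half_life_weeks=8.0):
    """Exponentially weighted mean of counts ordered from oldest to newest."""
    n = len(weekly_counts)
    # The newest observation has weight 1; weight halves every `half_life_weeks`
    weights = [0.5 ** ((n - 1 - i) / half_life_weeks) for i in range(n)]
    return sum(w * c for w, c in zip(weights, weekly_counts)) / sum(weights)


# With a rising series the weighted mean sits above the plain mean of 3.0
rising = recency_weighted_mean([2, 4], half_life_weeks=1.0)
```

A flat historical mean corresponds to an infinite half-life, so the two approaches coincide when the series has no trend.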

## Key challenges for patientflow

### Subspecialty granularity

In production, arrival rates are needed at subspecialty level (~500 subspecialties). Most subspecialties will have very low arrival volumes — a handful of admissions per week or fewer. Prophet (and most seasonal decomposition methods) needs sufficient data density to estimate Fourier seasonality components reliably. At low volumes, the model would be fitting noise.

Possible approaches:
- **Hierarchical forecasting**: fit Prophet at a higher aggregation level (division, specialty group, or whole-hospital) where there is enough volume for seasonality to be identifiable, then disaggregate using historical subspecialty proportions
- **Hybrid**: use Prophet for high-volume groupings and fall back to historical means for low-volume subspecialties
- **Volume threshold**: only apply time-series forecasting above a minimum daily arrival rate, with a simple average below that
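
The hierarchical and threshold options above could be combined along these lines. This is a sketch with hypothetical names (`disaggregate`, `select_method`) and an illustrative threshold of one arrival per day; the right threshold would need to come out of the investigation.

```python
def disaggregate(parent_rate, historical_counts):
    """Split a parent-level forecast rate across children by historical share."""
    total = sum(historical_counts.values())
    if total == 0:
        raise ValueError("no historical arrivals to derive proportions from")
    return {
        child: parent_rate * count / total
        for child, count in historical_counts.items()
    }


def select_method(historical_daily_mean, min_daily_rate=1.0):
    """Pick the estimator for a grouping based on its historical volume."""
    if historical_daily_mean >= min_daily_rate:
        return "forecast"
    return "historical_mean"


# A division-level forecast of 10 arrivals split 3:1 between two subspecialties
shares = disaggregate(10.0, {"subspecialty_a": 30, "subspecialty_b": 10})
```

Disaggregating by fixed proportions preserves the parent total, but it assumes each subspecialty's share is stable over time and across the day.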

### Time interval resolution

The current system uses 15-minute intervals for the arrival rate array, primarily because the aspirational curve (θ_t) is evaluated at each interval. Moving to hourly intervals would be sufficient — the aspirational curve is smooth enough that 15-minute resolution doesn't add meaningful precision to the θ_t values. This also aligns with the hourly resolution used in the Chester implementation.
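
If the existing 15-minute rates need to be compared with hourly forecasts, they can be aggregated as sketched below. This assumes each rate is an expected count per interval, so that the four values within an hour add up; if the rates are instead expressed per hour, the four values would be averaged rather than summed. The function name `to_hourly` is hypothetical.

```python
def to_hourly(rates_15min):
    """Sum consecutive groups of four 15-minute expected counts into hourly counts."""
    if len(rates_15min) % 4 != 0:
        raise ValueError("expected a whole number of hours of 15-minute rates")
    return [sum(rates_15min[i:i + 4]) for i in range(0, len(rates_15min), 4)]


# A full day at 0.25 expected arrivals per 15 minutes is 1 arrival per hour
hourly = to_hourly([0.25] * 96)
```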

### Forecast uncertainty and distributional form

Currently, the arrival rate λ_t is treated as a known constant, and all randomness comes from the Poisson process: actual arrivals ~ Poisson(λ_t). The final bed count distribution reflects Poisson sampling noise only.

If λ_t is replaced with a Prophet point forecast (ŷ_t), this is the same modelling choice — treating the rate as known. In principle, the forecast itself is uncertain (Prophet provides prediction intervals), and a fuller treatment would integrate over possible values of λ_t to produce a mixture distribution with fatter tails. However:

- The current approach already makes this simplification (a historical mean is also an estimate with a standard error, treated as known)
- At the low arrival rates typical of individual subspecialties, Poisson variance (which equals the mean rate) likely dominates forecasting uncertainty
- The more important concern is **systematic bias** on unusual days (e.g. Prophet underestimating arrivals on a bank holiday), which would be caught through evaluation against actuals rather than through distributional corrections
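
The relative size of the two variance sources can be checked with the law of total variance: if arrivals are Poisson given λ_t and λ_t itself is uncertain, then Var(N) = E[λ_t] + Var(λ_t). The sketch below applies this with purely illustrative numbers (a forecast standard deviation of 20% of the mean); they are not measurements from UCLH or Chester data.

```python
def total_variance(mean_rate, rate_sd):
    """Variance of a Poisson count whose rate has the given mean and standard deviation."""
    # Poisson sampling variance (equal to the mean) plus variance of the rate itself
    return mean_rate + rate_sd ** 2


def overdispersion(mean_rate, rate_sd):
    """Ratio of total variance to pure Poisson variance; 1.0 means no extra spread."""
    return total_variance(mean_rate, rate_sd) / mean_rate


# Same 20% relative forecast uncertainty at two very different volumes
low_volume = overdispersion(mean_rate=0.5, rate_sd=0.1)
high_volume = overdispersion(mean_rate=50.0, rate_sd=10.0)
```

Under these assumed numbers the low-volume case is barely overdispersed while the high-volume case has three times the Poisson variance, so treating the rate as known matters much more at aggregated levels than at subspecialty level.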

### Operational integration

The Chester model retrains daily from scratch on the full history. Options for `patientflow`:
- **Periodic retraining**: retrain the forecasting model on a schedule (daily or weekly), extract forecast rates, and store them in the same format as the current `weights` dict
- **Forecast-as-lookup**: generate a rolling forecast and use it to replace the static rates at prediction time
- **Train-once with update**: fit the model during the standard training pipeline, and use it to produce forward-looking rates rather than historical averages

## Proposed investigation

1. Train Prophet (or an alternative from a library such as `statsforecast`) on historical admitted-patient arrivals at UCLH, at whole-hospital and division level, to assess whether seasonal decomposition produces meaningfully different arrival rate profiles from simple averages
2. Evaluate forecast accuracy against actuals, particularly around bank holidays and other calendar effects
3. Test hierarchical disaggregation: forecast at division level, disaggregate to subspecialty using historical proportions, compare against subspecialty-level direct averages
4. If results are promising, design an integration point in `IncomingAdmissionPredictor` that can accept either static historical rates or dynamic forecast-derived rates via a consistent interface
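
One possible shape for the integration point in step 4 is a small provider interface that both rate sources implement. Everything here is hypothetical (`ArrivalRateProvider`, `StaticRates`, `ForecastRates` and their signatures are not part of `patientflow`); it is meant only to show that the predictor would not need to know where the rates came from.

```python
from datetime import datetime, timedelta
from typing import Dict, Protocol, Sequence


class ArrivalRateProvider(Protocol):
    def rates(self, prediction_time: datetime, n_intervals: int) -> Sequence[float]:
        """Arrival rates for the next n_intervals, starting at prediction_time."""
        ...


class StaticRates:
    """Historical means indexed by time-of-day slot, repeating every day."""

    def __init__(self, rates_by_slot: Sequence[float], slot_minutes: int = 60):
        self._rates = list(rates_by_slot)
        self._slot_minutes = slot_minutes

    def rates(self, prediction_time, n_intervals):
        start = (prediction_time.hour * 60 + prediction_time.minute) // self._slot_minutes
        # Wrap around midnight because the profile is the same every day
        return [self._rates[(start + i) % len(self._rates)] for i in range(n_intervals)]


class ForecastRates:
    """Forecast rates keyed by the start time of each interval."""

    def __init__(self, forecast: Dict[datetime, float], slot_minutes: int = 60):
        self._forecast = forecast
        self._slot_minutes = slot_minutes

    def rates(self, prediction_time, n_intervals):
        # Floor to the start of the current slot (assumes slot_minutes divides 60)
        minute = (prediction_time.minute // self._slot_minutes) * self._slot_minutes
        first = prediction_time.replace(minute=minute, second=0, microsecond=0)
        step = timedelta(minutes=self._slot_minutes)
        return [self._forecast[first + i * step] for i in range(n_intervals)]
```

A predictor written against `ArrivalRateProvider` could then be evaluated with either source on the same test set, which would make the comparison in steps 1 to 3 straightforward.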

## References

- Countess of Chester ED-Forecast-Short: https://github.com/Countess-of-Chester-Hospital-NHS-FT/ED-Forecast-Short
- Facebook Prophet: https://facebook.github.io/prophet/
- Current patientflow implementation: `patientflow.predictors.incoming_admission_predictors` and `patientflow.calculate.arrival_rates`

