Change Regression model to hardcode Exp or Linear Regression #302

Open
grahamjenson opened this issue Jun 20, 2024 · 5 comments

@grahamjenson

grahamjenson commented Jun 20, 2024

Take with a grain of salt.

I was looking at the way you select the regression models for the predictive alerts and smart snooze features, and I think it is not a good idea to rank the models based on the same training data used to fit them. It looks like with 2 data points it will always select a Linear model, and with 3 it will always select a Quadratic. The problem is that the quadratic has a very bad RMSE and a tendency toward extreme readings.
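
To illustrate why ranking on the training data favours the higher-degree fit, here is a minimal Python sketch. It is not nightguard's Swift code: `polyfit`, `predict`, `rmse` and the three readings are hypothetical, made up for illustration only. A quadratic passes exactly through any three points with distinct timestamps, so its in-sample RMSE is essentially zero and it always wins the ranking, even though its extrapolation can swing sharply.

```python
import math

def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.
    Returns coefficients, lowest power first."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            f = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= f * a[col][k]
            b[row] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - s) / a[row][row]
    return coeffs

def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

def rmse(coeffs, xs, ys):
    return math.sqrt(sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Made-up readings: minutes since the first sample, glucose in mg/dL.
xs = [0.0, 5.0, 10.0]
ys = [120.0, 112.0, 108.0]

fits = {degree: polyfit(xs, ys, degree) for degree in (1, 2)}
in_sample = {degree: rmse(c, xs, ys) for degree, c in fits.items()}
best_degree = min(in_sample, key=in_sample.get)  # ranking on training error
forecast_30 = {degree: predict(c, 30.0) for degree, c in fits.items()}

print(in_sample, best_degree, forecast_30)
```

With these made-up numbers the readings are falling, yet the quadratic's 30-minute forecast turns back upward while the linear forecast keeps falling. The training error says nothing about which of the two is closer to what actually happens next.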

I wrote a post about it here, where I explored the data to see which models were the best: https://maori.geek.nz/problems-with-predicting-blood-glucose-with-regression-571377170b8b

The good thing is that, if you want to take my suggestion, it is an easy fix: change this one line from BestMatchRegression to ExpRegression or PolynomialRegression(degree: 1).
https://github.com/nightscout/nightguard/blob/ab4a3878cf68b58d504b0bd5456151a7532dc4db/nightguard/external/PredictionService.swift#L138C13-L138C52

@dhermanns
Collaborator

Thanks for your input. @florianpreknya any opinion on that one?

@poml88
Contributor

poml88 commented Jun 20, 2024

Hi, in this context I would like to point out the issue with the time intervals between values. For some common sensors the interval is 5 minutes, so the last 2-3 values cover 10-15 minutes.
Some other sensors provide a value every minute. There, the last 2-3 values cover only 2-3 minutes, so there might be a lot of noise.
For the latter I would rather use an average over the last three 5-minute intervals, or pick the 2-3 values by age, like 5 minutes old, 10 minutes old, ...
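
Both options can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from nightguard: the function names `pick_by_age` and `bucket_means`, the `(timestamp_minutes, value)` tuple layout, and the 2-minute tolerance are all hypothetical.

```python
def pick_by_age(readings, ages_min=(0, 5, 10), tolerance_min=2.0):
    """For each target age (minutes before the newest reading), pick the
    reading closest to that age, skipping ages with nothing within tolerance.
    `readings` is a list of (timestamp_minutes, value) in any order."""
    if not readings:
        return []
    newest = max(t for t, _ in readings)
    picked = []
    for age in ages_min:
        target = newest - age
        t, v = min(readings, key=lambda r: abs(r[0] - target))
        if abs(t - target) <= tolerance_min:
            picked.append((t, v))
    return sorted(picked)

def bucket_means(readings, bucket_min=5, buckets=3):
    """Average the readings inside each of the last `buckets` windows of
    `bucket_min` minutes, returning (window_end, mean) pairs."""
    if not readings:
        return []
    newest = max(t for t, _ in readings)
    out = []
    for i in range(buckets):
        hi = newest - i * bucket_min
        lo = hi - bucket_min
        vals = [v for t, v in readings if lo < t <= hi]
        if vals:
            out.append((hi, sum(vals) / len(vals)))
    return sorted(out)

# Made-up one-minute readings: (minute, mg/dL).
minute_readings = [(t, 100.0 + t) for t in range(11)]
print(pick_by_age(minute_readings))
print(bucket_means(minute_readings, buckets=2))
```

Either way, the regression sees points spaced roughly 5 minutes apart regardless of whether the sensor reports every minute or every 5 minutes.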

@grahamjenson
Author

Just out of curiosity, what sensors give minute readings?

Also, I am not sure what effect minute-by-minute readings would have on selection. I think the overall problem is that it currently uses the training data to select a curve. If you get 10 readings in 10 minutes, a better option might be to select the last 5 to train with, then use the previous 5 to rate the selection, measuring based on past prediction. I think checking against real numbers would be the only way to say for sure which strategy is best.
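
A minimal Python sketch of rating candidates on readings they were not fitted to. It takes one possible reading of the idea above as an assumption: fit each candidate on the older half of the window and score it on the newer half. The helper names (`fit_line`, `linear_model`, `exp_model`, `holdout_rmse`) and the synthetic readings are hypothetical, and the exponential model here is a simple log-linear fit, which may differ from nightguard's ExpRegression.

```python
import math

def fit_line(xs, ys):
    """Closed-form least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def linear_model(xs, ys):
    m, c = fit_line(xs, ys)
    return lambda x: m * x + c

def exp_model(xs, ys):
    # Fit a line to log(y), then exponentiate (requires positive readings).
    m, c = fit_line(xs, [math.log(y) for y in ys])
    return lambda x: math.exp(m * x + c)

def holdout_rmse(make_model, xs, ys, n_test):
    """Fit on all but the newest n_test readings, score on those n_test."""
    model = make_model(xs[:-n_test], ys[:-n_test])
    errors = [(model(x) - y) ** 2 for x, y in zip(xs[-n_test:], ys[-n_test:])]
    return math.sqrt(sum(errors) / n_test)

# Synthetic one-minute readings (mg/dL), falling steadily.
xs = [float(t) for t in range(10)]
ys = [150.0 - 2.0 * t for t in range(10)]

candidates = {"linear": linear_model, "exp": exp_model}
scores = {name: holdout_rmse(make, xs, ys, n_test=5) for name, make in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The score now measures how well each curve predicted readings it had not seen, which is what the alerts depend on. Whether this beats a hardcoded model would still need checking against real data.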

@poml88
Contributor

poml88 commented Jun 20, 2024

Well, yes, I do not know how all this prediction works, I have not looked at that and it seems complicated. :) Maybe somebody more qualified than me could think about it...

For example the Freestyle Libre from Abbott provides minute readings.

@motinis

motinis commented Oct 30, 2024

I arrived here because we've been having a bit of an issue since switching to the G7: it's more jittery than the G6 was. Sometimes there is a big drop (e.g. -27 mg/dL), and then the next reading balances it out (e.g. +5 mg/dL). Given this behavior, I've been feeling that the low prediction is overly sensitive to the latest data point, but I wasn't familiar with how the model works (and now I've taken a look). I agree that the quadratic and sqrt models seem very odd here. The best way to check something like this is with out-of-band data (as was done in the post linked by grahamjenson in the issue, although there is a minor problem that this is n=1).

I would think it should be fairly simple to use an online regret minimization approach here, though. Treat each model as a separate expert that provides a prediction; each expert is then penalized according to its loss, resulting in a weight for each expert. This would also allow us to tune the loss function. For example, we should penalize predictions that were too high more than those that were too low. We should probably also weight the loss by the actual BG value in some way, for example by dividing by the square root of that BG, so that we optimize for better accuracy the lower the actual BG turns out to be.
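
A minimal Python sketch of the expert-weighting idea, using an exponentially weighted (Hedge-style) update as one common regret-minimization scheme. Everything here is an assumption for illustration: the names `asymmetric_loss` and `ExpertMixer`, the 2x penalty for over-prediction, the learning rate, and the example numbers are hypothetical and untuned.

```python
import math

def asymmetric_loss(predicted, actual, over_penalty=2.0):
    """Absolute error, scaled up when the prediction was too high and
    divided by sqrt(actual) so errors at low BG count for more."""
    err = predicted - actual
    weight = over_penalty if err > 0 else 1.0
    return weight * abs(err) / math.sqrt(actual)

class ExpertMixer:
    """Keeps one weight per model and shrinks it exponentially with loss."""

    def __init__(self, names, learning_rate=0.5):
        self.weights = {name: 1.0 / len(names) for name in names}
        self.learning_rate = learning_rate

    def predict(self, predictions):
        # Weighted average of the experts' predictions.
        return sum(self.weights[name] * p for name, p in predictions.items())

    def update(self, predictions, actual):
        # Penalize each expert by its loss, then renormalize to sum to 1.
        for name, p in predictions.items():
            loss = asymmetric_loss(p, actual)
            self.weights[name] *= math.exp(-self.learning_rate * loss)
        total = sum(self.weights.values())
        for name in self.weights:
            self.weights[name] /= total

# Made-up example: two experts predicted a reading that turned out to be 100 mg/dL.
mixer = ExpertMixer(["linear", "quadratic"])
round_predictions = {"linear": 102.0, "quadratic": 130.0}
mixer.update(round_predictions, actual=100.0)
print(mixer.weights, mixer.predict(round_predictions))
```

After each new reading arrives, `update` is called with what every model had predicted for it, so the weights track recent out-of-sample performance instead of training fit. A production version would need to guard against all weights underflowing to zero after very large losses.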

Future enhancements could involve using a more advanced model (like those cited in the table at the bottom of the link referenced).


5 participants
@grahamjenson @dhermanns @motinis @poml88 and others