Commit

added simple models to Q1

jreps committed May 13, 2020
1 parent feed8a9 commit b9c3777
Showing 66 changed files with 361 additions and 40 deletions.
44 binary files changed (contents not shown).
54 changes: 45 additions & 9 deletions Covid19PredictingHospitalizationInFluPatients/data/settings.csv


3 changes: 3 additions & 0 deletions Covid19PredictingHospitalizationInFluPatients/global.R
@@ -27,3 +27,6 @@ if(inputType == 'file' & !is.null(validation)){

summaryTable <- getSummary(result, inputType, validation)




@@ -0,0 +1,4 @@
<h3>Description</h3>
<p> This button provides information about the data used in the app </p>


@@ -0,0 +1,3 @@
<h3>Description</h3>
<p> Information about the study and links to the code used to run the study </p>

5 changes: 5 additions & 0 deletions Covid19PredictingHospitalizationInFluPatients/html/Help.html
@@ -0,0 +1,5 @@
<h3>Description</h3>
<p> This button provides a link to a YouTube video with a demonstration of the shiny app </p>



4 changes: 4 additions & 0 deletions Covid19PredictingHospitalizationInFluPatients/html/Log.html
@@ -0,0 +1,4 @@
<h3>Description</h3>
<p> This button shows the log produced when the model was developed or validated </p>


10 changes: 10 additions & 0 deletions Covid19PredictingHospitalizationInFluPatients/html/Model.html
@@ -0,0 +1,10 @@
<h3>Description</h3>
<p> The model button shows a plot and table with the characteristics of the patients with and without the outcome during the time-at-risk. </p>

<h3>Interpretation</h3>
<ul>
<li>The plots show each covariate as a dot (binary covariates on the left-side plot and measurements on the right-side plot). The x-axis is the fraction of patients (or mean value) with the covariate among patients without the outcome, and the y-axis is the same quantity among patients with the outcome. Dots above the x=y line are more common in patients with the outcome; dots below the line are more common in patients without the outcome (see the sketch after this list).</li>
<li>The table shows the covariate name, the variable importance or coefficient value, the mean value in those with and without the outcome and the standardized mean difference.</li>
</ul>
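A minimal standalone R sketch of the left-side (binary covariate) plot described in the list above; the toy data frame covSummary and its column names are illustrative assumptions, not code from this app:

# Sketch: fraction with each covariate in patients without the outcome (x)
# plotted against the fraction in patients with the outcome (y).
covSummary <- data.frame(
  covariateName = c('age 60-64', 'diabetes', 'hypertension'),
  meanWithoutOutcome = c(0.10, 0.15, 0.30),
  meanWithOutcome = c(0.18, 0.25, 0.45)
)
plot(covSummary$meanWithoutOutcome, covSummary$meanWithOutcome,
     xlim = c(0, 0.5), ylim = c(0, 0.5),
     xlab = 'Fraction with covariate (no outcome)',
     ylab = 'Fraction with covariate (outcome)')
abline(a = 0, b = 1, lty = 2)  # dots above this x=y line are more common in outcome patients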


@@ -0,0 +1,10 @@
<h3>Description</h3>
<p> The performance of the model, including the operating characteristics at different risk cutoffs, the overall discrimination and the overall calibration. </p>

<h3>Tabs</h3>
<p>The three tabs are:</p>
<ul>
<li>The 'Summary' tab shows the prediction question being explored and various operating characteristics for a range of risk cutoffs (the threshold bar is interactive and enables you to explore different values by moving it left or right) </li>
<li>The 'Discrimination' tab shows the AUROC, AUPRC, predicted risk distributions and F1 score</li>
<li>The 'Calibration' tab shows the generic calibration plot and the calibration per age group and gender.</li>
</ul>
25 changes: 25 additions & 0 deletions Covid19PredictingHospitalizationInFluPatients/html/Summary.html
@@ -0,0 +1,25 @@
<h3>Description</h3>
<p>A table showing summary information for each validation result. Each row corresponds to a model applied to a specific target population, outcome and time-at-risk triple for a specific database. Summary details include the validation data size (target population and outcome counts) and discriminative performance. </p>

<h3>Options</h3>
<p>Click on a row to select it; the selected row will be highlighted. Selecting a row populates the following parts of the app for further exploration:</p>
<ul>
<li>The complete performance of the result for the selected row can be viewed by clicking on the 'Performance' button in the left menu</li>
<li>The model corresponding to the result for the selected row can be viewed by clicking on the 'Model' button in the left menu</li>
<li>The log file corresponding to the result for the selected row can be viewed by clicking on the 'Log' button in the left menu (this is not always available)</li>
<li>The model development settings for the selected row can be viewed by clicking on the 'Model Settings' tab in the top menu (this is not always available)</li>
<li>The population settings (information about time-at-risk and exclusions) for the selected row can be viewed by clicking on the 'Population Settings' tab in the top menu (this is not always available)</li>
<li>The covariate settings (information about the model features) for the selected row can be viewed by clicking on the 'Covariate Settings' tab in the top menu (this is not always available)</li>
</ul>

<h3>Using the Filter</h3>
<p> Select a specific: </p>
<ul>
<li>development database - database used to develop the model being validated</li>
<li>validation database - database used to evaluate the model being validated</li>
<li>time-at-risk - time period relative to index where the outcome is being predicted</li>
<li>target population - the patient population we are interested in predicting the outcome risk for </li>
<li>outcome - the event being predicted</li>
<li>model - the type of model (e.g., logistic regression, decision tree)</li>
</ul>
<p>to filter the table rows of interest. </p>
@@ -0,0 +1,6 @@
<h3>Description</h3>
<p> These box plots display the risk distributions for those with the outcome during the time-at-risk (class 1) and those without the outcome during the time-at-risk (class 0) </p>

<h3>Interpretation</h3>
<p> If a model is able to discriminate between those with and without the outcome then it should be assigning a higher risk to those with the outcome, so the box plot for class 1 should be shifted to the right relative to the box plot for class 0. If the model is not able to discriminate then the box plots will look similar. </p>
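A minimal R sketch of such a box plot on simulated toy data; the column names value and outcomeCount are assumptions, not this app's code:

set.seed(42)
pred <- data.frame(
  value = c(rbeta(500, 2, 8), rbeta(100, 4, 6)),   # predicted risks
  outcomeCount = c(rep(0, 500), rep(1, 100))       # 0 = no outcome, 1 = outcome
)
boxplot(value ~ outcomeCount, data = pred, horizontal = TRUE,
        names = c('Class 0', 'Class 1'), xlab = 'Predicted risk')
# discrimination shows as the Class 1 box sitting to the right of Class 0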

@@ -0,0 +1,6 @@
<h3>Description</h3>
<p> The calibration plots show how closely the predicted risk matches the true observed risk. The calibration plot is calculated (using labelled data) by partitioning the patients into deciles based on predicted risk; within each decile, the mean predicted risk and the fraction of patients with the outcome (the observed risk) are calculated. The calibration plot is then generated by plotting the observed risk against the mean predicted risk for each decile. </p>

<h3>Interpretation</h3>
<p> If a model is well calibrated the mean predicted risk should be approximately the same as the observed risk. Therefore, all 10 dots should fall on the x=y line. If the dots fall above the x=y line then there is a higher observed risk than predicted, so our model is assigning lower than the true risk to patients (underestimated risk). If the dots fall below the x=y line then there is a lower observed risk than predicted, so our model is assigning higher than the true risk to patients (overestimated risk).</p>
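A minimal R sketch of the decile calibration computation described above, on simulated data (illustrative only; the column names are assumptions):

set.seed(42)
pred <- data.frame(value = runif(1000, 0, 0.5))       # predicted risks
pred$outcomeCount <- rbinom(1000, 1, pred$value)      # simulated 0/1 labels
pred$decile <- cut(pred$value,
                   breaks = quantile(pred$value, probs = seq(0, 1, 0.1)),
                   include.lowest = TRUE, labels = FALSE)
calib <- aggregate(cbind(meanPredicted = value, observed = outcomeCount)
                   ~ decile, data = pred, FUN = mean)
plot(calib$meanPredicted, calib$observed,
     xlab = 'Mean predicted risk', ylab = 'Observed risk')
abline(a = 0, b = 1, lty = 2)  # well calibrated: all 10 dots near the x=y line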

@@ -0,0 +1,8 @@
<h3>Description</h3>
<p> The demographic calibration plots show how closely the predicted risk matches the true observed risk for each age/gender stratum. We partition the patients into age and gender groups, then calculate the mean predicted risk within each age/gender group and the fraction of patients within the group that actually had the outcome during the time-at-risk (observed risk). We then plot the observed and predicted risk for each age group split by gender. </p>

<h3>Interpretation</h3>
<p> If a model is well calibrated the mean predicted risk should be approximately the same as the observed risk for each age/gender. Therefore, the observed risk and predicted risk plots should overlap. If there is deviation between the predicted risk and observed risk for a certain age group, then this tells us the model is not well calibrated for that age group. This may indicate the need to fit a model specifically for that age group if there is sufficient data.</p>

<p> In addition, this plot shows us the age trend of risk (e.g., you can see whether the risk increases as patients age) and it shows us how males and females differ in terms of risk of the outcome during the time-at-risk.</p>
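A minimal R sketch of the age/gender stratified calibration described above, on simulated data (the grouping and column names are illustrative assumptions):

set.seed(42)
demo <- expand.grid(ageGroup = seq(20, 80, 10), gender = c('F', 'M'))
demo <- demo[rep(seq_len(nrow(demo)), each = 200), ]
demo$value <- pmin(0.5, 0.002 * demo$ageGroup + runif(nrow(demo), 0, 0.1))
demo$outcomeCount <- rbinom(nrow(demo), 1, demo$value)    # simulated labels
strata <- aggregate(cbind(predicted = value, observed = outcomeCount)
                    ~ ageGroup + gender, data = demo, FUN = mean)
f <- strata[strata$gender == 'F', ]                       # one gender shown
matplot(f$ageGroup, cbind(f$predicted, f$observed), type = 'b', pch = c(1, 2),
        xlab = 'Age group', ylab = 'Risk')
legend('topleft', legend = c('Predicted', 'Observed'), pch = c(1, 2))
# overlapping lines indicate good calibration within each age group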

13 changes: 13 additions & 0 deletions Covid19PredictingHospitalizationInFluPatients/html/f1Help.html
@@ -0,0 +1,13 @@
<h3>Description</h3>
<p> The F1 score plot shows the F1 score for each risk threshold. Click <a href="https://en.wikipedia.org/wiki/F1_score">here</a> for more information about the F1 score. </p>

<h3>Interpretation</h3>
<p> The F1-score combines the sensitivity and precision of the model into a single measure of accuracy.</p>

<h3>Definitions</h3>
<ul>
<li> Sensitivity - probability that somebody with the outcome will be identified as having the outcome by the model at a specified cutoff (e.g., their predicted risk >= specified cutoff)
</li>
<li> Precision (positive predictive value) - probability that somebody identified by the model as having the outcome at a specified cutoff truly has the outcome
</li>
</ul>
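A minimal R sketch computing the F1 score across risk thresholds from the two definitions above, on simulated data (risk and y are assumed toy inputs):

set.seed(42)
risk <- runif(1000)                     # predicted probabilities
y <- rbinom(1000, 1, risk)              # 0/1 outcome labels
thresholds <- seq(0.05, 0.95, 0.05)
f1 <- sapply(thresholds, function(t) {
  predPos <- risk >= t
  tp <- sum(predPos & y == 1)
  precision <- tp / sum(predPos)        # positive predictive value at cutoff t
  sensitivity <- tp / sum(y == 1)       # recall at cutoff t
  2 * precision * sensitivity / (precision + sensitivity)
})
plot(thresholds, f1, type = 'l', xlab = 'Risk threshold', ylab = 'F1 score')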
19 changes: 19 additions & 0 deletions Covid19PredictingHospitalizationInFluPatients/html/prcHelp.html
@@ -0,0 +1,19 @@
<h3>Description</h3>
<p> The precision recall (PR) curve shows the trade-off between precision (positive predictive value) and recall (sensitivity) for all possible risk cutoffs. The area below the curve is a measure of overall discriminative performance. Click <a href="https://acutecaretesting.org/en/articles/precision-recall-curves-what-are-they-and-how-are-they-used">here</a> for more information. </p>

<h3>Interpretation</h3>
<p> The red dashed line shows the fraction of the target population who have the outcome (the average risk). The main line shows the relationship between precision and recall. If the main line is above the red dashed line, the model can identify a group of patients whose risk is higher than the average risk; the further the main line rises above the red dashed line, the higher the relative risk that can be identified for some subset of patients.
</p>

<h3>Notes</h3>
<p> If the outcome is rare (so the data are imbalanced), the precision recall curve (PRC) gives insight into the clinical utility of the model because it reveals the model's precision </p>

<h3>Definitions</h3>
<ul>
<li> Sensitivity (recall) - probability that somebody with the outcome will be identified as having the outcome by the model at a specified cutoff (e.g., their predicted risk >= specified cutoff)
</li>
<li>
Specificity - probability that somebody without the outcome will be identified as a non-outcome by the model at a specified cutoff (e.g., their predicted risk < specified cutoff) </li>
<li> Precision (positive predictive value) - probability that somebody identified by the model as having the outcome at a specified cutoff truly has the outcome
</li>
</ul>
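A minimal R sketch tracing a precision-recall curve over all cutoffs, on simulated data (risk and y are assumed toy inputs, not this app's code):

set.seed(42)
risk <- runif(2000)                      # predicted probabilities
y <- rbinom(2000, 1, risk^2)             # labels with a rarer outcome
ord <- order(risk, decreasing = TRUE)    # sweep the cutoff from high to low
tp <- cumsum(y[ord] == 1)
precision <- tp / seq_along(tp)          # PPV at each cutoff
recall <- tp / sum(y == 1)               # sensitivity at each cutoff
plot(recall, precision, type = 'l',
     xlab = 'Recall (sensitivity)', ylab = 'Precision (PPV)')
abline(h = mean(y), col = 'red', lty = 2)  # average risk: the red dashed line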
@@ -0,0 +1,6 @@
<h3>Description</h3>
<p> These plots show the predicted risk probability density functions for those with the outcome (red) and those without the outcome (green) </p>

<h3>Interpretation</h3>
<p> If a prediction model is able to discriminate between those with and without the outcome during the time-at-risk then these distributions should be disjoint. The more overlap between the distributions, the worse the discrimination. </p>

@@ -0,0 +1,6 @@
<h3>Description</h3>
<p> These plots show the preference score density functions for those with the outcome (red) and those without the outcome (green) </p>

<h3>Interpretation</h3>
<p> If a prediction model is able to discriminate between those with and without the outcome during the time-at-risk then these distributions should be disjoint. The more overlap between the distributions, the worse the discrimination. </p>
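A minimal R sketch of a common preference score transformation, ln(F/(1-F)) = ln(p/(1-p)) - ln(P/(1-P)) with P the overall outcome proportion; this formula is an assumption for illustration, not confirmed from this repository:

set.seed(42)
risk <- runif(2000, 0.01, 0.99)          # predicted probabilities p
y <- rbinom(2000, 1, risk)               # 0/1 outcome labels
P <- mean(y)                             # overall outcome proportion
prefLogOdds <- log(risk / (1 - risk)) - log(P / (1 - P))
pref <- 1 / (1 + exp(-prefLogOdds))      # preference score F
plot(density(pref[y == 1]), col = 'red', xlab = 'Preference score',
     main = 'Preference score density')
lines(density(pref[y == 0]), col = 'green')  # more overlap = worse discrimination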

18 changes: 18 additions & 0 deletions Covid19PredictingHospitalizationInFluPatients/html/rocHelp.html
@@ -0,0 +1,18 @@
<h3>Description</h3>
<p> The receiver operating characteristic (ROC) curve shows the trade-off between sensitivity and specificity for all possible risk cutoffs. The area below the curve is a measure of overall discriminative performance. Click <a href="https://en.wikipedia.org/wiki/Receiver_operating_characteristic">here</a> for more information. </p>

<h3>Interpretation</h3>
<p> If a model is not able to discriminate then the curve will lie approximately along the x=y line. A perfectly discriminating model's curve rises vertically to the top-left corner and then runs horizontally across.</p>

<h3>Notes</h3>
<p> If the outcome is rare then the ROC curve doesn't provide insight into the precision of the model and a precision recall curve (PRC) should also be inspected. </p>

<h3>Definitions</h3>
<ul>
<li> Sensitivity - probability that somebody with the outcome will be identified as having the outcome by the model at a specified cutoff (e.g., their predicted risk >= specified cutoff)
</li>
<li>
Specificity - probability that somebody without the outcome will be identified as a non-outcome by the model at a specified cutoff (e.g., their predicted risk < specified cutoff) </li>
<li> Precision (positive predictive value) - probability that somebody identified by the model as having the outcome at a specified cutoff truly has the outcome
</li>
</ul>
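A minimal R sketch computing ROC points and the area under the curve by the trapezoid rule, on simulated toy inputs risk and y (assumptions, not this app's code):

set.seed(42)
risk <- runif(2000)                      # predicted probabilities
y <- rbinom(2000, 1, risk)               # 0/1 outcome labels
ord <- order(risk, decreasing = TRUE)    # sweep the cutoff from high to low
tpr <- cumsum(y[ord] == 1) / sum(y == 1) # sensitivity at each cutoff
fpr <- cumsum(y[ord] == 0) / sum(y == 0) # 1 - specificity at each cutoff
plot(fpr, tpr, type = 'l', xlab = '1 - specificity', ylab = 'Sensitivity')
abline(a = 0, b = 1, lty = 2)            # x=y line: no discrimination
sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)  # AUROC estimate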
17 changes: 13 additions & 4 deletions Covid19PredictingHospitalizationInFluPatients/processing.R
@@ -21,9 +21,18 @@ getSummary <- function(result,inputType,validation){
sumTab <- summaryPlpAnalyses(result)
}

sumTab <- sumTab[,c('analysisId','devDatabase','valDatabase','cohortName','outcomeName','modelSettingName','riskWindowStart', 'riskWindowEnd', 'AUC','AUPRC', 'populationSize','outcomeCount','incidence',
#remove empty rows
emptyInd <- is.na(sumTab[,'AUC'])
if(sum(emptyInd)>0){
sumTab <- sumTab[!emptyInd,]
}

#sumTab <- sumTab[,c('analysisId','devDatabase','valDatabase','cohortName','outcomeName','modelSettingName','riskWindowStart', 'riskWindowEnd', 'AUC','AUPRC', 'populationSize','outcomeCount','incidence',
# 'addExposureDaysToStart','addExposureDaysToEnd','plpResultLocation', 'plpResultLoad')]
#colnames(sumTab) <- c('Analysis','Dev', 'Val', 'T', 'O','Model', 'TAR start', 'TAR end', 'AUC','AUPRC', 'T Size','O Count','O Incidence (%)', 'addExposureDaysToStart','addExposureDaysToEnd', 'plpResultLocation', 'plpResultLoad')
sumTab <- sumTab[,c('devDatabase','valDatabase','cohortName','outcomeName','modelSettingName','riskWindowStart', 'riskWindowEnd', 'AUC','AUPRC', 'populationSize','outcomeCount','incidence',
'addExposureDaysToStart','addExposureDaysToEnd','plpResultLocation', 'plpResultLoad')]
colnames(sumTab) <- c('Analysis','Dev', 'Val', 'T', 'O','Model', 'TAR start', 'TAR end', 'AUC','AUPRC', 'T Size','O Count','O Incidence (%)', 'addExposureDaysToStart','addExposureDaysToEnd', 'plpResultLocation', 'plpResultLoad')
colnames(sumTab) <- c('Dev', 'Val', 'T', 'O','Model', 'TAR start', 'TAR end', 'AUC','AUPRC', 'T Size','O Count','O Incidence (%)', 'addExposureDaysToStart','addExposureDaysToEnd', 'plpResultLocation', 'plpResultLoad')

return(sumTab)
}
@@ -118,8 +127,8 @@ summaryPlpAnalyses <- function(analysesLocation){
valSettings <- settings[,c('analysisId','modelSettingsId', 'cohortName', 'outcomeName',
'populationSettingId','modelSettingName','addExposureDaysToStart',
'riskWindowStart', 'addExposureDaysToEnd',
'riskWindowEnd')]
valSettings$devDatabase <- settings$devDatabase[1]
'riskWindowEnd','devDatabase')]
#valSettings$devDatabase <- settings$devDatabase[1]
valPerformance <- merge(valSettings,
valPerformance, by='analysisId')
valPerformance <- valPerformance[,colnames(devPerformance)] # make sure same order
88 changes: 87 additions & 1 deletion Covid19PredictingHospitalizationInFluPatients/server.R
@@ -31,7 +31,28 @@ server <- shiny::shinyServer(function(input, output, session) {

# need to remove over columns:
output$summaryTable <- DT::renderDataTable(DT::datatable(summaryTable[filterIndex(),!colnames(summaryTable)%in%c('addExposureDaysToStart','addExposureDaysToEnd', 'plpResultLocation', 'plpResultLoad')],
rownames= FALSE, selection = 'single'))
rownames= FALSE, selection = 'single',
extensions = 'Buttons', options = list(
dom = 'Bfrtip', buttons = I('colvis')
),

container = htmltools::withTags(table(
class = 'display',
thead(
#tags$th(title=active_columns[i], colnames(data)[i])
tr(apply(data.frame(colnames=c('Dev', 'Val', 'T','O', 'Model',
'TAR start', 'TAR end', 'AUC', 'AUPRC',
'T Size', 'O Count', 'O Incidence (%)'),
labels=c('Database used to develop the model', 'Database used to evaluate model', 'Target population - the patients you want to predict risk for','Outcome - what you want to predict',
'Model type','Time-at-risk start day', 'Time-at-risk end day', 'Area under the receiver operating characteristics (test or validation)', 'Area under the precision recall curve (test or validation)',
'Target population size of test or validation set', 'Outcome count in test or validation set', 'Percentage of target population that have outcome during time-at-risk')), 1,
function(x) th(title=x[2], x[1])))
)
))

)
)
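For context, a stripped-down, self-contained sketch of the header-tooltip technique used in the hunk above (toy column names and tooltip texts; assumes the DT and htmltools packages):

library(DT)
library(htmltools)
cols   <- c('AUC', 'AUPRC')
titles <- c('Area under the receiver operating characteristic curve',
            'Area under the precision recall curve')
header <- withTags(table(
  class = 'display',
  thead(tr(mapply(function(x, t) th(title = t, x), cols, titles,
                  SIMPLIFY = FALSE)))
))
# hovering a column header now shows its tooltip
datatable(data.frame(AUC = 0.75, AUPRC = 0.31),
          rownames = FALSE, container = header)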


selectedRow <- shiny::reactive({
if(is.null(input$summaryTable_rows_selected[1])){
@@ -267,4 +288,69 @@
)
})





# HELPER INFO
showInfoBox <- function(title, htmlFileName) {
shiny::showModal(shiny::modalDialog(
title = title,
easyClose = TRUE,
footer = NULL,
size = "l",
shiny::HTML(readChar(htmlFileName, file.info(htmlFileName)$size) )
))
}


observeEvent(input$DescriptionInfo, {
showInfoBox("Description", "html/Description.html")
})
observeEvent(input$SummaryInfo, {
showInfoBox("Summary", "html/Summary.html")
})
observeEvent(input$PerformanceInfo, {
showInfoBox("Performance", "html/Performance.html")
})
observeEvent(input$ModelInfo, {
showInfoBox("Model", "html/Model.html")
})
observeEvent(input$LogInfo, {
showInfoBox("Log", "html/Log.html")
})
observeEvent(input$DataInfoInfo, {
showInfoBox("DataInfo", "html/DataInfo.html")
})
observeEvent(input$HelpInfo, {
showInfoBox("HelpInfo", "html/Help.html")
})


observeEvent(input$rocHelp, {
showInfoBox("ROC Help", "html/rocHelp.html")
})
observeEvent(input$prcHelp, {
showInfoBox("PRC Help", "html/prcHelp.html")
})
observeEvent(input$f1Help, {
showInfoBox("F1 Score Plot Help", "html/f1Help.html")
})
observeEvent(input$boxHelp, {
showInfoBox("Box Plot Help", "html/boxHelp.html")
})
observeEvent(input$predDistHelp, {
showInfoBox("Predicted Risk Distribution Help", "html/predDistHelp.html")
})
observeEvent(input$prefDistHelp, {
showInfoBox("Preference Score Distribution Help", "html/prefDistHelp.html")
})
observeEvent(input$calHelp, {
showInfoBox("Calibration Help", "html/calHelp.html")
})
observeEvent(input$demoHelp, {
showInfoBox("Demographic Help", "html/demoHelp.html")
})


})