From a9cb9179d4c89534c4a21060aaba6a0d8ef8dc66 Mon Sep 17 00:00:00 2001
From: Ryan Tibshirani
Date: Fri, 29 Oct 2021 22:22:02 -0400
Subject: [PATCH] Remove response doc

---
 indicators/paper/responses-to-reviews.tex | 302 ----------------------
 1 file changed, 302 deletions(-)
 delete mode 100644 indicators/paper/responses-to-reviews.tex

diff --git a/indicators/paper/responses-to-reviews.tex b/indicators/paper/responses-to-reviews.tex
deleted file mode 100644
index 2dc6922..0000000
--- a/indicators/paper/responses-to-reviews.tex
+++ /dev/null
@@ -1,302 +0,0 @@
-\documentclass[11pt]{article}
-
-\usepackage[T1]{fontenc}
-\usepackage{geometry}
-\usepackage{microtype}
-
-\usepackage{newtxmath}
-\usepackage{newtxtext}
-
-\title{Response to peer reviews}
-\author{}
-\date{\today}
-
-\begin{document}
-\maketitle
-
-\section*{Editor}
-
-\begin{quote}
-  The reviewers both saw high value in the database and regarded the paper as
-  important and relevant. Reviewer 2 had several high-level comments about the
-  current framing of the results and offered a number of suggestions for the
-  authors to consider additional or modified analyses that would strengthen the
-  paper. I would like to invite the authors to submit a revised version of the
-  manuscript including a detailed point-by-point response to the reviewer
-  comments, particularly where Reviewer 2 has suggested need for either further
-  analyses or pointed to notable limitations. While the authors should use their
-  discretion regarding whether additional analyses would be within scope for the
-  revision, it will be important to respond to the suggestions in either case
-  and at least to expand on limitations as relevant in the revised manuscript.
-\end{quote}
-We thank the editor and reviewers for their comments and the opportunity to make
-revisions. We have made a number of revisions, described in detail below; in
-summary, these revisions include:
-\begin{itemize}
-\item Additional graphics in the Supplementary Information, and additional
-  references to it where appropriate, to provide additional useful ways to view
-  the data and comparisons to outcomes other than COVID-19 cases
-
-\item Clarification of limitations, both in our data and in other publicly
-  available data sources, so the strengths and weaknesses of our data compared
-  to other sources are clearer
-
-\item Improved analyses in the Results section, including more pertinent figures
-  and clearer examples.
-\end{itemize}
-We believe these changes have strengthened the manuscript, and describe them in
-more detail below.
-
-\section*{Reviewer 1}
-
-\begin{quote}
-  In section 1.B, you mention non-trivial signal processing and direct the
-  reader to github to learn more. I would like to see more discussion of how
-  each source was processed. If this is not possible due to length constraints,
-  where can the reader find more details? Is there documentation at the Delphi
-  website that explains how the data from each source is processed? In a quick
-  search at the Delphi website, I was not able to find information about how the
-  survey data (for example) were weighted. These details will be important to
-  those using the data.
-\end{quote}
-Thanks, this is an important point. We have added a sentence to Section 1.B to
-make clear that detailed documentation, including such details as the survey
-weighting procedures and the day-of-week adjustments used in other data sources,
-is available on the public documentation site.
-
-\section*{Reviewer 2}
-
-\begin{quote}
-  This paper describes a diverse public database of signals related to COVID-19
-  prevalence, including symptom survey data, mobility, at-home tests, and
-  hospitalizations. The database is unique, important, and useful, especially in
-  the current surveillance-limited context of the United States. However, the
-  main analyses presented in the results section did not feel cohesive, and the
-  ``story'' of the paper felt incomplete. While I respect the challenge of
-  dividing work among multiple related papers, I think that a key goal of
-  revisions might be to add further independent analyses of the data described
-  that emphasize its utility beyond comparisons to cases or add additional
-  analyses to build a unique cohesive story for this paper.
-\end{quote}
-We appreciate this point and agree the message may have gotten split among
-manuscripts unnecessarily. We have made some changes to the Results section
-accordingly. Most notably, we have replaced Figure 6 with one that more clearly
-shows a potential use case of the data presented beyond forecasting of cases; it
-now shows how data sources in the API could have aided in planning for the COVID
-vaccination campaign. We have also introduced a new figure (discussed below) to
-make the value of revision data more clear, and adjusted language throughout.
-
-\begin{quote}
-  The main value of the data discussed in the paper seems to focus on its
-  potential utility in forecasting, which guides the main comparisons to
-  observed case burden and primary outcome of ``signal value'' in results and in
-  the appendix. However, there are a number of important limitations to the
-  results presented that impede this message.
-
-  Most concrete results on this point seem to be presented in another paper.
-  Correlations in the paper are focused on current case burden (signal vs. cases
-  both at time t), rather than prediction for forecasts.
-
-  Even from a nowcasting perspective, comparisons are all benchmarked against
-  cases which are observable in real-time.
-\end{quote}
-We have extended the analysis to include comparisons against hospitalizations as
-well; see below.
-
-But it is worth emphasizing that cases are not actually observable in real time.
-Case reporting typically occurs with lag; lag was often high early in the
-pandemic when testing was limited and health departments were struggling to
-scale up reporting systems, so case reports for a particular date may not have
-been available until several days or a week later. This gradually changed until
-most states reported cases daily---though these reports were still subject to
-lag and referred to cases that may have been diagnosed several days earlier.
-Now, some states have returned to a weekly reporting schedule, so case reports
-only become available once a week, and so cases may take up to seven days to be
-reported.
-
-When our correlation analyses show a high correlation between a signal and case
-reports on a specific date, that signal may have been available several days
-before the case reports were, depending on the time this occurred during the
-pandemic and the state's reporting practices. Meanwhile, signals such as
-CTIS-CLI-in-community can be reported within 1--2 days of the survey responses.
-High correlations with case and death data hence indicate a high correlation
-with a case value that may have only been officially reported days after these
-signals.
-
-But this point is somewhat subtle; ideally, one would like to correlate our
-surveys against the number of new infections occurring on a date, rather than
-the number of cases reported on that date. These could be very different, since
-infections occur days or weeks before cases are reported, but ground-truth
-infection data is unavailable.
-
-Instead, we have added a new figure to illustrate a related point. Because we
-track revisions of signals, we can see how often officially reported case and
-death data is revised, and how long these revisions can occur after initial
-reporting. The new figure (now Figure 6) shows that case and death data can be
-subject to quite large revisions even 30--60 days after initial reporting,
-making it important to have supplementary data sources to check against.
-
-\begin{quote}
-  The strength of observed correlations change (non-monotonically) over time,
-  and the implications of this are not discussed.
-\end{quote}
-While it is difficult to fully explain the changes in correlation---they are
-likely due to a variety of changes in reporting practices, health-seeking
-behavior, survey response patterns, and so on---one key point is that the
-strength of the correlation depends on the overall differences between counties.
-That is, if counties have very similar COVID case rates, it is difficult to
-achieve a high Spearman correlation between case rates and signals, because
-getting the rank correct is difficult. If counties have widely varying COVID
-case rates, a signal that provides information about cases will be able to
-achieve a higher Spearman correlation.
-
-We have slightly expanded our discussion of Figure 2 to make this point, though
-we have tried to keep the discussion brief so as not to sidetrack the main
-discussion of the manuscript. We have also added a plot to the Supplementary
-Information showing correlations between lagged cases and cases, showing that
-similar variations happen in those correlations; this suggests that they reflect
-changes in the underlying distribution of cases.
-
-\begin{quote}
-  Signal strength is compared only to cases (rather than downstream outcomes
-  like hospitalization or death).
-\end{quote}
-We have extended the Supplementary Information to include more detailed
-comparisons between our signals and reported COVID hospitalizations (as compiled
-by the Department of Health and Human Services), including comparisons of
-correlations. These show broadly similar results as the COVID case analyses.
-Section 2.B now points to the Supplement for interested readers to examine these
-analyses.
-
-\begin{quote}
-  The case studies of signal utility that are provided seem stylized.
-  Policymakers would *know* in real-time when backlogged cases are reported and
-  can adjust case numbers accordingly or when cases are not being reported due
-  to holiday backlogs.
-\end{quote}
-Unfortunately this is not the case: Backlogs and reporting delays are generally
-poorly understood and can happen unexpectedly, and users of these datasets often
-have to make informed guesses about changes. For instance, many backlogs are due
-to retrospective audits of records that uncover cases and deaths that were
-incorrectly classified, and can affect case and death estimates months in the
-past. Until the audits are complete, policymakers would not know about the
-corrections, their magnitude, or the effects they'd have on policy decisions.
-The new Figure 6 illustrates this by showing the magnitude of revisions to case
-and death reports.
-
-Even for predictable backlogs, such as those that occur during major holidays,
-the backlog still means policymakers do not have an important signal: If case
-and death reporting is suspended for the holiday, knowing that the suspension is
-due to the holiday does not help a policymaker judge whether cases are
-increasing or decreasing and whether urgent action is required. Outside data
-sources would be needed to inform that decision.
-
-We have adjusted the language in Section 2.C to make these points clearer.
-
-\begin{quote}
-  As a result, it might be useful to focus quantitative analyses in this paper
-  on other attributes of the data or expand analyses to include, e.g.:
-  \begin{itemize}
-  \item descriptive statistics from each data source, expressed in the units of
-    that data source
-  \item analysis of how signals differ
-  \item analysis of case correlations over time due to discussion of how they
-    have evolved over time -- e.g. due to changes in vaccination, testing, etc.
-  \item for case studies: I think it would be more powerful to either highlight
-    an instance issues were not apparent in the absence of other signals or to
-    provide more in depth analysis about a specific policy conclusion that would
-    result from use of these signals (or specific examples of
-    geographic/temporal variation that the signal data can uniquely inform).
-  \end{itemize}
-  A compelling framing for the results section might be to include general
-  descriptive statistics and then pick case studies that illustrate 2-3 uses of
-  the data.
-\end{quote}
-We appreciate the interest in additional analyses, and our replacement of Figure
-6 (with the new Figure 7) is partly intended to address this concern about
-choice of case studies. It represents an example showing how these data sources
-can reveal points not available in other standard data sources, such as case and
-death data, and serve to aid public health decision-making.
-
-We have chosen not to include additional descriptive statistics or comparisons
-between signals for two reasons. First, given the large number of signals
-available in our API, a table of descriptive statistics would be prohibitively
-large; this is particularly difficult if one accounts for time and geography by
-presenting statistics over time or across space. Second, it's not clear to us
-what a reader would gain from such statistics. Because the data is freely
-available, a reader interested in any specific signal can quickly investigate it
-and explore whatever aspects of it are of interest.
-
-Our Supplementary Information does include additional exploratory analyses of
-the signals, including state- and county-level time series plots that reveal
-interesting signal behavior.
-
-\begin{quote}
-  I think that descriptions in results of ``what the data could be used for''
-  are less compelling than more specific examples would be. For instance, the
-  section on revised estimates over time seems to echo what was in the methods
-  and implications don't really hit home without connection to changes in
-  specific forecasts.
-\end{quote}
-We have extended the discussion of revisions, as mentioned above, to give
-another illustration of why having revision information for external data
-sources is valuable. Given the challenges of COVID forecasting, it did not seem
-in-scope for this paper to explore revision information and its impact on
-forecasting in great depth. We have also replaced the example in Section 2.E
-(now Figure 7) to be more topical and illustrate the unique value of the data
-more strongly.
-
-\begin{quote}
-  Some of the verb tenses (e.g. line 33) feel awkward: ``The Delphi Group
-  **works** (worked) with partner organizations and34 public data sets to build
-  a massive database of indicators35 tracking COVID-19 activity and other
-  relevant phenomena36 in the United States, which has been publicly available
-  and37 continuously updated since April 2020.''
-\end{quote}
-We have tweaked this sentence and a few others, though we note that the present
-tense is accurate in many cases, because data collection and aggregation is
-ongoing.
-
-\begin{quote}
-  It would be helpful to end the introduction with a description of the
-  contribution of this paper, rather than other papers.
-\end{quote}
-We have adjusted the introduction to conclude with the contribution of this
-paper, to set it in context against the other papers.
-
-\begin{quote}
-  I think it would be useful to replace ``massive'' with a more precise
-  adjective unless it is standard in the literature.
-\end{quote}
-We have slightly adjusted our language here. We still use ``massive'' to refer
-to our online surveys, in line with other uses of the term, such as ``massive
-open online courses;'' we also feel this reflects its truly unprecedented scale,
-as the survey is the largest survey ever conducted (except for censuses).
-
-\begin{quote}
-  Similarly, the verb ``ingest'' also feels odd to refer to data sources.
-\end{quote}
-``Ingest'' is commonly used in the data science community to refer to the
-process of extracting data and loading it into a central database; but we agree
-this use of the term may not be widespread among our audience, so we have
-adjusted the language to avoid using field-specific jargon.
-
-\begin{quote}
-  I'm not sure that it is common practice to repeatedly refer to a sponsoring
-  research group throughout an academic article.
-\end{quote}
-We have adjusted the language used throughout, so references to the research
-group are limited to where they are most useful to the reader.
-
-\begin{quote}
-  Descriptions of statistical analyses (e.g. correlation methods) feel more
-  appropriate for the methods section, not the results section.
-\end{quote}
-We are open to changes if the Editor feels they're necessary to match typical
-\textit{PNAS} style, but we feel that in our paper, it makes the most sense for
-the Methods to describe the methods underlying the data we gather and aggregate,
-since that data is the main focus of the article. The Results focus on
-illustrations of the data's utility, including their correlations with other
-COVID indicators.
-
-\end{document}