add http caching for OTHER metadata endpoint #1333


Open
melange396 opened this issue Nov 2, 2023 · 1 comment

Comments

@melange396
Collaborator

Caching for the one metadata endpoint (/covidcast_meta) was done in an earlier PR.

The other metadata endpoint is used by the covidcast dashboard webapp, so it is quite important to protect it from repeated re-requests, which could trigger rate limiting. The code for that endpoint is found here:

@bp.route("/meta", methods=("GET", "POST"))
def handle_meta():
"""
similar to /covidcast_meta but in a structured optimized JSON form for the app
"""
filter_signal = parse_source_signal_arg("signal")
flags = ",".join(request.values.getlist("flags")).split(",")
filter_smoothed: Optional[bool] = None
filter_weighted: Optional[bool] = None
filter_cumulative: Optional[bool] = None
filter_active: Optional[bool] = None
filter_time_type: Optional[TimeType] = None
if "smoothed" in flags:
filter_smoothed = True
elif "not_smoothed" in flags:
filter_smoothed = False
if "weighted" in flags:
filter_weighted = True
elif "not_weighted" in flags:
filter_weighted = False
if "cumulative" in flags:
filter_cumulative = True
elif "not_cumulative" in flags:
filter_cumulative = False
if "active" in flags:
filter_active = True
elif "inactive" in flags:
filter_active = False
if "day" in flags:
filter_active = TimeType.day
elif "week" in flags:
filter_active = TimeType.week
row = db.execute(text("SELECT epidata FROM covidcast_meta_cache LIMIT 1")).fetchone()
data = loads(row["epidata"]) if row and row["epidata"] else []
by_signal: Dict[Tuple[str, str], List[Dict[str, Any]]] = {}
for row in data:
entry = by_signal.setdefault((row["data_source"], row["signal"]), [])
entry.append(row)
user = current_user
sources: List[Dict[str, Any]] = []
for source in data_sources:
src = source.db_source
if src in sources_protected_by_roles:
role = sources_protected_by_roles[src]
if not (user and user.has_role(role)):
# if this is a protected source
# and the user doesnt have the allowed role
# (or if we have no user)
# then skip this source
continue
meta_signals: List[Dict[str, Any]] = []
for signal in source.signals:
if filter_active is not None and signal.active != filter_active:
continue
if filter_signal and all((not s.matches(signal.source, signal.signal) for s in filter_signal)):
continue
if filter_smoothed is not None and signal.is_smoothed != filter_smoothed:
continue
if filter_weighted is not None and signal.is_weighted != filter_weighted:
continue
if filter_cumulative is not None and signal.is_cumulative != filter_cumulative:
continue
if filter_time_type is not None and signal.time_type != filter_time_type:
continue
meta_data = by_signal.get((source.db_source, signal.signal))
if not meta_data:
continue
row = meta_data[0]
entry = CovidcastMetaEntry(signal, row["min_time"], row["max_time"], row["max_issue"])
for row in meta_data:
entry.intergrate(row)
meta_signals.append(entry.asdict())
if not meta_signals: # none found or no signals
continue
s = source.asdict()
s["signals"] = meta_signals
sources.append(s)
return jsonify(sources)
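
One way to protect this endpoint would be to attach HTTP caching headers to its response, similar in spirit to what was done for /covidcast_meta. A minimal sketch follows; the helper name and the one-hour TTL are illustrative assumptions, not taken from the codebase:

    from flask import Response

    META_MAX_AGE = 3600  # illustrative: one hour; should match how often
                         # the covidcast_meta_cache table is refreshed

    def with_cache_headers(response: Response, max_age: int = META_MAX_AGE) -> Response:
        # Cache-Control lets the browser (and any intermediate proxy) reuse
        # the response instead of re-requesting it, so repeated dashboard
        # loads don't count against the rate limit.
        response.headers["Cache-Control"] = f"public, max-age={max_age}"
        return response

The endpoint's return statement would then become something like `return with_cache_headers(jsonify(sources))`.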

@melange396
Collaborator Author

We should do this for the /covidcast/anomalies endpoint too: it is used by the covidcast dashboard webapp and is not exempt from rate limiting, it basically never changes, and it looks like it might be somewhat expensive (it pulls from Google Docs, parses the result as a CSV, then does some pandas manipulation):

@bp.route("/anomalies", methods=("GET", "POST"))
def handle_anomalies():
"""
proxy to the excel sheet about data anomalies
"""
df = read_csv(
"https://docs.google.com/spreadsheets/d/e/2PACX-1vToGcf9x5PNJg-eSrxadoR5b-LM2Cqs9UML97587OGrIX0LiQDcU1HL-L2AA8o5avbU7yod106ih0_n/pub?gid=0&single=true&output=csv", skip_blank_lines=True
)
df = df[df["source"].notnull() & df["published"]]
return print_pandas(df)
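
Since the sheet basically never changes, even a simple in-process TTL cache around the fetch would avoid most of the cost. A sketch under that assumption; `fetch_anomalies`, `_TTL`, and the one-hour lifetime are all illustrative names and values, not from the codebase:

    import time
    from typing import Optional, Tuple

    from pandas import DataFrame, read_csv

    ANOMALIES_CSV_URL = "https://docs.google.com/spreadsheets/d/e/2PACX-1vToGcf9x5PNJg-eSrxadoR5b-LM2Cqs9UML97587OGrIX0LiQDcU1HL-L2AA8o5avbU7yod106ih0_n/pub?gid=0&single=true&output=csv"
    _TTL = 3600  # illustrative: re-fetch the sheet at most once an hour
    _cache: Optional[Tuple[float, DataFrame]] = None

    def fetch_anomalies() -> DataFrame:
        # Serve a recent copy if we have one; otherwise hit the sheet,
        # parse it as CSV, and remember the result with a timestamp.
        global _cache
        now = time.time()
        if _cache is not None and now - _cache[0] < _TTL:
            return _cache[1]
        df = read_csv(ANOMALIES_CSV_URL, skip_blank_lines=True)
        _cache = (now, df)
        return df

HTTP Cache-Control headers (as sketched for /meta above) would help here too, since they stop the browser from re-requesting at all.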

Other ~static-ish resources could benefit from this too, but I'm not sure what they might be off the top of my head.
