add http caching for OTHER metadata endpoint #1333


Open
melange396 opened this issue Nov 2, 2023 · 1 comment

Comments

@melange396
Collaborator

Caching for the one metadata endpoint (/covidcast_meta) was done in an earlier PR.

The other metadata endpoint is used by the covidcast dashboard webapp, so it is quite important to protect it from repeated re-requests, which could trigger rate limiting. The code for that endpoint is found here:

@bp.route("/meta", methods=("GET", "POST"))
def handle_meta():
"""
similar to /covidcast_meta but in a structured optimized JSON form for the app
"""
filter_signal = parse_source_signal_arg("signal")
flags = ",".join(request.values.getlist("flags")).split(",")
filter_smoothed: Optional[bool] = None
filter_weighted: Optional[bool] = None
filter_cumulative: Optional[bool] = None
filter_active: Optional[bool] = None
filter_time_type: Optional[TimeType] = None
if "smoothed" in flags:
filter_smoothed = True
elif "not_smoothed" in flags:
filter_smoothed = False
if "weighted" in flags:
filter_weighted = True
elif "not_weighted" in flags:
filter_weighted = False
if "cumulative" in flags:
filter_cumulative = True
elif "not_cumulative" in flags:
filter_cumulative = False
if "active" in flags:
filter_active = True
elif "inactive" in flags:
filter_active = False
if "day" in flags:
filter_active = TimeType.day
elif "week" in flags:
filter_active = TimeType.week
row = db.execute(text("SELECT epidata FROM covidcast_meta_cache LIMIT 1")).fetchone()
data = loads(row["epidata"]) if row and row["epidata"] else []
by_signal: Dict[Tuple[str, str], List[Dict[str, Any]]] = {}
for row in data:
entry = by_signal.setdefault((row["data_source"], row["signal"]), [])
entry.append(row)
user = current_user
sources: List[Dict[str, Any]] = []
for source in data_sources:
src = source.db_source
if src in sources_protected_by_roles:
role = sources_protected_by_roles[src]
if not (user and user.has_role(role)):
# if this is a protected source
# and the user doesnt have the allowed role
# (or if we have no user)
# then skip this source
continue
meta_signals: List[Dict[str, Any]] = []
for signal in source.signals:
if filter_active is not None and signal.active != filter_active:
continue
if filter_signal and all((not s.matches(signal.source, signal.signal) for s in filter_signal)):
continue
if filter_smoothed is not None and signal.is_smoothed != filter_smoothed:
continue
if filter_weighted is not None and signal.is_weighted != filter_weighted:
continue
if filter_cumulative is not None and signal.is_cumulative != filter_cumulative:
continue
if filter_time_type is not None and signal.time_type != filter_time_type:
continue
meta_data = by_signal.get((source.db_source, signal.signal))
if not meta_data:
continue
row = meta_data[0]
entry = CovidcastMetaEntry(signal, row["min_time"], row["max_time"], row["max_issue"])
for row in meta_data:
entry.intergrate(row)
meta_signals.append(entry.asdict())
if not meta_signals: # none found or no signals
continue
s = source.asdict()
s["signals"] = meta_signals
sources.append(s)
return jsonify(sources)
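
One way to protect this endpoint would be to attach HTTP caching headers to its response, similar in spirit to what was done for /covidcast_meta. A minimal sketch follows; the helper name and the one-hour TTL are illustrative assumptions, not taken from the codebase:

    from flask import Response

    META_MAX_AGE = 3600  # illustrative: one hour; should match how often
                         # the covidcast_meta_cache table is refreshed

    def with_cache_headers(response: Response, max_age: int = META_MAX_AGE) -> Response:
        # Cache-Control lets the browser (and any intermediate proxy) reuse
        # the response instead of re-requesting it, so repeated dashboard
        # loads don't count against the rate limit.
        response.headers["Cache-Control"] = f"public, max-age={max_age}"
        return response

The endpoint's return statement would then become something like `return with_cache_headers(jsonify(sources))`.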

@melange396
Collaborator Author

We should do this for the /covidcast/anomalies endpoint too: it is used by the covidcast dashboard webapp and is not exempt from rate limiting, it basically never changes, and it looks like it might be somewhat expensive (it pulls from Google Docs, parses the result as a CSV, then does some pandas manipulation):

@bp.route("/anomalies", methods=("GET", "POST"))
def handle_anomalies():
"""
proxy to the excel sheet about data anomalies
"""
df = read_csv(
"https://docs.google.com/spreadsheets/d/e/2PACX-1vToGcf9x5PNJg-eSrxadoR5b-LM2Cqs9UML97587OGrIX0LiQDcU1HL-L2AA8o5avbU7yod106ih0_n/pub?gid=0&single=true&output=csv", skip_blank_lines=True
)
df = df[df["source"].notnull() & df["published"]]
return print_pandas(df)
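
Since the sheet basically never changes, even a simple in-process TTL cache around the fetch would avoid most of the cost. A sketch under that assumption; `fetch_anomalies`, `_TTL`, and the one-hour lifetime are all illustrative names and values, not from the codebase:

    import time
    from typing import Optional, Tuple

    from pandas import DataFrame, read_csv

    ANOMALIES_CSV_URL = "https://docs.google.com/spreadsheets/d/e/2PACX-1vToGcf9x5PNJg-eSrxadoR5b-LM2Cqs9UML97587OGrIX0LiQDcU1HL-L2AA8o5avbU7yod106ih0_n/pub?gid=0&single=true&output=csv"
    _TTL = 3600  # illustrative: re-fetch the sheet at most once an hour
    _cache: Optional[Tuple[float, DataFrame]] = None

    def fetch_anomalies() -> DataFrame:
        # Serve a recent copy if we have one; otherwise hit the sheet,
        # parse it as CSV, and remember the result with a timestamp.
        global _cache
        now = time.time()
        if _cache is not None and now - _cache[0] < _TTL:
            return _cache[1]
        df = read_csv(ANOMALIES_CSV_URL, skip_blank_lines=True)
        _cache = (now, df)
        return df

HTTP Cache-Control headers (as sketched for /meta above) would help here too, since they stop the browser from re-requesting at all.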

Other ~static-ish resources could benefit from this too, but I'm not sure what they might be off the top of my head.
