_template_python/INDICATOR_DEV_GUIDE.md

This example is taken from `hhs_hosp`.
The column is described [here](https://cmu-delphi.github.io/delphi-epidata/api/missing_codes.html).

#### Local testing

As a general rule, it helps to decompose your functions into operations for which you can write unit tests.
To run the tests, use `make test` in the top-level indicator directory.

Next, the `acquisition.covidcast` component of the `delphi-epidata` codebase does the following:

    12. `value_updated_timestamp`: now
2. Update the `epimetric_latest` table with any new keys or new versions of existing keys.

Consider what settings to use in the `params.json.template` file, in accordance with how you want to run the indicator and acquisition.
Pay attention to the receiving directory, as well as how you can store credentials in vault.
Refer to [this guide](https://docs.google.com/document/d/1Bbuvtoxowt7x2_8USx_JY-yTo-Av3oAFlhyG-vXGG-c/edit#heading=h.8kkoy8sx3t7f) for more vault info.
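As an illustration, a `params.json.template` might contain entries along these lines; the key names and paths below are assumptions for illustration, not a required schema:

```json
{
  "common": {
    "export_dir": "./receiving",
    "log_filename": "./indicator.log"
  },
  "indicator": {
    "api_credentials": "{{ vault_api_key }}"
  }
}
```

In the prod version of the file, secrets such as `api_credentials` would be filled in from vault rather than committed to the repo.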

### CI/CD

* Add the module name to the `build` job in `.github/workflows/python-ci.yml`.
  This allows GitHub Actions to run on this indicator code, which includes unit tests and linting.
* Add the top-level directory name to `indicator_list` in `Jenkinsfile`.
  This allows your code to be automatically deployed to staging after your branch is merged to `main`, and deployed to prod after `covidcast-indicators` is released.
* Create `ansible/templates/{top_level_directory_name}-params-prod.json.j2` based on your `params.json.template`, with some adjustments.
  Pay attention to the receiving/export directory, as well as how you can store credentials in vault.
  Refer to [this guide](https://docs.google.com/document/d/1Bbuvtoxowt7x2_8USx_JY-yTo-Av3oAFlhyG-vXGG-c/edit#heading=h.8kkoy8sx3t7f) for more vault info.

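For instance, the CI change might look roughly like the following; the actual job layout in `python-ci.yml` may differ, so treat this as a hypothetical sketch:

```yaml
# .github/workflows/python-ci.yml (hypothetical excerpt)
jobs:
  build:
    strategy:
      matrix:
        # append your new module name to the existing list
        packages: ["_delphi_utils_python", "my_new_indicator"]
```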
### Staging

After developing the pipeline code, but before deploying to production, the pipeline should be tested on staging.
Indicator runs should be set up to run automatically daily for at least a week.

The indicator run code is automatically deployed on staging after your branch is merged into `main`.
After merging, make sure you have proper access to Cronicle and the staging server `app-mono-dev-01.delphi.cmu.edu`, _and_ can see your code on staging at `/home/indicators/runtime/`.

Then, on Cronicle, create two jobs: one to run the indicator and one to load the output csv files into the database.
The indicator job loads the location of the relevant csv output files into chained data, which the acquisition job then loads into our database.

Example script:

```
#!/usr/bin/python3
# ...
```

Note the staging hostname in `host` and how the acquisition job is chained to run right after the indicator job.

Note that the `ind_name` variable here refers to the top-level directory name where the code is located, while `acq_ind_name` refers to the directory name where the output csv files are located, which corresponds to the name of the `source` column in our database, as mentioned in step 3.

To automatically run the acquisition job right after the indicator job finishes successfully:

1. In the `Plugin` section, select `Interpret JSON in Output`.
2. In the `Chain Reaction` section, select your acquisition run job under `Run Event on Success`.

You can read more about how the `chain_data` json object in the script above can be used by the subsequent acquisition job [here](https://github.com/jhuckaby/Cronicle/blob/master/docs/Plugins.md#chain-reaction-control).
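As a sketch of the mechanism, the tail end of such an indicator-job plugin might print a JSON line like the one built below. This is a hypothetical illustration, not the actual script: the `receiving_dir` path and the `"nssp"` name are assumptions.

```python
#!/usr/bin/python3
# Hypothetical sketch: with "Interpret JSON in Output" enabled, Cronicle
# parses JSON lines from the job's stdout, and `chain_data` is handed to
# the chained acquisition job.
import json

def build_chain_output(acq_ind_name: str, returncode: int) -> str:
    """Build the JSON line Cronicle reads from the job's stdout."""
    return json.dumps({
        "complete": 1,        # tell Cronicle the job has finished
        "code": returncode,   # 0 indicates success
        "chain_data": {
            # illustrative path where the acquisition job would find the csvs
            "receiving_dir": f"/common/covidcast/receiving/{acq_ind_name}",
        },
    })

# `acq_ind_name` matches the `source` name in the database (illustrative)
print(build_chain_output("nssp", 0))
```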

#### Staging database checks

Apart from checking the logs of the staging indicator run and acquisition jobs to identify potential issues with the pipeline, one can also check the contents of the staging database for abnormalities.

At this point, the acquisition job should have loaded data into the staging MySQL database, specifically the `covid` database.
From staging:

```
[user@app-mono-dev-01 ~]$ mysql -u user -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 00000
Server version: 8.0.36-28 Percona Server (GPL), Release 28, Revision 47601f19

Copyright (c) 2009-2024 Percona LLC and/or its affiliates
Copyright (c) 2000, 2024, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use covid;
Database changed
```

Check the `signal_dim` table to see if the new source and signal names are all present and reasonable. For example:

```
mysql> select * from signal_dim where source='nssp';
```

Then, check whether the number of records ingested into the database matches the number of rows in the csv output from a local run.
For example, the query below filters on the `issue` date being the day the acquisition job was run, with `signal_key_id` values corresponding to signals from our new source.
Check if this count matches the local run result.

```
mysql> SELECT count(*) FROM epimetric_full WHERE issue=202425 AND signal_key_id > 816 AND signal_key_id < 825;
+----------+
| count(*) |
+----------+
|  2620872 |
+----------+
1 row in set (0.80 sec)
```
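To produce the matching local count, one can tally data rows across the csv files from a local run. A minimal sketch, assuming the files sit in a `receiving/nssp` directory (an illustrative path, not a fixed convention):

```python
from pathlib import Path

receiving_dir = Path("receiving/nssp")  # illustrative local export directory

# Count data rows across all exported csv files, excluding each header line.
total_rows = sum(
    max(len(f.read_text().splitlines()) - 1, 0)
    for f in receiving_dir.glob("*.csv")
)
print(f"local csv data rows: {total_rows}")
```

Compare this number against the `count(*)` returned by the staging query.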

You can also check how the data looks more specifically at each geo level, or among different signal names, depending on the quirks of the source.

See [@korlaxxalrok](https://www.github.com/korlaxxalrok) or [@minhkhul](https://www.github.com/minhkhul) for more information.

If everything goes well, make a prod version of the indicator run job and use it to run the indicator on a daily basis.