Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1085 Ecocounter Open Data Schema+DAG+QC plots #1096

Open
wants to merge 82 commits into
base: master
Choose a base branch
from

Conversation

gabrielwol
Copy link
Collaborator

@gabrielwol gabrielwol commented Nov 8, 2024

What this pull request accomplishes:

Does not need significant review:

  • R plots

  • Misc.

    • Add counter to sites table (commonly used ID field in Ecocounter communications/dashboard)
      • updated ecocounter_pull.py, pull_data_from_api.py.
    • Add direction_main field to flows: easily group dominant/contraflow together
    • Fixed anomalous_range join conditions (Use only one of site_id and flow_id)
      • counts, counts_calibrated

Issue(s) this solves:

What, in particular, needs to reviewed:

What needs to be done by a sysadmin after this PR is merged

  • Delete counts_old (saved the old version of counts for monitoring project dependency)
  • Delete discontinuities table

@gabrielwol gabrielwol self-assigned this Nov 8, 2024
@gabrielwol gabrielwol linked an issue Nov 8, 2024 that may be closed by this pull request
Copy link
Member

@radumas radumas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only commenting on the OpenData schema.

Looking pretty solid.

To summarize some change requests made elsewhere:

  • There should be README updates in here for documentation relevant to the unit.
  • the VIEWs the DAG are pulling from could be in open_data and named to include Miovision data. When Miovision is ready to publish, we can replace the VIEWs to have those UNIONs without having to change the DAG
  • the unit of location should be the linear direction (directional centreline) to
    • handle technologies having different centrelines for each direction, or
    • handle one side of the street being faulty; and,
    • avoid the public having to join multiple location tables.
    • And we don't expose the "parent" site/sensor to the public.

volumes/ecocounter/views/create-view-open_data_sites.sql Outdated Show resolved Hide resolved
volumes/ecocounter/views/create-view-open_data_sites.sql Outdated Show resolved Hide resolved
volumes/ecocounter/views/create-view-open_data_sites.sql Outdated Show resolved Hide resolved
@gabrielwol gabrielwol marked this pull request as ready for review November 21, 2024 21:53
@@ -106,46 +96,93 @@ def reminder_message(ds = None, **context):
wait_till_10th.doc_md = """
Wait until the 10th day of the month to export data. Alternatively mark task as success to proceed immediately.
"""

@task()
def get_years(ds=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we add this for mapped task naming? It was using ds before right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the insert_and_download_data task_group is mapped over the output of get_years. The purpose is to run the exports for last two months (which may be two separate years), since new data frequently arrives within ~30 days.
image

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to remove sensitivity_changes since it's superseded by sensitivity_history.
Not sure why flow_id = 101042942 has a stray date_range = (,) .
The readme should mention:

  • sensitivity_history is manually updated based on emails with with Eco-counter Technical Support Specialist (Derek Yates as of 2024).
  • what the numbered settings mean (least to most selective, or not standardized)
  • Eco-counter must confirm what the current and new settings are whenever a sensitivity is adjusted, along with the date the change is applied.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to remove sensitivity_changes since it's superseded by sensitivity_history.

Not sure why flow_id = 101042942 has a stray date_range = (,) .

That flow_id never had any data. Deleted the entry.

The readme should mention...

Adding!

Copy link

@A-DUYVESTYN A-DUYVESTYN Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to separate scooter flows calibration factors from bike flows? Currently we apply the same "total volume" factor to both, but we will have separate validation factors for these veh types soon. (Not a blocker for this release)
Readme should state:

  • ecocounter.calibration_factors excludes factors in ecocounter.manual_counts_info with flagged as do_not_use= true . do_not_use is entered manually on a case-by-case basis depending on which calibration study is the 'best'.
  • Caution is required when assigning do_not_use to calibration studies. Avoid as much as possible applying a new factor to data that has already been calibrated and published to Open Data.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way to separate scooter calibration factors would be to edit ecocounter.validation_results, particularly the flow_ids CTE to link different flow_ids to those studies.

@A-DUYVESTYN
Copy link

@gabrielwol, I'm not finding this file that my review was requested for: volumes/ecocounter/views/create-view-counts_calibrated.sql

@gabrielwol
Copy link
Collaborator Author

@gabrielwol, I'm not finding this file that my review was requested for: volumes/ecocounter/views/create-view-counts_calibrated.sql

Renamed to "counts" from counts_calibrated

@A-DUYVESTYN
Copy link

@gabrielwol, I'm not finding this file that my review was requested for: volumes/ecocounter/views/create-view-counts_calibrated.sql

Renamed to "counts" from counts_calibrated

In bigdata, I see both VIEWs counts_calibrated and counts. Is the intention to only have counts which would contain calibrated counts? And to get raw data, use table counts_unfiltered?

@gabrielwol
Copy link
Collaborator Author

In bigdata, I see both VIEWs counts_calibrated and counts. Is the intention to only have counts which would contain calibrated counts? And to get raw data, use table counts_unfiltered?

@A-DUYVESTYN yes, we would keep only counts and counts_unfiltered.

  • counts contains both raw and calibrated volumes, and omits anomalous ranges and unvalidated flows/sitese.
  • counts_unfiltered contains all raw data.

Will work on readmes today.

@A-DUYVESTYN
Copy link

OK. I'm not clear on process for assigning flows_unfiltered.validated and sites_unfiltered.validated. Another one for documentation.

@gabrielwol gabrielwol force-pushed the 1085-ecocounter-develop-sensitivityfactor-tables branch from 5bbc065 to c4aa662 Compare January 9, 2025 21:02
@gabrielwol gabrielwol force-pushed the 1085-ecocounter-develop-sensitivityfactor-tables branch from c4aa662 to ee775d3 Compare January 9, 2025 21:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ecocounter: develop sensitivity/factor tables
4 participants