-
Notifications
You must be signed in to change notification settings - Fork 2
fix: global code update #225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
-updated code to query postgres database - fixed random bugs
…rs/soil-id-algorithm into fix/global-code-update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please:
- fix the highlighted issues
- add some test lat/lng points
- provide a copy of the SQL data(base) needed for testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 🎉 🎉 big milestone, congrats & thank you so much! left a couple very minor comments while skimming, and some general comments:
- fantastic that it has the exact same API as the US version
- we'll need to get the US tests working again (not sure why they're not working 🤔) and add some smoke tests for global before merging
- it seems the global code relies on having a database available at runtime. we'll need to figure out how to get that working from static files like US does
- or if that's somehow completely impossible, there'll be a lift to get that DB set up in our dev, CI, and prod environments
- we'll need a way to determine whether to call the US or global functions
soil_id/us_soil.py
Outdated
@@ -2060,7 +2060,6 @@ def rank_soils( | |||
Score_Data_Loc = (D_final_loc["Score_Data_scale"] + D_final_loc["distance_score_scale"]) / ( | |||
D_final_loc["data_weight"] + location_weight | |||
) | |||
Score_Data_Loc /= np.nanmax(Score_Data_Loc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: why this small change to us_soil.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this change shouldn't be in this branch and should be ignored
|
||
# Call soildgrids API | ||
|
||
def sg_list(lon, lat): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: rename for clarity
def sg_list(lon, lat): | |
def soil_grids_list(lon, lat): |
898207a
to
d9ba6ab
Compare
Co-authored-by: Paul Schreiber <[email protected]>
…rs/soil-id-algorithm into fix/global-code-update
hi @jjmaynard! i had a chance to set up the bulk testing harness today! here's some questions and what i found from a first pass of running it over the test data you sent (
question: once we get it running successfully, which column of the test CSV should i compare against the output soil name to determine how well the algorithm ranked the result? there are four columns that seem like they could be relevant:
and lastly here are the 3 distinct tracebacks i found from the algorithm crashes:
|
fixed the 3 traceback issues @shrouxm identified. code now runs under 2s. SoilGrids API returns are sometimes taking 20-30 s.
Co-authored-by: Paul Schreiber <[email protected]>
fixed the 3 traceback issues @shrouxm identified. code now runs under 2s. SoilGrids API returns are sometimes taking 20-30 s.
-fixed distance calculation errors associated with coordinate system transformation. -added top depth input to rank function
When no map data is available at a location - returns "Data_unavailable"
-modified sql code to query all home mapunit components and all adjacent map units and components. -fixed component name indexing.
…requirements.txt requirements/dev.in setup.cfg from main
6c9d5d5
to
d54c625
Compare
layer_name=None, | ||
buffer_size=0.5, | ||
) | ||
def list_soils_global(lon, lat): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: this should be called list_soils() to match the us version. this module is namespaced and doesn't need a prefix.
|
||
############################################################################################## | ||
# rankPredictionGlobal # | ||
############################################################################################## | ||
def rankPredictionGlobal( | ||
lon, lat, soilHorizon, horizonDepth, rfvDepth, lab_Color, bedrock, cracks, plot_id=None | ||
def rank_soils_global( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: this should be called rank_soils() to match the us version. this module is namespaced and doesn't need a prefix.
@shrouxm when you list the postgis table indexes, do all of these appear?
I was able to run your sql query after creating a new spatial index and adding 'geom' to the selected columns:
Here are the results using your version: Garo's SQL query version
Here are the results from a modified version of my previous sql query that uses the more complicated spatial filter (wkt polygon buffer: Jon's revised SQL query version
|
@jjmaynard i have both those indexes yeah :/ ok well i will stop beating the query time horse until i'm able to run it on our staging DB, since i'm not sure how to debug further why some of these queries are much slower on my machine and i'm willing to just move on from that if it runs ok everywhere else so i guess the main thing to look into at this point is the high missingness |
@shrouxm Not sure if you've already tried running these functions but can help to improve performance of existing index:
|
format global test dataset gitignore global testing
…rs/soil-id-algorithm into fix/global-code-update
@jjmaynard ooh i've never even heard of the postgres cluster command! i will try it out |
@jjmaynard i think the only thing we need to get this merged is to fix the US tests that got broken at some point along the way in this PR, are you able to look into that? |
Description
Updated global functions: