Skip to content

Conversation

@AngRodrigues
Copy link
Member

@AngRodrigues AngRodrigues commented Jan 14, 2025

Description

added data_checks, a series of functions that check the main datatypes in map2loop: Geology, Structure, Faults and Folds. Most of the code was refactored to avoid code repetition between the different functions, but checks that are valid for one single datatype are kept in the same functions

Also changed the config dictionary data checks into this file, and the functionality of not allowing invalid args into project (from issue 162), as it made sense to have these fixes together. Please check PR #171 for more info on this.

Fixes #162
Fixes #138

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Test improvement

How Has This Been Tested?

I am attaching 3 notebooks that show some of the common warnings and errors.
Geology - https://gist.github.com/AngRodrigues/a273380a082d024669699d13fa99a0f8
Structure - https://gist.github.com/AngRodrigues/7e7c5b9bb178cb89f1f815c3392db417
Fault - https://gist.github.com/AngRodrigues/6db7a0eca5421b90f4c79ae6021f69f6

Also checked this with the notebooks from source data and Loop server data and it seems to work well.

Branch: fix/add_data_checks

AngRodrigues and others added 30 commits December 17, 2024 13:29
@AngRodrigues AngRodrigues mentioned this pull request Jan 14, 2025
2 tasks
@AngRodrigues AngRodrigues marked this pull request as ready for review January 14, 2025 05:18
Copy link
Member

@lachlangrose lachlangrose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks pretty good, my only main comment is to not duplicate the logging. I think having the warnings/errors in the check functions and then raise the exception in the mapdata is good.

I would also consider moving the notebooks into the documentation examples because it probably would be useful for people to be able to review this.

Geopandas make_valid could be used before checking for validity. Or maybe as a configuration option?
@AngRodrigues let me know what changes you want to make and i can do the rest.

if geodata[id_column].isnull().any():
error_msg = (
f"Datatype {geodata_name}: Column '{id_column}' "
f"(config key: '{id_config_key}') contains non-numeric or NaN values. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good, we could make the error message even better by providing the strings (or some of) for the values that couldn't be coerced

Comment on lines 707 to 715
# else:
# warning_msg = (
# f"Datatype {datatype_name.upper()}: Optional column '{column_name}' "
# f"(config key: '{config_key}') is missing from the data. "
# )
####### this might be taking it a bit too far

# logger.info(warning_msg)
# warnings.append(warning_msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keep that as an info log, but not as a warning. Good to have the extra information avaliable if we need it

elif datatype == Datatype.FAULT:
validity_check, message = check_fault_fields_validity(mapdata = self)
if validity_check:
logger.error(f"Datatype FAULT - data validation failed: {message}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if there is too much duplication in the error messages. The validity checker prints out nice error messages. I think it would be enough to just raise the exception Datatype FAULT - data validation failed

Comment on lines +257 to +258
"bounding_box": bounding_box,
"working_projection": working_projection, # this may be removed when fix is added for https://github.com/Loop3D/map2loop/issues/103
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could/should we calculate the bounding box from the extent of the shapefiles?

@lachlangrose lachlangrose merged commit 743a5db into master May 2, 2025
2 checks passed
@lachlangrose lachlangrose deleted the fix/add_data_checks branch May 2, 2025 01:53
noellehmcheng pushed a commit that referenced this pull request May 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] project allows invalid arguments [Bug] - map2loop crashes if required fields contain null

3 participants