[BUG] Show actual detailed error message on dq failure #250

@menathan

Describe the bug

failed = SparkExpectationsValidateRules.validate_expectations(
    df=_df,
    rules=rules,
    spark=self.spark,
)
if failed:
    # Optionally, raise or log details for each failed rule
    failed_rules = [r.get("rule") for rules_list in failed.values() for r in rules_list]
    raise SparkExpectationsMiscException(f"Validation failed for rules: {failed_rules}")

This reports only the names of the failed rules. It does not show the actual error messages raised during validation, for example here:

try:
    tree = sqlglot.parse_one(expectation)
    agg_funcs = list({node.key for node in tree.find_all(sqlglot.expressions.AggFunc)})
except Exception as e:
    raise SparkExpectationsInvalidRowDQExpectationException(
        f"[row_dq] Could not parse expression: {expectation}{e}"
    )
if agg_funcs:
    raise SparkExpectationsInvalidRowDQExpectationException(
        f"[row_dq] Rule '{rule.get('rule')}' contains aggregate function(s) (not allowed in row_dq): {agg_funcs}"
    )
try:
    df.select(expr(expectation)).limit(1)
except Exception as e:
    raise SparkExpectationsInvalidRowDQExpectationException(
        f"[row_dq] Rule failed validation | rule_type: row_dq | "
        f"rule: '{rule.get('rule')}' | expectation: '{expectation}' → {e}"
    )

To Reproduce
Steps to reproduce the behavior:

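  1. Add a row_dq rule whose expectation contains an aggregate, e.g. "COUNT(*)"
  2. Run Spark Expectations (SE)
  3. The raised exception lists only the failed rule name, not the reason it failed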

Expected behavior
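The raised exception should include the actual error message for each failed rule (for example, that the rule contains an aggregate function not allowed in row_dq), not just which rule failed. With only the rule names it is impossible to debug, which is bad developer experience.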
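To illustrate what I mean, here is a minimal sketch of the behavior I would expect. It is not the Spark Expectations implementation: the exception classes, `check_row_dq`, `collect_failures`, `raise_if_failed`, and the `"error"` key are all hypothetical stand-ins, and the aggregate check is a plain substring match instead of the real sqlglot parsing. The only point is that the text of each caught exception is kept next to the rule that caused it and then included in the final message.

```python
from collections import defaultdict


class InvalidRuleError(Exception):
    """Hypothetical stand-in for SparkExpectationsInvalidRowDQExpectationException."""


class ValidationFailedError(Exception):
    """Hypothetical stand-in for SparkExpectationsMiscException."""


def check_row_dq(rule):
    """Toy check: rejects COUNT(...) by substring match (the real code uses sqlglot)."""
    expectation = rule["expectation"]
    if "count(" in expectation.lower():
        raise InvalidRuleError(
            f"[row_dq] Rule '{rule['rule']}' contains aggregate function(s) "
            f"(not allowed in row_dq): ['count']"
        )


def collect_failures(rules):
    """Run every check and keep the exception text next to the rule that caused it."""
    failed = defaultdict(list)
    for rule in rules:
        try:
            check_row_dq(rule)
        except InvalidRuleError as e:
            # Assumption: the failure entry carries an "error" key with the message.
            failed[rule["rule_type"]].append({"rule": rule["rule"], "error": str(e)})
    return dict(failed)


def raise_if_failed(failed):
    """Raise one exception that lists each failed rule together with its message."""
    if not failed:
        return
    lines = [
        f"{entry['rule']}: {entry['error']}"
        for entries in failed.values()
        for entry in entries
    ]
    raise ValidationFailedError("Validation failed:\n" + "\n".join(lines))
```

With something like this, a rule such as `COUNT(*) > 0` would surface the "contains aggregate function(s)" message directly in the exception the user sees, while valid rules pass through untouched.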

Screenshots
N/A

Desktop (please complete the following information):

  • OS: macOS
  • Browser: N/A
  • Version: N/A

Additional context
Spark Expectations 2.7.3

Labels: bug
