Docs for ExecutionPlanVisitor #10012

Closed
matthewmturner opened this issue Apr 9, 2024 · 2 comments · Fixed by #10286
Labels: `documentation` (Improvements or additions to documentation), `enhancement` (New feature or request)

Comments

@matthewmturner
Contributor

Is your feature request related to a problem or challenge?

I've had to write a couple of `ExecutionPlanVisitor`s recently (see below), and when I started I looked for documentation on them but couldn't find any. I think it would help newcomers to see a couple of examples of `ExecutionPlanVisitor` in the docs.

Describe the solution you'd like

A section in the docs with some example implementations of ExecutionPlanVisitor

Describe alternatives you've considered

No response

Additional context

These are the `ExecutionPlanVisitor`s I wrote. I'd be happy to add docs around them.

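```rust
use std::sync::Arc;

// Import paths assume a DataFusion release from around the time of this issue;
// they may differ in other versions.
use datafusion::datasource::physical_plan::{FileScanConfig, ParquetExec};
use datafusion::physical_plan::{visit_execution_plan, ExecutionPlan, ExecutionPlanVisitor};

/// Records the `FileScanConfig` of a `ParquetExec` found in the plan.
/// If the plan contains several `ParquetExec` nodes, the last one visited wins.
struct FileScanVisitor {
    file_scan_config: Option<FileScanConfig>,
}

impl ExecutionPlanVisitor for FileScanVisitor {
    type Error = anyhow::Error;

    fn pre_visit(&mut self, plan: &dyn ExecutionPlan) -> Result<bool, Self::Error> {
        // Downcast the type-erased node; only `ParquetExec` nodes are of interest.
        if let Some(parquet_exec) = plan.as_any().downcast_ref::<ParquetExec>() {
            self.file_scan_config = Some(parquet_exec.base_config().clone());
        }
        // Returning `true` continues the walk into this node's children.
        Ok(true)
    }
}

fn get_file_scan_config(plan: Arc<dyn ExecutionPlan>) -> Option<FileScanConfig> {
    let mut visitor = FileScanVisitor {
        file_scan_config: None,
    };
    visit_execution_plan(plan.as_ref(), &mut visitor)
        .expect("FileScanVisitor never returns an error");
    visitor.file_scan_config
}
```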
```rust
use std::fmt::Write;

// Import paths assume a DataFusion release from around the time of this issue;
// they may differ in other versions.
use datafusion::datasource::physical_plan::ParquetExec;
use datafusion::error::DataFusionError;
use datafusion::physical_plan::{displayable, ExecutionPlan, ExecutionPlanVisitor};

#[derive(Debug)]
struct ParquetVisitor;

impl ExecutionPlanVisitor for ParquetVisitor {
    type Error = DataFusionError;

    fn pre_visit(&mut self, plan: &dyn ExecutionPlan) -> Result<bool, Self::Error> {
        // Get the one-line representation of the ExecutionPlan, something like this:
        //   ParquetExec: file_groups=[...], ...
        let mut buf = String::new();
        write!(&mut buf, "{}", displayable(plan).one_line()).map_err(|e| {
            DataFusionError::Internal(format!("Error while collecting metrics: {e}"))
        })?;

        // Keep only the text before the first colon.
        // This is a hack to extract a human-readable representation of the ExecutionPlan's type.
        // We would prefer if `ExecutionPlan` had a `name` method, but this will do,
        // since every physical operator seems to follow this convention.
        // If a node doesn't, we just skip collecting its metrics, and no harm is done.
        let plan_type = match buf.split_once(':') {
            Some((name, _)) => name.to_string(),
            None => {
                println!("execution plan has unexpected display format: {buf}");
                return Ok(true);
            }
        };

        // Only `ParquetExec` nodes report `bytes_scanned`; every other node is skipped.
        if let Some(parquet_exec) = plan.as_any().downcast_ref::<ParquetExec>() {
            // Metric values are filled in while the plan executes,
            // so walk the plan after running it.
            if let Some(metrics) = parquet_exec.metrics() {
                let bytes_scanned = metrics.sum_by_name("bytes_scanned");
                println!("{plan_type} bytes scanned: {bytes_scanned:?}");
            }
        }

        // Returning `true` continues the walk into this node's children.
        Ok(true)
    }
}
```
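One limitation of `FileScanVisitor` above is that it stores a single `Option<FileScanConfig>`. When a plan contains several `ParquetExec` nodes (a join of two Parquet tables, say), only the last one visited is kept. A variant could push every match into a `Vec` instead. The sketch below shows that idea without a DataFusion dependency so it runs on its own. `Plan`, `ScanNode`, `JoinNode`, `Visitor` and `walk` are hypothetical stand-ins for `ExecutionPlan`, `ParquetExec`, a join operator, `ExecutionPlanVisitor` and `visit_execution_plan`. The `as_any()` downcast is the same technique both visitors above rely on.

```rust
use std::any::Any;
use std::convert::Infallible;

/// Hypothetical stand-in for DataFusion's `ExecutionPlan` trait.
trait Plan {
    /// Lets callers downcast a `&dyn Plan` to a concrete operator type.
    fn as_any(&self) -> &dyn Any;
    fn children(&self) -> Vec<&dyn Plan>;
}

/// Hypothetical leaf operator playing the role of `ParquetExec`.
struct ScanNode {
    path: String,
}

/// Hypothetical two-input operator playing the role of a join.
struct JoinNode {
    left: Box<dyn Plan>,
    right: Box<dyn Plan>,
}

impl Plan for ScanNode {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn children(&self) -> Vec<&dyn Plan> {
        Vec::new()
    }
}

impl Plan for JoinNode {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn children(&self) -> Vec<&dyn Plan> {
        vec![&*self.left, &*self.right]
    }
}

/// Hypothetical stand-in for `ExecutionPlanVisitor`.
trait Visitor {
    type Error;
    fn pre_visit(&mut self, plan: &dyn Plan) -> Result<bool, Self::Error>;
    fn post_visit(&mut self, _plan: &dyn Plan) -> Result<bool, Self::Error> {
        Ok(true)
    }
}

/// Pre-order walk: `pre_visit`, then the children left to right, then `post_visit`.
/// This model ignores the returned `bool` and always visits every node.
fn walk<V: Visitor>(plan: &dyn Plan, visitor: &mut V) -> Result<(), V::Error> {
    visitor.pre_visit(plan)?;
    for child in plan.children() {
        walk(child, visitor)?;
    }
    visitor.post_visit(plan)?;
    Ok(())
}

/// Collects every scan it meets, instead of keeping only the last one.
#[derive(Default)]
struct ScanCollector {
    paths: Vec<String>,
}

impl Visitor for ScanCollector {
    type Error = Infallible;

    fn pre_visit(&mut self, plan: &dyn Plan) -> Result<bool, Self::Error> {
        if let Some(scan) = plan.as_any().downcast_ref::<ScanNode>() {
            self.paths.push(scan.path.clone());
        }
        Ok(true)
    }
}

fn collect_scan_paths(plan: &dyn Plan) -> Vec<String> {
    let mut collector = ScanCollector::default();
    // `Infallible` can never be constructed, so this `unwrap` cannot panic.
    walk(plan, &mut collector).unwrap();
    collector.paths
}

fn main() {
    let plan = JoinNode {
        left: Box::new(ScanNode { path: "a.parquet".to_string() }),
        right: Box::new(ScanNode { path: "b.parquet".to_string() }),
    };
    println!("{:?}", collect_scan_paths(&plan));
}
```

Running `main` prints `["a.parquet", "b.parquet"]`. With DataFusion, the `Vec` would presumably hold `FileScanConfig` values cloned from `base_config()` rather than path strings.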

I have two examples in mind: one that gets information from Parquet files (I could probably combine the two visitors above into one) and one that tracks data across all nodes (maybe `output_rows`).
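For the second example, a visitor that tracks data across all nodes might record each node's `output_rows`. The stdlib-only sketch below shows that shape. `Node`, `Visitor` and `walk` are hypothetical stand-ins, and the row counts are illustrative. Against a real plan, the per-node value would presumably come from `plan.metrics()` followed by `MetricsSet::output_rows()`, and the node name from the display-string trick in `ParquetVisitor`; treat that mapping as an assumption to check against the DataFusion version in use.

```rust
use std::convert::Infallible;

/// Hypothetical stand-in for an executed plan node: its name, the
/// `output_rows` metric it reported (if any), and its inputs.
struct Node {
    name: &'static str,
    output_rows: Option<usize>,
    children: Vec<Node>,
}

/// Hypothetical stand-in for `ExecutionPlanVisitor` (only `pre_visit` is needed here).
trait Visitor {
    type Error;
    fn pre_visit(&mut self, node: &Node) -> Result<bool, Self::Error>;
}

/// Pre-order walk: a node first, then its children from left to right.
/// This model ignores the returned `bool` and always visits every node.
fn walk<V: Visitor>(node: &Node, visitor: &mut V) -> Result<(), V::Error> {
    visitor.pre_visit(node)?;
    for child in &node.children {
        walk(child, visitor)?;
    }
    Ok(())
}

/// Collects `(name, output_rows)` for every node that reported the metric.
#[derive(Default)]
struct OutputRowsVisitor {
    rows: Vec<(&'static str, usize)>,
}

impl Visitor for OutputRowsVisitor {
    type Error = Infallible;

    fn pre_visit(&mut self, node: &Node) -> Result<bool, Self::Error> {
        // Nodes without the metric are skipped rather than reported as zero.
        if let Some(rows) = node.output_rows {
            self.rows.push((node.name, rows));
        }
        Ok(true)
    }
}

fn output_rows_per_node(plan: &Node) -> Vec<(&'static str, usize)> {
    let mut visitor = OutputRowsVisitor::default();
    // `Infallible` can never be constructed, so this `unwrap` cannot panic.
    walk(plan, &mut visitor).unwrap();
    visitor.rows
}

fn main() {
    // Illustrative numbers only.
    let plan = Node {
        name: "FilterExec",
        output_rows: Some(10),
        children: vec![Node {
            name: "ParquetExec",
            output_rows: Some(100),
            children: vec![],
        }],
    };
    for (name, rows) in output_rows_per_node(&plan) {
        println!("{name}: {rows} rows");
    }
}
```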

matthewmturner added the `enhancement` label on Apr 9, 2024
alamb added the `documentation` label on Apr 9, 2024
@alamb
Contributor

alamb commented Apr 26, 2024

I added additional documentation about `TreeNode` in #10035.
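`TreeNode` offers a closure-based alternative to implementing a visitor trait. Assuming a DataFusion version in which `Arc<dyn ExecutionPlan>` implements `TreeNode`, a plan could be walked with `TreeNode::apply`, whose closure returns a `TreeNodeRecursion` (`Continue`, `Jump` or `Stop`) to steer the traversal. The stdlib-only sketch below models that style. `Node`, `Recursion` and `apply` are hypothetical stand-ins, not DataFusion's implementation, and the real names and signatures may differ between versions.

```rust
use std::convert::Infallible;

/// Hypothetical stand-in for `TreeNodeRecursion`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Recursion {
    /// Visit this node's children next.
    Continue,
    /// Skip this node's children, but keep visiting its siblings.
    Jump,
    /// End the whole traversal immediately.
    Stop,
}

/// Hypothetical, simplified plan node.
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

/// Pre-order walk driven by a closure, modeled on `TreeNode::apply`.
fn apply<F, E>(node: &Node, f: &mut F) -> Result<Recursion, E>
where
    F: FnMut(&Node) -> Result<Recursion, E>,
{
    match f(node)? {
        Recursion::Continue => {
            for child in &node.children {
                // Propagate `Stop` from any descendant.
                if apply(child, &mut *f)? == Recursion::Stop {
                    return Ok(Recursion::Stop);
                }
            }
            Ok(Recursion::Continue)
        }
        // Children are skipped; siblings are still visited.
        Recursion::Jump => Ok(Recursion::Continue),
        Recursion::Stop => Ok(Recursion::Stop),
    }
}

/// Returns the first node, in pre-order, whose display name starts with "ParquetExec".
fn first_parquet_scan(plan: &Node) -> Option<&'static str> {
    let mut found = None;
    apply(plan, &mut |node: &Node| {
        if node.name.starts_with("ParquetExec") {
            found = Some(node.name);
            // No need to look any further.
            return Ok::<_, Infallible>(Recursion::Stop);
        }
        Ok(Recursion::Continue)
    })
    .unwrap();
    found
}

fn main() {
    let plan = Node {
        name: "UnionExec",
        children: vec![
            Node {
                name: "FilterExec",
                children: vec![Node { name: "ParquetExec: a.parquet", children: vec![] }],
            },
            Node { name: "ParquetExec: b.parquet", children: vec![] },
        ],
    };
    println!("{:?}", first_parquet_scan(&plan));
}
```

`main` prints `Some("ParquetExec: a.parquet")`; the walk stops there, so `b.parquet` is never visited. Compared with a visitor struct, the closure keeps its state in ordinary local variables.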

Perhaps we could add an example to https://github.com/apache/datafusion/tree/main/datafusion-examples/examples that shows how to walk over execution plans, similar to https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/rewrite_expr.rs, which shows how to do it for logical exprs / plans 🤔

Something like `execution_plan_visit.rs`?
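One thing an `execution_plan_visit.rs` example could cover is `post_visit`, which neither visitor above uses (as far as we know, `ExecutionPlanVisitor` gives it a default implementation that returns `Ok(true)`). The stdlib-only sketch below pairs the two hooks to track depth: `pre_visit` records the node and indents one level, and `post_visit` dedents once the node's children are done, producing an indented plan. `Node`, `Visitor`, `walk` and `IndentPrinter` are hypothetical stand-ins.

```rust
use std::convert::Infallible;

/// Hypothetical, simplified plan node.
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

/// Hypothetical stand-in for `ExecutionPlanVisitor`, with a default `post_visit`.
trait Visitor {
    type Error;
    fn pre_visit(&mut self, node: &Node) -> Result<bool, Self::Error>;
    fn post_visit(&mut self, _node: &Node) -> Result<bool, Self::Error> {
        Ok(true)
    }
}

/// `pre_visit` runs before a node's children are walked, `post_visit` after.
fn walk<V: Visitor>(node: &Node, visitor: &mut V) -> Result<(), V::Error> {
    visitor.pre_visit(node)?;
    for child in &node.children {
        walk(child, visitor)?;
    }
    visitor.post_visit(node)?;
    Ok(())
}

/// Renders the plan one node per line, indented two spaces per level.
#[derive(Default)]
struct IndentPrinter {
    depth: usize,
    lines: Vec<String>,
}

impl Visitor for IndentPrinter {
    type Error = Infallible;

    fn pre_visit(&mut self, node: &Node) -> Result<bool, Self::Error> {
        self.lines.push(format!("{}{}", "  ".repeat(self.depth), node.name));
        // Children of this node are one level deeper.
        self.depth += 1;
        Ok(true)
    }

    fn post_visit(&mut self, _node: &Node) -> Result<bool, Self::Error> {
        // All children are done; return to this node's level for its siblings.
        self.depth -= 1;
        Ok(true)
    }
}

fn render(plan: &Node) -> String {
    let mut printer = IndentPrinter::default();
    // `Infallible` can never be constructed, so this `unwrap` cannot panic.
    walk(plan, &mut printer).unwrap();
    printer.lines.join("\n")
}

fn main() {
    let plan = Node {
        name: "ProjectionExec",
        children: vec![Node {
            name: "FilterExec",
            children: vec![Node { name: "ParquetExec", children: vec![] }],
        }],
    };
    println!("{}", render(&plan));
}
```

`main` prints the three node names, each indented two spaces more than its parent. Without the `post_visit` decrement, siblings would drift rightward instead of lining up.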

@matthewmturner
Contributor Author

@alamb Sure, I can work on it, although I'm not sure my examples above are even idiomatic with the new API. It would be good to flesh that out.
