An interactive Shiny application for demonstrating how decision trees are trained step by step. Upload a CSV dataset, pick your target and predictors, and explore how split candidates are evaluated, how impurity changes, and what the resulting tree looks like.
- CSV upload with flexible parsing – toggle header handling and delimiter options before loading your data.
- Dynamic variable selection – choose the target column and include/exclude predictors on the fly.
- Classification or regression – automatically picks the correct `rpart` method or lets you override it.
- Interactive training stepper – scrub through each split in the fitted tree to inspect node sizes and impurity metrics.
- Candidate split analytics – inspect primary vs. competing splits, visualize threshold gain curves, and review summary tables.
- Decision tree visualization – render the tree with `rpart.plot`, including class probabilities or regression summaries.
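As a rough illustration of the kind of model and plot the app renders (using the built-in `iris` dataset for demonstration, not the app's own code):

```r
# Fit a small classification tree and draw it, roughly as the app does.
library(rpart)
library(rpart.plot)

fit <- rpart(Species ~ ., data = iris, method = "class")
rpart.plot(fit, extra = 104)  # label nodes with class probabilities and coverage
```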
- R 4.0 or higher
- The app relies on the following packages (installed automatically on first run via `pacman`):
  - `shiny`
  - `rpart` and `rpart.plot`
  - `dplyr` and `purrr`
  - `ggplot2`
  - `DT`
If `pacman` is missing, it will be installed for you, but you can pre-install everything manually:
```r
install.packages(c("pacman", "shiny", "rpart", "rpart.plot", "dplyr", "purrr", "ggplot2", "DT"))
```
- Clone this repository or download the source.
- Open an R session in the project directory.
- Launch the Shiny app:
```r
shiny::runApp("app.R")
```
The application will start in your browser. Any console messages about package installation will appear in the R session.
```
├── app.R      # Shiny UI + server logic for the demo
└── README.md  # Project documentation (this file)
```
- Upload a tidy CSV file with one target column and one or more predictor columns.
- Toggle the Header row checkbox if your file contains column names.
- Adjust the Separator option when working with semicolon- or tab-delimited files.
- Pick the Target variable and at least one Predictor variable.
- Choose the tree method (auto/classification/regression), maximum depth, and minimum split size to suit your dataset.
- Walk through each split with the Training step slider to inspect impurity changes and sample coverage.
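The method, depth, and minimum-split options correspond to standard `rpart` arguments. A minimal sketch of the equivalent direct call (the formula and data here are illustrative, not the app's internals):

```r
library(rpart)

# Regression tree on a built-in dataset. The app's "auto" mode amounts to
# letting rpart infer the method from the target type; the depth and
# minimum-split controls map onto rpart.control().
fit <- rpart(
  mpg ~ wt + hp,
  data    = mtcars,
  method  = "anova",  # numeric target -> regression
  control = rpart.control(maxdepth = 3, minsplit = 10)
)
fit$frame[, c("var", "n", "dev")]  # per-node split variable, size, and deviance
```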
- Classification mode requires at least two target classes; regression mode needs a numeric target.
- Rows containing missing values in the selected variables are dropped before training – watch the notification area for counts.
- Categorical predictors are automatically converted to factors; logical columns become two-level factors.
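A minimal sketch of the preprocessing these notes describe; `prepare_data()` is a hypothetical helper for illustration, not necessarily what `app.R` defines:

```r
# Drop incomplete rows, then coerce character/logical columns to factors,
# mirroring the behavior described in the notes above.
prepare_data <- function(df, cols) {
  df <- df[, cols, drop = FALSE]
  n_before <- nrow(df)
  df <- df[complete.cases(df), , drop = FALSE]
  message(n_before - nrow(df), " rows with missing values dropped")
  df[] <- lapply(df, function(x) {
    if (is.character(x) || is.logical(x)) factor(x) else x
  })
  df
}
```

For example, `prepare_data(airquality, c("Ozone", "Temp"))` drops the rows where `Ozone` is `NA` before any tree is fitted.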
The app is a single-file Shiny deployment with several helper utilities:
- `calc_impurity()` computes Gini impurity for factors and variance for numeric targets.
- `build_tree_steps()` extracts per-split metrics (size, impurity, candidate rules) from the fitted `rpart` model, powering the stepper UI.
- `compute_split_curve()` generates gain curves across possible thresholds for numeric features.
- The UI combines tabbed outputs for data preview, tree visualization, and candidate analysis with slider-driven interactivity.
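A plausible implementation of `calc_impurity()` consistent with that description (the app's actual definition may differ):

```r
# Gini impurity for factor targets, variance for numeric targets.
calc_impurity <- function(y) {
  if (is.factor(y)) {
    p <- table(y) / length(y)
    1 - sum(p^2)            # Gini: 0 for a pure node, higher when mixed
  } else {
    mean((y - mean(y))^2)   # variance (MSE around the node mean)
  }
}

calc_impurity(factor(c("a", "a", "b", "b")))  # 0.5
```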