This project provides a user interface for explaining the causes of aggregate SQL query results using the CauSumX algorithm.
- Clone the repository:
  git clone https://github.com/nativlevy/causumx
  cd causumx
- Create a virtual environment and activate it:
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Install the required packages:
  pip install -r requirements.txt
- Create a `.env` file with the following content:
  OPENAI_API_KEY=sk-...
- Run the Streamlit application:
  python -m streamlit run ui/app.py --server.port 8081 --server.address localhost
- Open your web browser and navigate to http://localhost:8081 to access the application.
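Before launching the app, it can help to confirm that the `.env` file from the setup steps actually defines `OPENAI_API_KEY`. The following is a minimal sketch using only the Python standard library. The helper name `read_env_key` is hypothetical and not part of this repository, and the sketch makes no claim about how the application itself loads the key.

```python
from pathlib import Path
from typing import Optional


def read_env_key(env_text: str, name: str = "OPENAI_API_KEY") -> Optional[str]:
    """Return the value assigned to `name` in .env-style text, or None.

    Hypothetical helper for illustration. It handles only simple
    KEY=value lines, blank lines, and '#' comment lines.
    """
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Strip optional surrounding quotes; an empty value counts as missing.
            value = value.strip().strip("'\"")
            return value or None
    return None


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    print("OPENAI_API_KEY found" if read_env_key(text) else "OPENAI_API_KEY missing")
```

Run it from the project directory (the directory that contains `.env`). It only checks that a value is present, not that the key is accepted by the OpenAI API.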
- Upload the dataset:
  - Use the file upload option to select data/so_countries_col_new.csv
- Upload the DAG file:
  - Use the file upload option to select data/so.dot
- Enter the following SQL query in the text area:
  Group by country, calculate mean value and count records
- Set the actionable attributes. By default, all attributes are considered actionable; you can specify a subset if needed. Example:
  Gender, SexualOrientation, EducationParents, RaceEthnicity, Age
- Click the "Run CauSumX" button to execute the algorithm and view the results.
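The actionable-attributes step above takes a comma-separated list and falls back to all attributes when nothing is given. A minimal sketch of that behavior is shown below. The function `parse_actionable` and the `columns` list are hypothetical (the `Country` column name in particular is an assumption), and rejecting unknown names is a choice made for this sketch, not documented behavior of the UI.

```python
from typing import List, Optional, Sequence


def parse_actionable(text: Optional[str], all_attributes: Sequence[str]) -> List[str]:
    """Turn a comma-separated string into a list of attribute names.

    Hypothetical helper: empty input falls back to all attributes,
    mirroring the default described in the steps above.
    """
    if text is None or not text.strip():
        return list(all_attributes)
    requested = [part.strip() for part in text.split(",") if part.strip()]
    # Assumption of this sketch: names not present in the dataset are an error.
    unknown = [name for name in requested if name not in all_attributes]
    if unknown:
        raise ValueError(f"Unknown attributes: {', '.join(unknown)}")
    return requested


# Illustrative column names only; "Country" is assumed, the rest come from the example above.
columns = ["Country", "Gender", "SexualOrientation", "EducationParents", "RaceEthnicity", "Age"]
print(parse_actionable("Gender, Age", columns))
print(parse_actionable("", columns))
```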
- Upload your own dataset and DAG file.
- Select from preloaded datasets.
- Enter and execute SQL GROUP-BY queries.
- Visualize causal explanations and interactive bar charts.
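To make the GROUP-BY feature concrete, here is a small sketch of the kind of aggregate query involved: grouping by country and computing a mean and a record count. It uses an in-memory SQLite table from the Python standard library. The table name `survey`, the columns `Country` and `Score`, and the rows are all invented for illustration and are not taken from the project's dataset; the sketch also does not show how the application itself executes queries.

```python
import sqlite3

# Toy rows invented for illustration; not taken from the project's dataset.
rows = [
    ("A", 10.0),
    ("A", 20.0),
    ("B", 5.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (Country TEXT, Score REAL)")
conn.executemany("INSERT INTO survey VALUES (?, ?)", rows)

# Group by country, computing the mean value and the number of records.
query = """
    SELECT Country, AVG(Score) AS mean_value, COUNT(*) AS n_records
    FROM survey
    GROUP BY Country
    ORDER BY Country
"""
result = conn.execute(query).fetchall()
conn.close()

for country, mean_value, n_records in result:
    print(country, mean_value, n_records)
```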
- `main()`: The main function that sets up the Streamlit UI and orchestrates the workflow.
- `dot_to_list(uploaded_file)`: Converts a DOT file to a list representation of the DAG.
- `plot_interactive_bar_chart(data, country_column, value_column, title=None)`: Creates an interactive bar chart using Plotly.
- `cauSumX(df, DAG, ordinal_atts, targetClass, groupingAtt, fds, k, tau, actionable_atts, high, low)`: The main CauSumX algorithm implementation.
- `filterPatterns(df, groupingAtt, groups)`: Filters patterns based on grouping attributes.
- `getAllGroups(df_org, atts, t)`: Retrieves all groups from the dataset.
- `getGroupstreatments(DAG, df, groupingAtt, groups, ordinal_atts, targetClass, ...)`: Gets treatments for groups.
- `getAttsVals(atts,df)`: Gets attribute values from the dataframe.
- `getNextLeveltreatments(treatments_cate, df_g, ordinal_atts, high, low)`: Generates next level treatments.
- `getCombTreatments(df_g, positives, treatments, ordinal_atts)`: Combines treatments.
- `causumx_output_to_natural_language_explanation(causumx_output)`: Converts CauSumX output to natural language explanations.
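As an illustration of what converting a DOT file into a list representation can look like, the sketch below extracts directed edges from DOT text with a regular expression. This is not the repository's `dot_to_list`; the function name `dot_edges` is hypothetical, and the format returned by the real function may differ. It handles only simple `A -> B` statements, not chained edges such as `A -> B -> C`, subgraphs, or attribute lists.

```python
import re
from typing import List, Tuple

# A node is either a double-quoted string or a bare identifier.
_EDGE = re.compile(r'("[^"]+"|\w+)\s*->\s*("[^"]+"|\w+)')


def dot_edges(dot_text: str) -> List[Tuple[str, str]]:
    """Return (source, target) pairs for simple directed edges in DOT text."""
    return [(src.strip('"'), dst.strip('"')) for src, dst in _EDGE.findall(dot_text)]


# Illustrative graph only; not the contents of data/so.dot.
example = """
digraph G {
  Age -> Salary;
  "Education Parents" -> Salary;
}
"""
print(dot_edges(example))
```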
- Ensure you have completed the installation steps mentioned above.
- Activate your virtual environment:
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Navigate to the project directory:
  cd path/to/causumx
- Run the Streamlit application:
  python -m streamlit run ui/app.py --server.port 8081 --server.address localhost
- Open your web browser and go to http://localhost:8081.
- Using the UI:
  a. Choose a dataset from the dropdown or upload your own CSV file.
  b. If using your own dataset, upload a corresponding DAG file (in DOT format).
  c. Enter a SQL GROUP BY query in the provided text area.
  d. Adjust the parameters (k and tau) if needed.
  e. Click the "Run CauSumX" button to execute the algorithm.
  f. View the results, including causal explanations and interactive visualizations.
- If you encounter any issues with package dependencies, ensure your virtual environment is activated and all packages in `requirements.txt` are installed.
- For OpenAI API-related errors, check that your `.env` file contains a valid API key.
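For the dependency check above, a quick way to see which requirements are missing from the active environment is to look each one up with `importlib.metadata` from the Python standard library. The sketch below is a rough aid under stated assumptions: the helper `missing_packages` is hypothetical, it compares distribution names only (version constraints are ignored), and it does not handle URL or path requirements.

```python
import re
from importlib import metadata
from pathlib import Path
from typing import Iterable, List


def missing_packages(requirement_lines: Iterable[str]) -> List[str]:
    """Return requirement names that are not installed in this environment."""
    missing = []
    for raw_line in requirement_lines:
        line = raw_line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as `-r`
        # The name is everything before the first version, extras, or marker character.
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        if not name:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        print(missing_packages(req.read_text().splitlines()))
```

Run it with the virtual environment activated, from the directory containing `requirements.txt`. An empty list means every listed distribution was found.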
Contributions to improve the UI or extend the functionality of CauSumX are welcome. Please submit pull requests or open issues on the GitHub repository.
Original Paper: https://dl.acm.org/doi/10.1145/3639328