DataAnt is an intelligent data analysis tool that combines machine learning with interactive visualization to help you analyze and understand your data. It features a web-based UI built with Shiny for Python and integrates with Google's Gemini AI for natural language processing.
- 🤖 AI-powered data analysis using Google's Gemini API
- 📊 Interactive data visualization with Plotly
- 🔍 Advanced machine learning models, including:
  - Classification (Logistic Regression, SVM, Random Forest, XGBoost)
  - Regression (Linear, Ridge, Lasso)
  - Clustering (KMeans, DBSCAN)
- 📈 Real-time model monitoring and performance metrics
- 🔐 Secure credential management with encryption
- 🎯 Automated data cleaning and preprocessing
- Clone the repository:
git clone https://github.com/arusatech/dataant.git
cd dataant
- Install required packages:
poetry install
Or export the dependencies and install them with pip:
poetry export -f requirements.txt --output requirements.txt --without-hashes
pip install -r requirements.txt
- Create a config.json file in the root directory:
{
"db_file": "path/to/your/data.csv",
"api_key": "your-google-api-key"
}
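As a minimal sketch of how the configuration file above might be read and checked before the application starts: the helper name `load_config` and the `REQUIRED_KEYS` tuple are hypothetical and not taken from the DataAnt source; only the two keys `db_file` and `api_key` come from the example above.

```python
import json
from pathlib import Path

# Keys shown in the config.json example; the constant name is hypothetical.
REQUIRED_KEYS = ("db_file", "api_key")


def load_config(path="config.json"):
    """Read a JSON config file and check that the expected keys are present."""
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"Missing configuration file: {config_path}")
    with config_path.open(encoding="utf-8") as handle:
        config = json.load(handle)
    # Report every missing key at once rather than failing on the first one.
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing keys: {', '.join(missing)}")
    return config
```

Calling `load_config()` from the repository root would return a dictionary with the `db_file` and `api_key` values, or raise an error naming what is missing.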
- Start the application:
python main.py -p "your analysis prompt"
Or use a prompt file:
python main.py -f prompt_file.txt
- Command line options:
-p, --prompt Provide analysis prompt directly
-f, --file Provide prompt from a file
-d, --debug Enable debug logging
-t, --template Generate a prompt template file
Example:
(.vdataant) PS C:\dataant> python .\main.py --help
usage: dataant [-h] [-p [PROMPT] | -f [FILE]] [-d] [-t]
options:
-h, --help show this help message and exit
-p [PROMPT], --prompt [PROMPT]
Prompt: Generative Prompt for the Date Analytic bot
-f [FILE], --file [FILE]
Prompt: provided as a file (use -t to get the prompt template)
-d, --debug Debug: Captures debug to the default temp file
-t, --template template: Generative Prompt template file
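The usage line in the help output above (`[-p [PROMPT] | -f [FILE]] [-d] [-t]`) is the shape that `argparse` produces for a mutually exclusive pair of optional-value arguments plus two flags. The following is a sketch of a parser with that interface, assuming `argparse` is what `main.py` uses; the function name `build_parser` and the help strings are illustrative, not copied from the project.

```python
import argparse


def build_parser():
    """Build a parser whose options mirror the documented command line."""
    parser = argparse.ArgumentParser(prog="dataant")
    # -p and -f cannot be combined: the prompt comes from one source only.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-p", "--prompt", nargs="?",
                        help="analysis prompt given directly")
    source.add_argument("-f", "--file", nargs="?",
                        help="file containing the prompt")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="enable debug logging")
    parser.add_argument("-t", "--template", action="store_true",
                        help="generate a prompt template file")
    return parser
```

With this sketch, `build_parser().parse_args(["-p", "list all fields in the dataset"])` yields a namespace whose `prompt` attribute holds the text, while passing both `-p` and `-f` makes `argparse` exit with an error.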
- Setting up credentials:
python main.py -p "set api_key YOUR_GOOGLE_API_KEY"
python main.py -p "set db_file PATH_TO_YOUR_DATABASE"
- Basic data analysis (heart disease dataset, with db_file set to "heart.csv" in config.json):
analyze the heart disease dataset focusing on age and cholesterol
- List available fields:
list all fields in the dataset
- Specific analysis:
analyze heart disease prediction using age, sex, and bp as features
dataant/
- model.py - Machine learning model implementations
- ui_app.py - Shiny web interface
- ui_plot.py - Plotting functions
- util.py - Utility functions
- engine.py - Core analysis engine
- db.py - Database operations
main.py - Application entry point
config.json - Configuration file
- API keys and sensitive data are encrypted with a key derived using PBKDF2 with SHA256
- Encrypted credentials are stored in config.json
- The encryption key is user-specific, derived from the system username and platform
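To illustrate the PBKDF2 with SHA256 and user-specific points above, here is a minimal sketch of a key derivation using only the Python standard library. It is not DataAnt's actual implementation: the function name `derive_user_key`, the way the username and platform are combined, the salt handling, and the iteration count are all assumptions made for this example.

```python
import base64
import getpass
import hashlib
import platform


def derive_user_key(salt, username=None, system=None, iterations=100_000):
    """Derive a 32-byte key from the username and platform (hypothetical helper)."""
    # Fall back to the current user and operating system when not given.
    username = username if username is not None else getpass.getuser()
    system = system if system is not None else platform.system()
    secret = f"{username}@{system}".encode("utf-8")
    # PBKDF2-HMAC-SHA256 stretches the input into a fixed-length key.
    raw_key = hashlib.pbkdf2_hmac("sha256", secret, salt, iterations, dklen=32)
    # URL-safe base64 of 32 bytes is the key format that Fernet
    # (from the cryptography package) accepts.
    return base64.urlsafe_b64encode(raw_key)
```

A key produced this way could be passed to `cryptography.fernet.Fernet` to encrypt and decrypt stored values. Because the username and platform are not secret, such a key ties a stored value to one user and platform rather than protecting it from that user.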
- Python 3.8+
- Pandas
- Scikit-learn
- Plotly
- Shiny for Python
- Google Generative AI
- Cryptography
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Interactive data filtering and visualization
- Support for numeric and categorical features
- Dynamic range sliders for numeric fields
- Automatic handling of missing values
- Real-time data updates
- Automated model training with LogisticRegression
- Feature selection and preprocessing
- Training/test split visualization
- Performance metrics calculation
- Model caching for improved performance
- Real-time training time tracking
- Score distribution visualization
- ROC and Precision-Recall curves
- Performance metrics tracking
- Error monitoring and logging
- Interactive form for feature input
- Real-time predictions
- Support for both numeric and categorical inputs
- Clear visualization of prediction results
- User-friendly interface
- Support for multiple database types (DynamoDB, PostgreSQL, MySQL, etc.)
- Note: the corresponding db_file must be defined in config.json along with its credentials
- Real-time data synchronization
- Automated schema detection
- Connection pooling
- Query optimization
This project is licensed under the MIT License - see the LICENSE file for details.
- Design and Architecture: Mr. Yakub Mohammad
- © AR USA LLC Team
- Contact: [email protected]
For support, please contact [email protected] or open an issue in the repository.