A powerful open-source library that enables natural language conversations with your data using Large Language Models (LLMs). Transform complex data analysis into simple conversations - no coding required!
- π£οΈ Natural Language Interface: Ask questions about your data in plain English
- π Multiple Model Support: Works with OpenAI GPT models and Groq's high-speed inference
- π§ Smart Analysis: Automatic code generation, data cleaning, and ML model suggestions
- π οΈ Error Recovery: Built-in debugging and error correction mechanisms
- π Auto-Visualization: Generates charts and graphs automatically
- πΎ SQL Support: Native support for SQLite databases
- π Report Generation: Create comprehensive analysis reports automatically
- π Data Quality Analysis: Identifies and fixes data quality issues
- β‘ Real-time Processing: Streaming responses for immediate feedback
pip install insightai
Set up your API keys:
# Required: OpenAI API Key
export OPENAI_API_KEY="your-openai-api-key"
# Required: Groq API Key (for faster inference)
export GROQ_API_KEY="your-groq-api-key"
import pandas as pd
from insightai import InsightAI
# Load your data
df = pd.read_csv('your_data.csv')
# Initialize InsightAI
ai = InsightAI(df)
# Start asking questions!
ai.pd_agent_converse("What are the main trends in this data?")
import pandas as pd
from insightai import InsightAI
# Load sales data
df = pd.read_csv('sales_data.csv')
ai = InsightAI(df)
# Interactive mode - ask multiple questions
ai.pd_agent_converse()
# Now you can ask: "Show me monthly revenue trends"
# Or: "Which product category has the highest profit margin?"
# Ask a specific question
ai = InsightAI(df)
ai.pd_agent_converse("What is the correlation between price and customer rating?")
# Analyze SQLite database
ai = InsightAI(db_path='customer_database.db')
ai.pd_agent_converse("Find the top 10 customers by total purchase amount")
# Generate comprehensive analysis report
ai = InsightAI(df, generate_report=True, report_questions=5)
ai.pd_agent_converse() # Generates a full report automatically
# Get data cleaning recommendations and ML model suggestions
ai = InsightAI(df)
ai.pd_agent_converse("Clean this dataset and suggest appropriate machine learning models")
InsightAI(
df=None, # pandas DataFrame
db_path=None, # Path to SQLite database
max_conversations=4, # Conversation memory length
debug=False, # Enable debug mode
exploratory=True, # Enable exploratory analysis
df_ontology=False, # Enable data ontology support
generate_report=True, # Auto-generate reports
report_questions=5 # Number of questions for reports
)
Create LLM_CONFIG.json
in your working directory:
[
{
"agent": "Code Generator",
"details": {
"model": "gpt-4o",
"provider": "openai",
"max_tokens": 4000,
"temperature": 0
}
},
{
"agent": "Planner",
"details": {
"model": "llama-3.3-70b-versatile",
"provider": "groq",
"max_tokens": 2000,
"temperature": 0.1
}
}
]
Create PROMPT_TEMPLATES.json
to customize agent behavior:
{
"planner_system": "You are a data analysis expert...",
"code_generator_system_df": "You are an AI data analyst..."
}
- "What does this dataset contain?"
- "Show me the distribution of values in each column"
- "Are there any missing values or outliers?"
- "What's the correlation between sales and marketing spend?"
- "Perform a statistical summary of the numerical columns"
- "Which factors most influence customer satisfaction?"
- "Create a bar chart of revenue by product category"
- "Plot the trend of monthly sales over time"
- "Show me a correlation heatmap of all numerical variables"
- "Clean this dataset and prepare it for machine learning"
- "Handle missing values and suggest the best approach"
- "Identify and fix data quality issues"
- "What machine learning models would work best for this data?"
- "Prepare this data for predictive modeling"
- "Suggest features for predicting customer churn"
- "Generate a comprehensive analysis report"
- "What are the key business insights from this data?"
- "Create an executive summary of the findings"
InsightAI automatically saves visualizations to the visualization/
folder:
- Bar charts, line plots, scatter plots
- Correlation heatmaps
- Distribution plots
- Custom business charts
Generate professional markdown reports including:
- Executive summary
- Dataset overview
- Key findings and insights
- Recommendations
- Supporting visualizations
View the actual Python code generated for your analysis:
# Example generated code
import pandas as pd
import matplotlib.pyplot as plt
# Calculate monthly revenue trends
monthly_revenue = df.groupby('month')['revenue'].sum()
plt.figure(figsize=(10, 6))
plt.plot(monthly_revenue.index, monthly_revenue.values)
plt.title('Monthly Revenue Trends')
plt.savefig('visualization/monthly_revenue_trends.png')
plt.show()
InsightAI uses a multi-agent architecture with specialized AI agents:
- Expert Selector: Chooses the right agent for your task
- Data Analyst: Performs statistical analysis and visualizations
- SQL Analyst: Handles database queries and operations
- Data Cleaning Expert: Identifies and fixes data quality issues
- Code Generator: Creates Python code for your analysis
- Error Corrector: Debugs and fixes code issues automatically
- Report Generator: Creates comprehensive analysis reports
- GPT-4o, GPT-4o-mini
- GPT-4 Turbo
- O1 series models
- Llama 3.3 70B
- Llama 3.1 8B
- Mixtral 8x7B
- Gemma 2 9B
All interactions are automatically logged with detailed cost tracking:
{
"chain_id": "1234567890",
"agent": "Code Generator",
"model": "gpt-4o-mini",
"tokens_used": 1500,
"cost": 0.03,
"duration": "2.3s"
}
View logs in: insightai_consolidated_log.json
- Input sanitization and validation
- Code execution sandboxing
- Blacklisted dangerous operations
- Rate limiting and error handling
# Analyze online store data
df = pd.read_csv('ecommerce_data.csv')
ai = InsightAI(df)
ai.pd_agent_converse("Which products have the highest return rate and why?")
# Stock market analysis
ai = InsightAI()
ai.pd_agent_converse("Download Apple stock data for 2024 and analyze the trends")
# Patient data analysis (anonymized)
df = pd.read_csv('patient_outcomes.csv')
ai = InsightAI(df)
ai.pd_agent_converse("What factors correlate with better patient outcomes?")
git clone https://github.com/LeoRigasaki/InSightAI.git
cd InsightAI
pip install -e ".[dev]"
- Dynamic API Key Management: Only requires API keys for providers you actually use
- Flexible Provider Support: Mix and match OpenAI, Groq, and Gemini models freely
- Cost Optimization: Reduced overhead by eliminating unused API dependencies
- Smarter LLM configuration parsing
- Better error messages for missing API keys
- Enhanced provider validation
- Fixed requirement for all API keys even when not needed
- Improved initialization error handling
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create a feature branch:
git checkout -b feature-name
- Commit changes:
git commit -am 'Add feature'
- Push to branch:
git push origin feature-name
- Submit a Pull Request
- Token limits vary by model (check your plan)
- Large datasets may require chunking
- Rate limiting depends on your API plan
- Complex visualizations may need manual adjustment
MIT License - see LICENSE for details.
- Special thanks to pgalko for the original inspiration
- OpenAI for providing powerful language models
- Groq for high-performance inference capabilities
- The open-source community for continuous improvements
- π§ Email: [email protected]
- π Issues: GitHub Issues
- π‘ Feature Requests: GitHub Discussions
Transform your data analysis workflow today with InsightAI - where natural language meets powerful analytics! π