A lightweight Python library for generating realistic temporary datasets for testing and development. It generates 40 dataset types covering business, financial, IoT, healthcare, social media, and technology data.
- 40 Dataset Types: Business, financial, IoT sensors, healthcare, social media, and technology datasets
- Zero Dependencies: Works with just Python standard library
- Multiple Formats: Generate CSV, JSON, or in-memory datasets
- Realistic Data: Built-in faker integration with realistic patterns
- Technology Focus: New datasets for DevOps, monitoring, web analytics, and system performance
- Memory Efficient: Optimized for large dataset generation
- Python 3.7+: Compatible with modern Python versions
## Installation

```bash
pip install tempdataset
```

## Quick Start

```python
import tempdataset

# Generate datasets
data = tempdataset.create_dataset('sales', 1000)
tech_data = tempdataset.create_dataset('web_analytics', 500)
server_metrics = tempdataset.create_dataset('server_metrics', 200)

# Save directly to files
tempdataset.create_dataset('sales_data.csv', 1000)
tempdataset.create_dataset('api_logs.json', 500)

# Get help and see all available datasets
tempdataset.list_datasets()  # Show all 40 datasets
tempdataset.help()           # Quick help guide
```

## Available Datasets (40 Total)
TempDataset provides **40 datasets** across 6 major categories:
### Core Business Datasets (10)
- **`sales`** - Sales transactions and orders (27 columns)
- **`customers`** - Customer profiles and demographics (31 columns)
- **`ecommerce`** - E-commerce transactions and reviews (35+ columns)
- **`employees`** - Employee records and HR data (30+ columns)
- **`marketing`** - Marketing campaigns and performance (32+ columns)
- **`retail`** - Retail store operations (28+ columns)
- **`suppliers`** - Supplier management data (22+ columns)
- **`crm`** - Customer relationship management (30+ columns)
- **`inventory`** - Inventory and warehouse data (25+ columns)
- **`reviews`** - Product and service reviews (15+ columns)
### Financial Datasets (8)
- **`stocks`** - Stock market trading data (20+ columns)
- **`banking`** - Banking transactions (20+ columns)
- **`cryptocurrency`** - Cryptocurrency trading (20+ columns)
- **`insurance`** - Insurance policies and claims (20+ columns)
- **`loans`** - Loan applications and management (20+ columns)
- **`investments`** - Investment portfolios (20+ columns)
- **`accounting`** - General ledger and accounting (20+ columns)
- **`payments`** - Digital payment processing (25+ columns)
### Technology Datasets (8) *NEW!*
- **`web_analytics`** - Website analytics and traffic (17 columns)
- **`app_usage`** - Mobile app usage analytics (15 columns)
- **`system_logs`** - System and application logs (11 columns)
- **`api_calls`** - API calls and performance (12 columns)
- **`server_metrics`** - Server performance monitoring (22 columns)
- **`user_sessions`** - User session tracking (20 columns)
- **`error_logs`** - Application error logs (16 columns)
- **`performance`** - Application performance monitoring (21 columns)
### Healthcare Datasets (6)
- **`patients`** - Patient medical records (22 columns)
- **`appointments`** - Medical appointments (14 columns)
- **`lab_results`** - Laboratory test results (13 columns)
- **`prescriptions`** - Medication prescriptions (16 columns)
- **`medical_history`** - Patient medical history (11 columns)
- **`clinical_trials`** - Clinical trial data (14 columns)
### IoT Sensor Datasets (6)
- **`weather`** - Weather sensor monitoring (18 columns)
- **`energy`** - Smart meter energy data (14 columns)
- **`traffic`** - Traffic sensor monitoring (15 columns)
- **`environmental`** - Environmental monitoring (17 columns)
- **`industrial`** - Industrial sensor data (16 columns)
- **`smarthome`** - Smart home IoT devices (16 columns)
### Social Media Datasets (2)
- **`social_media`** - Social media posts and engagement (16 columns)
- **`user_profiles`** - Social media user profiles (17 columns)
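
For smoke tests it can help to keep the catalogue above in a plain mapping. The sketch below only transcribes the dataset names from the lists above. `DATASETS_BY_CATEGORY` and `all_dataset_names` are illustrative names and are not part of the tempdataset API.

```python
# Dataset names transcribed from the category lists above.
# This mapping is illustrative only; tempdataset does not provide it.
DATASETS_BY_CATEGORY = {
    "business": ["sales", "customers", "ecommerce", "employees", "marketing",
                 "retail", "suppliers", "crm", "inventory", "reviews"],
    "financial": ["stocks", "banking", "cryptocurrency", "insurance", "loans",
                  "investments", "accounting", "payments"],
    "technology": ["web_analytics", "app_usage", "system_logs", "api_calls",
                   "server_metrics", "user_sessions", "error_logs", "performance"],
    "healthcare": ["patients", "appointments", "lab_results", "prescriptions",
                   "medical_history", "clinical_trials"],
    "iot": ["weather", "energy", "traffic", "environmental", "industrial",
            "smarthome"],
    "social": ["social_media", "user_profiles"],
}


def all_dataset_names():
    """Flatten the category mapping into a single list of dataset names."""
    return [name for names in DATASETS_BY_CATEGORY.values() for name in names]


print(len(all_dataset_names()))  # 40
```

Each name in the flattened list could then be passed to `tempdataset.create_dataset(name, 10)` to check that every dataset type generates without errors.
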
### Quick Examples
```python
import tempdataset

# Generate different types of datasets
sales = tempdataset.create_dataset('sales', 1000)
tech_logs = tempdataset.create_dataset('system_logs', 500)
health_data = tempdataset.create_dataset('patients', 200)
crypto = tempdataset.create_dataset('cryptocurrency', 300)

# Get help and list all datasets
tempdataset.list_datasets()  # Show all 40 datasets
tempdataset.help()           # Quick reference guide
```

## Working with Generated Data

```python
import tempdataset

data = tempdataset.create_dataset('sales', 1000)

# Basic operations
data.head(10)    # First 10 rows
data.tail(5)     # Last 5 rows
data.describe()  # Statistical summary
data.info()      # Data info

# Filtering and selection
filtered = data.filter(lambda row: row['amount'] > 100)
selected = data.select(['customer_name', 'amount', 'date'])

# Export options
data.to_csv('output.csv')
data.to_json('output.json')
data.to_dict()   # Convert to dictionary
```
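
To make the semantics of `filter` and `select` concrete, here is a rough standard-library equivalent that works on a list of row dictionaries. It sketches the behaviour suggested by the calls above and is not TempDataFrame's implementation. The sample rows and the helper names `filter_rows` and `select_columns` are made up for illustration.

```python
# Hypothetical sample rows; real 'sales' data has more columns.
rows = [
    {"customer_name": "Ada", "amount": 250.0, "date": "2024-01-05", "region": "EU"},
    {"customer_name": "Bob", "amount": 80.0, "date": "2024-01-06", "region": "US"},
    {"customer_name": "Cy", "amount": 120.5, "date": "2024-01-07", "region": "US"},
]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def select_columns(rows, columns):
    """Return new rows that contain only the requested columns."""
    return [{col: row[col] for col in columns} for row in rows]


big = filter_rows(rows, lambda row: row["amount"] > 100)
slim = select_columns(big, ["customer_name", "amount", "date"])
print(slim)
```

Both helpers return new lists and leave the input rows untouched, which keeps chained calls easy to reason about.
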

## Performance Monitoring

```python
import tempdataset

# Generate data
data = tempdataset.create_dataset('sales', 10000)

# Check performance stats
stats = tempdataset.get_performance_stats()
print(f"Generation time: {stats['generation_time']:.2f}s")
print(f"Memory usage: {stats['memory_usage']:.2f}MB")

# Reset stats for next operation
tempdataset.reset_performance_stats()
```
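
If you want comparable numbers without relying on the library's own counters, generation time and peak memory can also be measured with the standard library. The sketch below wraps any zero-argument callable. `measure` and `fake_generate` are hypothetical names, and `fake_generate` stands in for a call such as `tempdataset.create_dataset('sales', 10000)`.

```python
import time
import tracemalloc


def measure(func):
    """Run func() and return (result, seconds elapsed, peak MiB allocated)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func()
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func() raised.
        tracemalloc.stop()
    return result, elapsed, peak / (1024 * 1024)


def fake_generate(n=10_000):
    # Stand-in for dataset generation: builds n small row dictionaries.
    return [{"id": i, "amount": i * 1.5} for i in range(n)]


data, seconds, peak_mib = measure(fake_generate)
print(f"Generation time: {seconds:.2f}s")
print(f"Peak memory: {peak_mib:.2f}MB")
```

`tracemalloc` only tracks allocations made by the Python interpreter, so its figures may differ from whatever `get_performance_stats()` reports.
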

## Development

```bash
# Clone the repository
git clone https://github.com/dot-css/TempDataset.git
cd TempDataset

# Install development dependencies (quoted so the brackets survive shells such as zsh)
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=tempdataset

# Run performance benchmarks
pytest .benchmarks/
```

## Testing

```bash
# Run all tests
pytest

# Run specific test categories
pytest -m "not slow"    # Skip slow tests
pytest -m integration   # Only integration tests
pytest -m performance   # Only performance tests

# Run with coverage report
pytest --cov=tempdataset --cov-report=html
```
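
The `-m` selections above work cleanly only when the markers are registered with pytest. How this project registers them is not shown here. One common approach, sketched under that assumption, is a `pytest_configure` hook in `conftest.py`; the marker descriptions below are illustrative.

```python
# conftest.py (sketch): register the custom markers used with `pytest -m`.
# The descriptions are assumptions, not taken from the project.
MARKERS = {
    "slow": "tests that take a long time to run",
    "integration": "tests that exercise several components together",
    "performance": "tests that measure speed or memory use",
}


def pytest_configure(config):
    """Pytest hook: declare each custom marker so `-m` filters do not warn."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

A test is then tagged with a decorator such as `@pytest.mark.slow`, and `pytest -m "not slow"` deselects it.
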

## Code Quality

```bash
# Format code
black tempdataset tests

# Lint code
flake8 tempdataset tests

# Type checking
mypy tempdataset
```

## API Reference

### `tempdataset.create_dataset(dataset_type, rows=500)`

Generate temporary datasets or save to files.

Parameters:
- `dataset_type` (str): Dataset type or filename
  - Available types include `'sales'`, `'customers'`, `'ecommerce'`, `'employees'`, `'marketing'`, `'retail'`, `'suppliers'`
  - File formats: `'sales.csv'`, `'customers.json'`, etc.
- `rows` (int): Number of rows to generate (default: 500)

Returns:
- `TempDataFrame` containing the generated data (also saved to a file if a filename is provided)

### `tempdataset.help()`

Display comprehensive help information about all available datasets, including column descriptions, usage examples, and feature details.

### `tempdataset.list_datasets()`

Get a quick overview of all available datasets with their key features and column counts.

Read CSV file into TempDataFrame.
Read JSON file into TempDataFrame.
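
The reader function names are not given in this section, so the sketch below uses only the standard library to load a CSV file into row dictionaries, which is roughly the job such a reader has to do. It assumes the file has a header row. The file contents and the helper name `load_csv_rows` are made up for illustration.

```python
import csv
import os
import tempfile


def load_csv_rows(path):
    """Read a CSV file with a header row into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.csv")

    # Write a tiny sample file so the example is self-contained.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["customer_name", "amount"])
        writer.writeheader()
        writer.writerow({"customer_name": "Ada", "amount": 250.0})
        writer.writerow({"customer_name": "Bob", "amount": 80.0})

    rows = load_csv_rows(path)

print(rows)
```

The `csv` module returns every value as a string, so numeric columns such as `amount` need an explicit conversion after loading.
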

### TempDataFrame Methods

- `head(n=5)`: Get first n rows
- `tail(n=5)`: Get last n rows
- `describe()`: Statistical summary
- `info()`: Dataset information
- `filter(func)`: Filter rows by function
- `select(columns)`: Select specific columns
- `to_csv(filename)`: Export to CSV
- `to_json(filename)`: Export to JSON
- `to_dict()`: Convert to dictionary

We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the test suite
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
See CHANGELOG.md for a detailed history of changes.
- Documentation: https://tempdataset.readthedocs.io/
- Issue Tracker: https://github.com/dot-css/TempDataset/issues
- Discussions: https://github.com/dot-css/TempDataset/discussions
- Built with love for the Python testing community
- Inspired by the need for lightweight, dependency-free test data generation
- Thanks to all contributors who help make this project better!