This repository contains a collection of scripts and utilities that assist with migrating various workloads to Microsoft Fabric.
New! Complete migration guide with Python/Bash scripts (no PowerShell required):
- 📘 Migration Guide - Comprehensive step-by-step guide
- 📚 ETL Library Documentation - Complete API reference for Migration Scripts ETL Library
- 🔄 Data Type Mapping - Handle datatype differences between platforms
- 🔐 Permissions Guide - Set up all required permissions
- ⚡ Quick Start - Get started in 15 minutes
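The Data Type Mapping guide above covers datatype differences between the two platforms. As a rough illustration of the idea, the sketch below substitutes a few types commonly cited as unsupported in Fabric Warehouse. The specific mappings and the `map_type` helper are assumptions for demonstration only; the repository's Data Type Mapping guide is the authoritative reference.

```python
# Illustrative sketch only: a handful of assumed Synapse -> Fabric Warehouse
# type substitutions. Consult the Data Type Mapping guide for the real list.
TYPE_MAP = {
    "money": "decimal(19, 4)",
    "smallmoney": "decimal(10, 4)",
    "datetime": "datetime2(6)",
    "nvarchar": "varchar",
    "nchar": "char",
}

def map_type(synapse_type: str) -> str:
    """Return a Fabric Warehouse equivalent for a Synapse column type.

    Types without a listed substitution pass through unchanged. Length
    specifiers such as nvarchar(100) are not preserved in this sketch.
    """
    base = synapse_type.split("(")[0].strip().lower()
    return TYPE_MAP.get(base, synapse_type)
```

For example, `map_type("money")` yields `decimal(19, 4)`, while a type with no entry, such as `int`, is returned as-is.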
All migration scripts are located in the /scripts directory:
```bash
# 1. Set up the environment
cd scripts
./setup_environment.sh

# 2. Run pre-migration checks
./pre_migration_checks.sh

# 3. Extract data from the Azure Synapse dedicated SQL pool
python3 extract_data.py \
    --server mysynapse.sql.azuresynapse.net \
    --database mydatabase \
    --storage-account mystorageaccount \
    --container migration-staging \
    --parallel-jobs 6

# 4. Load data into the Fabric Warehouse
python3 load_data.py \
    --workspace myworkspace \
    --warehouse mywarehouse \
    --storage-account mystorageaccount \
    --container migration-staging \
    --parallel-jobs 8 \
    --validate-rows

# 5. Validate the migration
python3 validate_migration.py \
    --source-server mysynapse.sql.azuresynapse.net \
    --source-database mydatabase \
    --target-workspace myworkspace \
    --target-warehouse mywarehouse \
    --generate-report
```

See scripts/README.md for detailed documentation.
New! Interactive PySpark notebooks for running migration steps in Fabric:
All migration notebooks are located in the /notebooks directory:
- 01_extract_data.ipynb - Extract data from Azure Synapse to ADLS
- 02_load_data.ipynb - Load data from ADLS to Fabric Warehouse
- 03_validate_migration.ipynb - Validate migration completeness
- Helper Functions - Shared utilities for connections and operations
See notebooks/README.md for detailed documentation on running notebooks in Fabric.
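The helper functions above centralize connection setup for the notebooks. The notebooks' actual ConnectionHelper API is documented in notebooks/README.md; purely as an illustration of the idea, a stand-in that assembles an ODBC connection string for a warehouse SQL endpoint might look like this. The driver name and authentication option are assumptions, not the notebooks' confirmed settings.

```python
def build_connection_string(server: str, database: str) -> str:
    """Assemble an ODBC connection string for a warehouse SQL endpoint.

    Hypothetical sketch of what a connection helper might produce;
    the driver and Authentication values here are assumptions.
    """
    parts = {
        "Driver": "{ODBC Driver 18 for SQL Server}",
        "Server": server,
        "Database": database,
        "Encrypt": "yes",
        "Authentication": "ActiveDirectoryInteractive",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())
```

A caller would pass the workspace's SQL endpoint hostname and warehouse name, then hand the resulting string to an ODBC client such as pyodbc.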
New! Comprehensive documentation for the Python ETL library powering the migration scripts:
📚 ETL Library Documentation - Complete API reference covering:
- DataExtractor - Extract data from Synapse to ADLS Gen2 using CETAS
- DataLoader - Load data from ADLS Gen2 to Fabric Warehouse using COPY INTO
- MigrationValidator - Validate row counts and data integrity
- ConnectionHelper - Database connection utilities for PySpark notebooks
- MigrationUtils - Common migration operations and helpers
- StorageHelper - Azure Data Lake Storage operations
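To make the MigrationValidator's role concrete, the sketch below shows the core of a row-count comparison in plain Python. It is illustrative only: the real class queries both systems itself, whereas this stand-in takes pre-fetched counts as dictionaries.

```python
def compare_row_counts(source: dict[str, int], target: dict[str, int]) -> list[str]:
    """Report tables whose row counts differ or that are missing on the target.

    Illustrative stand-in for the validation logic; the actual
    MigrationValidator API is described in the ETL Library Documentation.
    """
    issues = []
    for table, src_rows in sorted(source.items()):
        tgt_rows = target.get(table)
        if tgt_rows is None:
            issues.append(f"{table}: missing on target")
        elif tgt_rows != src_rows:
            issues.append(f"{table}: source={src_rows}, target={tgt_rows}")
    return issues
```

An empty result list means every source table was found on the target with a matching count; anything else is a per-table discrepancy report.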
The documentation includes:
- Architecture and design patterns
- Complete API reference for all classes and methods
- Usage examples (CLI, Python scripts, PySpark notebooks)
- Best practices for performance, security, and monitoring
- Comprehensive troubleshooting guide
- [NEW] Comprehensive Migration Guide - Complete guide with scripts
- [NEW] PySpark Notebooks - Interactive notebooks for Fabric
- [NEW] Data Type Mapping Guide - Datatype compatibility reference
- [NEW] Permissions Guide - Security and access setup
- Official Microsoft documentation
- Existing PowerShell scripts and utils
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.