Complete data engineering roadmap with technologies, certifications (AWS, GCP, Azure, Snowflake, Databricks), and learning resources.

thehimel/data-engineering-roadmap

🚀 Complete Data Engineering Roadmap

A comprehensive guide to becoming a data engineer with learning paths, essential technologies, certifications, and resources


🎯 Overview

This repository provides a complete data engineering roadmap curated from industry experts, including a comprehensive learning path, essential technologies, professional certifications, and valuable resources. Whether you're starting your journey or looking to advance your career, this roadmap will guide you through every step of becoming a proficient data engineer.

✨ What's Inside

  • 📊 Learning Roadmaps - Step-by-step guides from industry experts
  • 🔧 Technology Stack - Essential tools and frameworks organized by category
  • 🏆 Certifications - Professional certifications from AWS, GCP, Azure, Snowflake, Databricks, and more
  • 📖 Resources - Curated learning materials and references
  • 🎯 Best Practices - Preferred technologies and recommended learning paths

๐Ÿ—บ๏ธ Learning Roadmaps

๐Ÿ“บ Video Roadmaps

Roadmap Source
๐ŸŽฅ The ONLY Data Engineer Certifications You Need to Find a Job Jash Radia (Google Data Engineer)
๐ŸŽฅ God Tier Data Engineering Roadmap Jash Radia (Google Data Engineer)
๐Ÿ“„ Detailed Roadmap PDF Jash Radia
๐ŸŽฅ How I would learn Data Engineering Jayzern

๐Ÿ› ๏ธ Technologies & Tools

A curated list of essential technologies organized by category. Bold items indicate preferred/recommended options.

| Category | Technology/Tool | Status |
| --- | --- | --- |
| 💻 Programming Languages | Python, SQL, Java, Scala | ⭐ Essential |
| ⚡ Processing Frameworks | Spark (PySpark), Flink, Apache Beam, AWS EMR | 🔥 Core |
| 🗄️ Databases | PostgreSQL, MySQL, MongoDB, Cassandra, DynamoDB, BigTable | 📊 Important |
| ☁️ Data Warehouses | Snowflake, BigQuery, Redshift, Databricks | 🎯 Critical |
| 🐧 Operating Systems | Linux | ✅ Required |
| ☁️ Cloud Service Providers | AWS, GCP, Azure | 🌟 Essential |
| 🔄 Orchestration | Airflow, Prefect, Dagster | 🚀 Must Learn |
| 📡 Streaming | Kafka, Kinesis, Pub/Sub, Flink | ⚡ Important |
| 🐳 Containerization | Docker, Kubernetes | 🎯 Industry Standard |
| 🏗️ Infrastructure as Code | Terraform, CloudFormation, Pulumi | 🔧 Recommended |
| 🔄 CI/CD | GitHub Actions, Jenkins, GitLab CI, SonarQube, SonarCloud | ✅ Best Practice |
| 📝 Version Control | Git | ✅ Essential |
| 💾 Data Formats | Parquet, Avro, ORC, JSON | 📦 Important |
| 📊 Monitoring & Observability | Datadog, Prometheus, Grafana, CloudWatch | 👁️ Critical |
| ✅ Data Quality & Testing | Great Expectations, dbt | 🎯 Best Practice |
| 📨 Message Queues | RabbitMQ, Amazon SQS | 🔄 Useful |
| 🔀 ETL/Data Integration | AWS Glue | 🛠️ Important |
| 📈 Data Visualization/BI | Power BI | 📊 Valuable |

💡 Note: Bold technologies indicate preferred options based on industry standards and market demand.

🎯 Technology Learning Priority

  1. 🔥 Must Learn First: Python, SQL, Linux, Spark, Airflow, Docker
  2. ⭐ High Priority: Cloud Platform (AWS/GCP/Azure), Data Warehouse (Snowflake/BigQuery), Kafka
  3. 📚 Expand Knowledge: Kubernetes, Terraform, dbt, Monitoring tools

🎓 Certifications

Professional certifications to validate your skills and advance your career. Organized by provider:

โ˜๏ธ AWS (Amazon Web Services)

๐Ÿ† AWS Certified Solutions Architect - Associate (Foundation)

**Certification: ** AWS Certified Solutions Architect - Associate

Practice Exams:

๐Ÿ“ AWS Certified Solutions Architect Associate Practice Exams - Instructor: Jon Bonso

๐Ÿ† AWS Certified Data Engineer - Associate (Core)

Certification: AWS Certified Data Engineer - Associate

Exam Prep:

๐Ÿ“š Exam Prep Plan: AWS Certified Data Engineer - Associate (DEA-C01)

Courses:

๐Ÿ”— GitHub - Hands-on Practice - Hands-on labs and practice exercises

Articles & Guides:

๐Ÿ“– How I Prepared for the AWS Data Engineer Associate Exam - Denis Burakov | First-hand experience and preparation tips

๐Ÿ† AWS Certified Solutions Architect - Professional (Optional)

Certification: AWS Certified Solutions Architect - Professional

๐Ÿ“š Common AWS Data Engineering Resources

| Resource | Type | Link |
| --- | --- | --- |
| 📚 Data Engineering on AWS Learning Plans | Learning Plan (with labs) | View Details |
| 📚 Data Engineering on AWS - Foundations | Course | View Details |
| 📚 A Day in the Life of a Data Engineer | Course | View Details |

🌟 Apache Airflow

🏆 Apache Airflow 3 Fundamentals (Mandatory)

Certification: Apache Airflow 3 Fundamentals

Courses:

Practice Exams:

🏆 DAG Authoring (Airflow 3) (Optional)

Certification: DAG Authoring (Airflow 3)

Practice Exams:

⚡ Apache Spark

🏆 Databricks Certified Associate Developer for Apache Spark (Mandatory)

Certification: Databricks Certified Associate Developer for Apache Spark

Courses:

Practice Exams:

🔷 Databricks

🏆 Data Engineer Associate (Optional)

Certification: Data Engineer Associate

Courses:

Practice Exams:

🏆 Data Engineer Professional (Optional)

Certification: Data Engineer Professional

Courses:

Practice Exams:

🔵 Google Cloud Platform

🏆 Associate Cloud Engineer (Foundation)

Certification: Associate Cloud Engineer

🏆 Professional Data Engineer (Core)

Certification: Professional Data Engineer

Courses:

🔷 Microsoft Azure

🏆 Fabric Data Engineer Associate (Core)

Certification: Fabric Data Engineer Associate

🏆 Azure Solutions Architect Expert (Foundation) (Optional)

Certification: Azure Solutions Architect Expert

โ„๏ธ Snowflake (Optional)

๐Ÿ† SnowProยฎ Core

Certification: SnowProยฎ Core

๐Ÿ† SnowProยฎ Advanced: Data Engineer

Certification: SnowProยฎ Advanced: Data Engineer

๐Ÿ“š Common Snowflake Resources

๐Ÿ’ก Notes

๐ŸŽฏ Certification Strategy

Recommended Learning Path:

  1. Foundation โ†’ Start with cloud platform certifications (AWS/GCP/Azure)
  2. Specialization โ†’ Then pursue data platform certifications (Snowflake/Databricks) based on your career goals
  3. Advanced โ†’ Consider professional-level certifications after gaining experience

๐Ÿ“Œ Key Points

  • Cloud First: Master one cloud platform (AWS, GCP, or Azure) before moving to specialized platforms
  • Hands-on Practice: Combine certifications with real-world projects and hands-on labs
  • Career Alignment: Choose certifications that align with your target roles and industry requirements
  • Continuous Learning: Stay updated with new certifications and platform updates

📚 Resources

🌐 Essential Websites

📖 Learning Path

  • Master Python and SQL
  • Learn one cloud service provider very well, e.g. AWS
  • Get certifications
    • Cloud
    • Airflow
    • Spark from Databricks

📖 References

📰 Articles & Guides

💼 Interview Experiences

Real-world interview experiences from data engineers who successfully landed roles at top tech companies. Learn from their journeys, preparation strategies, and insights.

🎥 Video Interviews

| Interview Experience | Creator | Note |
| --- | --- | --- |
| 🎤 Google Data Engineer Interview Experience | Data Depth | LeetCode Medium, HackerRank Advanced SQL |
| 🎤 Google Data Engineer Interview Experience | Jash Radia | |

💡 Tip: Watch these interviews to understand the interview format, the types of questions asked, and how to prepare effectively for data engineering roles at top companies.

Job Boards


โญ Show Your Support

If this roadmap helped you in your data engineering journey, please consider giving it a โญ star!


๐Ÿ“„ License

This project is open source and available under the MIT License.


Made with โค๏ธ for aspiring data engineers
