Data breed Africa

Data Engineering Course Outline (Guide)

PHASE 1 (Weeks 1–4): Onboarding and SQL Mastery

Week 1: Onboarding and Engineering Foundations

Onboarding

  • Program roadmap and expectations
  • Tools installation (VS Code, Git, PostgreSQL, Docker)
  • GitHub setup and repository management
  • Agile fundamentals and sprint workflows
  • Documentation standards
  • Portfolio strategy and industry positioning

Foundations

  • What is Data Engineering?
  • Data Engineer vs Data Analyst vs Data Scientist
  • Modern Data Stack overview
  • Data lifecycle and architecture fundamentals
  • Batch vs Streaming systems
  • OLTP vs OLAP systems
  • Introduction to relational databases

Practical

  • Install and configure a local database
  • Create tables and insert records
  • Write basic SQL queries

Weeks 2–4: SQL for Data Engineering (Deep Dive)

Core SQL

  • SELECT, WHERE, GROUP BY, HAVING
  • Joins (inner, left, right, full)
  • Subqueries
  • Common Table Expressions (CTEs)
  • Window functions
  • Aggregations
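The core SQL topics above can be tried locally without installing a database server. A minimal sketch using Python's built-in sqlite3 module (the `sales` table and its data are made up for illustration; window functions need SQLite 3.25+, which ships with modern Python):

```python
import sqlite3

# In-memory database so the example is fully self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO sales (region, amount) VALUES
        ('east', 100), ('east', 200), ('west', 50), ('west', 300);
""")

# A CTE, an aggregation, and a window function in one query:
# per-region totals, ranked from highest to lowest.
query = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
)
SELECT region,
       total,
       RANK() OVER (ORDER BY total DESC) AS sales_rank
FROM region_totals
ORDER BY sales_rank;
"""
for row in conn.execute(query):
    print(row)
```

The same query text runs unchanged against PostgreSQL once it is installed in Week 1.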

Performance and Engineering

  • Indexing strategies
  • Query optimization
  • Understanding execution plans
  • Constraints and keys
  • Transactions
  • Introduction to stored procedures

Data Modeling

  • Normalization principles
  • Dimensional modeling
  • Star schema
  • Snowflake schema
  • Slowly Changing Dimensions (SCD Type 1 and 2)
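An SCD Type 2 change can also be prototyped in SQLite. A hedged sketch: the column names (`valid_from`, `valid_to`, `is_current`) are common conventions, not a standard, and the customer data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        sk INTEGER PRIMARY KEY,   -- surrogate key
        customer_id INTEGER,      -- natural/business key
        city TEXT,
        valid_from TEXT,
        valid_to TEXT,            -- NULL while the row is current
        is_current INTEGER
    )
""")
conn.execute(
    "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
    "VALUES (1, 'Lagos', '2024-01-01', NULL, 1)"
)

def scd2_update(conn, customer_id, new_city, change_date):
    """SCD Type 2: close the current row, then insert a new version."""
    conn.execute(
        "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_city, change_date),
    )

scd2_update(conn, 1, "Nairobi", "2024-06-01")
for row in conn.execute(
    "SELECT customer_id, city, valid_from, valid_to, is_current "
    "FROM dim_customer ORDER BY sk"
):
    print(row)
```

Type 1 would simply overwrite `city` in place; Type 2 preserves the full history, which is why it is the default choice for warehouse dimensions.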

Projects

  • Build and populate a retail transactional database
  • Write advanced analytical queries
  • Design and implement a mini data warehouse
  • Optimize poorly performing queries

PHASE 2 (Weeks 5–9): Python for Data Engineering

Weeks 5–6: Python Fundamentals for Engineers

Topics

  • Python fundamentals refresher
  • File handling and OS operations
  • Working with CSV, JSON, and Parquet files
  • Exception handling
  • Writing modular and reusable scripts
  • Logging best practices
  • Virtual environments
  • Packaging basics

Projects

  • Build a reusable data ingestion script
  • Convert raw CSV files into structured, cleaned output
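The ingestion project above might start from a sketch like this, using only the standard library; the field names and cleaning rules are hypothetical, and the inline sample stands in for a real file path:

```python
import csv
import io
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingest")

def clean_rows(raw_file):
    """Yield cleaned records from a raw CSV file object, skipping bad rows."""
    reader = csv.DictReader(raw_file)
    for lineno, row in enumerate(reader, start=2):  # header is line 1
        try:
            yield {
                "name": row["name"].strip().title(),
                "amount": float(row["amount"]),
            }
        except (KeyError, ValueError) as exc:
            log.warning("skipping line %d: %s", lineno, exc)

# Inline sample data so the sketch runs as-is; real input would be open(path).
raw = io.StringIO("name,amount\n alice ,10.5\nbob,not-a-number\ncarol,3\n")
cleaned = list(clean_rows(raw))
print(cleaned)
```

Writing the cleaner as a generator keeps memory flat on large files, and routing bad rows through `logging` rather than `print` is the habit the Logging best practices topic builds on.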

Weeks 7–8: Python and Databases

Topics

  • Connecting Python to SQL databases
  • Performing CRUD operations programmatically
  • Building ETL scripts
  • Data validation and cleaning
  • Writing scalable transformation scripts

Project

  • Build an automated ETL pipeline (Python to SQL database)
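The ETL project above reduces to three functions. A minimal sketch with sqlite3 standing in for PostgreSQL; the table layout, sample records, and validation rule are illustrative only:

```python
import sqlite3

def extract():
    # Stand-in for reading from files or an API.
    return [
        ("2024-01-01", "SKU-1", 3),
        ("2024-01-02", "SKU-2", -5),   # invalid: negative quantity
        ("2024-01-03", "SKU-1", 7),
    ]

def transform(rows):
    # Validation step: drop records with non-positive quantities.
    return [r for r in rows if r[2] > 0]

def load(conn, rows):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_date TEXT, sku TEXT, qty INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
print(conn.execute("SELECT COUNT(*), SUM(qty) FROM orders").fetchone())
```

Keeping extract, transform, and load as separate functions is what later lets the same logic be lifted into Airflow tasks in Week 12.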

Week 9: Engineering Best Practices

Topics

  • Code structure and project organization
  • Git workflows and branching strategies
  • Introduction to testing
  • Documentation standards
  • Introduction to Docker

Project

  • Containerize the ETL pipeline

PHASE 3 (Weeks 10–15): ETL, Orchestration and Big Data

Weeks 10–11: ETL and ELT Pipeline Engineering

Topics

  • ETL vs ELT architecture
  • Designing pipeline workflows
  • Data validation frameworks
  • Error handling and monitoring
  • Data quality checks
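Data quality checks can start as plain functions before adopting a framework. A sketch; the rules and field names are examples only (tools such as Great Expectations express the same idea declaratively):

```python
def run_checks(rows):
    """Return the names of failed data-quality checks for a batch of records."""
    failures = []
    if not rows:
        failures.append("batch is empty")
    if any(r.get("amount") is None for r in rows):
        failures.append("null amount")
    if any(r.get("amount", 0) < 0 for r in rows):
        failures.append("negative amount")
    return failures

batch = [{"amount": 10}, {"amount": -2}]
print(run_checks(batch))
```

A pipeline would typically run checks like these between layers and halt (or alert) on a non-empty result rather than load bad data downstream.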

Project

  • Build a complete pipeline: Raw layer to Clean layer to Warehouse layer

Week 12: Workflow Orchestration

Topics

  • Introduction to Apache Airflow
  • DAG design
  • Scheduling and automation
  • Monitoring and logging
  • Retry logic and alerting

Project

  • Convert the ETL pipeline into an Airflow DAG
  • Implement failure handling and notifications
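The conversion project above might take a shape like this. A sketch only, assuming Apache Airflow 2.4+ and its TaskFlow API; the task bodies and schedule are illustrative:

```python
# Sketch: requires Apache Airflow 2.4+ installed; not runnable standalone.
from datetime import datetime, timedelta
from airflow.decorators import dag, task

@dag(
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    # Retry logic from the topics above: two retries, five minutes apart.
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
)
def etl_pipeline():
    @task
    def extract():
        return [{"sku": "SKU-1", "qty": 3}]

    @task
    def transform(rows):
        return [r for r in rows if r["qty"] > 0]

    @task
    def load(rows):
        print(f"loading {len(rows)} rows")

    # Calling the tasks wires up the dependency graph: extract -> transform -> load.
    load(transform(extract()))

etl_pipeline()
```

Failure notifications would hang off the same `default_args` (for example an `on_failure_callback`), which is the natural home for the alerting requirement in the project.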

Weeks 13–14: Big Data Processing

Topics

  • Distributed systems fundamentals
  • Introduction to Apache Spark
  • Spark DataFrames
  • Spark SQL
  • Partitioning strategies
  • Performance tuning

Projects

  • Process a large log dataset using Spark
  • Compare Pandas vs Spark performance
  • Build a distributed transformation pipeline
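The log-processing project could be sketched as below, assuming pyspark is installed; the input path, column names, and output location are hypothetical:

```python
# Sketch: requires pyspark; paths and column names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("log-processing").getOrCreate()

# Read raw JSON logs, then aggregate requests per day and status code.
logs = spark.read.json("s3a://example-bucket/logs/")
daily = (
    logs.withColumn("day", F.to_date("timestamp"))
        .groupBy("day", "status")
        .agg(F.count("*").alias("requests"))
        .repartition("day")  # partitioning strategy: group work by day
)
daily.write.mode("overwrite").partitionBy("day").parquet("output/daily_stats")
```

The DataFrame API reads almost like the SQL from Phase 1, which is the point of the Spark SQL topic; the Pandas comparison project shows where that similarity ends, since Spark plans lazily and executes across a cluster.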

Week 15: Mid-Program Industry Simulation

Team-based sprint simulation:

  • Design system architecture
  • Build ingestion pipelines
  • Transform and model data
  • Deliver analytics-ready datasets
  • Present solution to a technical review panel

PHASE 4 (Weeks 16–20): Streaming and Cloud Data Engineering

Weeks 16–17: Real-Time Data Engineering

Topics

  • Event-driven architecture
  • Message brokers and stream processing
  • Introduction to Apache Kafka
  • Producers and consumers
  • Real-time ETL patterns

Project

  • Simulate a fintech transaction stream
  • Process and store streaming data into a database
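Before introducing Kafka itself, the producer/consumer pattern can be simulated with the standard library. A sketch where an in-process queue stands in for a Kafka topic; the transaction fields are made up:

```python
import queue
import random
import threading

broker = queue.Queue()   # stands in for a Kafka topic
SENTINEL = object()      # signals end of stream

def producer(n):
    """Emit n fake fintech transactions onto the 'topic'."""
    for i in range(n):
        broker.put({"tx_id": i, "amount": round(random.uniform(1, 500), 2)})
    broker.put(SENTINEL)

def consumer(results):
    """Consume events one at a time, as a Kafka consumer would."""
    while True:
        event = broker.get()
        if event is SENTINEL:
            break
        results.append(event)

results = []
t1 = threading.Thread(target=producer, args=(100,))
t2 = threading.Thread(target=consumer, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()
print(f"consumed {len(results)} events")
```

Swapping the queue for a real broker changes the transport, not the shape: the producer thread becomes a Kafka producer client and the loop becomes a poll over a consumer group.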

Weeks 18–20: Cloud Data Engineering (AWS Track Example)

Topics

  • Cloud computing fundamentals
  • Identity and Access Management (IAM)
  • Object storage systems
  • Data lakes and warehouse integration
  • Serverless architecture concepts

Tools

  • Amazon S3
  • Amazon Redshift
  • AWS Lambda

Projects

  • Deploy a full data pipeline to the cloud
  • Build a cloud-based data lake
  • Automate ingestion using serverless triggers
  • Load and optimize warehouse queries
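The serverless ingestion project above could follow this shape. A sketch only, assuming boto3 with configured AWS credentials; the bucket names and cleaning rule are hypothetical:

```python
# Sketch: requires boto3 and AWS credentials; bucket names are hypothetical.
import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """AWS Lambda handler, triggered by an S3 object-created event."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        rows = json.loads(body)
        # Write a cleaned copy to the curated zone of the data lake.
        s3.put_object(
            Bucket="curated-zone-example",
            Key=key,
            Body=json.dumps([r for r in rows if r.get("amount") is not None]),
        )
```

From the curated zone, a scheduled `COPY` into Redshift completes the lake-to-warehouse path listed in the topics.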

PHASE 5 (Weeks 21–24): Production Engineering and Capstone

Week 21: Data Architecture and Governance

Topics

  • Medallion architecture (Bronze, Silver, Gold layers)
  • Data quality frameworks
  • Observability and monitoring
  • Security and compliance fundamentals
  • Cost optimization strategies

Week 22: DevOps for Data Engineers

Topics

  • CI/CD for data pipelines
  • Advanced Docker usage
  • Deployment strategies
  • Monitoring tools
  • Infrastructure fundamentals

Weeks 23–24: Capstone Project

Students select one industry scenario:

1. Fintech Data Platform

  • Batch and streaming ingestion
  • Fraud analytics dataset preparation
  • Warehouse modeling
  • Cloud deployment
  • Full technical documentation

2. E-commerce Analytics Platform

  • Clickstream ingestion
  • Real-time order processing
  • Data warehouse design
  • BI-ready data marts

3. Telecom Data Platform

  • Process call detail records (CDRs)
  • Large-scale Spark transformations
  • Usage analytics warehouse

Graduate Profile

By the end of the program, students will be able to:

  • Write advanced SQL queries
  • Design and implement data warehouses
  • Build automated ETL pipelines
  • Orchestrate workflows using Airflow
  • Process large-scale data using Spark
  • Implement streaming systems
  • Deploy data pipelines to cloud platforms
  • Apply DevOps and production best practices