PHASE 1 (Weeks 1–4): Onboarding and SQL Mastery
Week 1: Onboarding and Engineering Foundations
Onboarding
- Program roadmap and expectations
- Tools installation (VS Code, Git, PostgreSQL, Docker)
- GitHub setup and repository management
- Agile fundamentals and sprint workflows
- Documentation standards
- Portfolio strategy and industry positioning
Foundations
- What is Data Engineering?
- Data Engineer vs Data Analyst vs Data Scientist
- Modern Data Stack overview
- Data lifecycle and architecture fundamentals
- Batch vs Streaming systems
- OLTP vs OLAP systems
- Introduction to relational databases
Practical
- Install and configure local database
- Create tables and insert records
- Write basic SQL queries
Weeks 2–4: SQL for Data Engineering (Deep Dive)
Core SQL
- SELECT, WHERE, GROUP BY, HAVING
- Joins (inner, left, right, full)
- Subqueries
- Common Table Expressions (CTEs)
- Window functions
- Aggregations
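A taste of the Core SQL topics above, sketched with Python's built-in sqlite3 driver (the program uses PostgreSQL, but the syntax shown here is standard SQL): a CTE feeding a window function that ranks each sale within its region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES
  ('north', 100), ('north', 250), ('south', 80), ('south', 120);
""")

# A CTE feeding a window function: rank each sale within its region.
query = """
WITH regional AS (
    SELECT region, amount FROM sales
)
SELECT region,
       amount,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
FROM regional
ORDER BY region, rnk;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same query runs unchanged on PostgreSQL; only the connection setup differs.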
Performance and Engineering
- Indexing strategies
- Query optimization
- Understanding execution plans
- Constraints and keys
- Transactions
- Introduction to stored procedures
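The indexing and execution-plan topics above can be previewed in miniature with SQLite's EXPLAIN QUERY PLAN (the course itself uses PostgreSQL's EXPLAIN ANALYZE; table and index names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO orders (customer_id) VALUES (?)",
                 [(i % 100,) for i in range(1000)])

def plan(sql):
    # Return the query planner's description of how it will execute `sql`.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan("SELECT * FROM orders WHERE customer_id = 42")
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = plan("SELECT * FROM orders WHERE customer_id = 42")

print(before)  # full table scan
print(after)   # index search
```

Before the index exists, the planner reports a full scan of `orders`; afterwards it reports a search using `idx_orders_customer` — the core intuition behind indexing strategies.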
Data Modeling
- Normalization principles
- Dimensional modeling
- Star schema
- Snowflake schema
- Slowly Changing Dimensions (SCD Type 1 and 2)
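A compact sketch of SCD Type 2: each change closes the current dimension row and inserts a new version, preserving history (the table and column names are illustrative, not the course's reference schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_customer (
    customer_id INTEGER,
    address     TEXT,
    valid_from  TEXT,
    valid_to    TEXT,      -- NULL means "current"
    is_current  INTEGER
)""")

def apply_scd2(customer_id, new_address, change_date):
    # Close the current row, then insert the new version (Type 2 history).
    conn.execute(
        "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id))
    conn.execute(
        "INSERT INTO dim_customer VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_address, change_date))

apply_scd2(1, "12 Oak St", "2024-01-01")
apply_scd2(1, "9 Elm Ave", "2024-06-15")

history = conn.execute(
    "SELECT address, valid_from, valid_to, is_current "
    "FROM dim_customer WHERE customer_id = 1 ORDER BY valid_from").fetchall()
for row in history:
    print(row)
```

By contrast, SCD Type 1 would simply overwrite the address in place, keeping no history.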
Projects
- Build and populate a retail transactional database
- Write advanced analytical queries
- Design and implement a mini data warehouse
- Optimize poorly performing queries
PHASE 2 (Weeks 5–9): Python for Data Engineering
Weeks 5–6: Python Fundamentals for Engineers
Topics
- Python fundamentals refresher
- File handling and OS operations
- Working with CSV, JSON, and Parquet files
- Exception handling
- Writing modular and reusable scripts
- Logging best practices
- Virtual environments
- Packaging basics
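A small sketch tying several of the topics above together — file handling, CSV/JSON, exception handling, and logging (file and field names are illustrative):

```python
import csv
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("ingest")

def clean_rows(csv_path):
    """Read a CSV file, skip malformed rows, and return clean dicts."""
    clean = []
    with open(csv_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f), start=1):
            try:
                row["amount"] = float(row["amount"])  # validate one field
                clean.append(row)
            except (ValueError, KeyError):
                log.warning("skipping malformed row %d: %r", i, row)
    return clean

# Tiny demo: write a sample file, clean it, and emit JSON.
with open("sales.csv", "w", newline="") as f:
    f.write("product,amount\nwidget,19.99\ngadget,not-a-number\n")

rows = clean_rows("sales.csv")
print(json.dumps(rows))
```

Note the engineering habits baked in: bad rows are logged and skipped rather than crashing the script, and the cleaning logic lives in a reusable function.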
Projects
- Build a reusable data ingestion script
- Convert raw CSV files into structured, cleaned output
Weeks 7–8: Python and Databases
Topics
- Connecting Python to SQL databases
- Performing CRUD operations programmatically
- Building ETL scripts
- Data validation and cleaning
- Writing scalable transformation scripts
Project
- Build an automated ETL pipeline (Python to SQL database)
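The project above might start from a skeleton like this — extract, transform, load as separate functions, shown here against sqlite3 (PostgreSQL with a driver such as psycopg2 follows the same pattern; the data is illustrative):

```python
import sqlite3

def extract():
    # Stand-in for reading a source system; real code would pull files or APIs.
    return [("widget", "19.99"), ("gadget", "5.00"), ("bad", "oops")]

def transform(raw):
    # Cast types and drop records that fail validation.
    out = []
    for name, amount in raw:
        try:
            out.append((name, float(amount)))
        except ValueError:
            pass  # real pipelines would log and quarantine these
    return out

def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(count)  # two valid rows loaded
```

Keeping the three stages as separate functions makes each one independently testable — a theme that returns in Week 9.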
Week 9: Engineering Best Practices
Topics
- Code structure and project organization
- Git workflows and branching strategies
- Introduction to testing
- Documentation standards
- Introduction to Docker
Project
- Containerize the ETL pipeline
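Containerizing the pipeline typically comes down to a short Dockerfile. This sketch assumes a hypothetical layout with the code in an `etl/` package and a `requirements.txt`; adjust paths and the entry module to your own project:

```dockerfile
FROM python:3.12-slim
WORKDIR /app
# Install dependencies first so this layer is cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY etl/ ./etl/
# Entry point is illustrative; point it at your pipeline's main module.
CMD ["python", "-m", "etl.pipeline"]
```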
PHASE 3 (Weeks 10–15): ETL, Orchestration and Big Data
Weeks 10–11: ETL and ELT Pipeline Engineering
Topics
- ETL vs ELT architecture
- Designing pipeline workflows
- Data validation frameworks
- Error handling and monitoring
- Data quality checks
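The data-quality ideas above can be sketched as a tiny check runner: each check is a named predicate over the dataset, and failures are collected for reporting rather than raised (check names and fields are illustrative):

```python
def run_checks(rows, checks):
    # Evaluate each named predicate; collect the names of failing checks.
    failures = []
    for name, predicate in checks:
        if not predicate(rows):
            failures.append(name)
    return failures

rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": -3.0}]
checks = [
    ("not_empty",        lambda rs: len(rs) > 0),
    ("ids_unique",       lambda rs: len({r["id"] for r in rs}) == len(rs)),
    ("amounts_positive", lambda rs: all(r["amount"] > 0 for r in rs)),
]

failed = run_checks(rows, checks)
print(failed)  # the negative amount trips one check
```

Production frameworks such as Great Expectations follow the same shape at much larger scale: declarative checks, collected results, alerting on failure.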
Project
- Build a complete pipeline: raw layer → clean layer → warehouse layer
Week 12: Workflow Orchestration
Topics
- Introduction to Apache Airflow
- DAG design
- Scheduling and automation
- Monitoring and logging
- Retry logic and alerting
Project
- Convert the ETL pipeline into an Airflow DAG
- Implement failure handling and notifications
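Airflow itself needs a running scheduler, so here is a plain-Python sketch of the two core ideas above — tasks wired into a DAG and retry logic on failure (the course project would use Airflow's real DAG API; task names are illustrative):

```python
def run_dag(tasks, deps, max_retries=2):
    """Run tasks in dependency order; retry each up to max_retries times."""
    done, order = set(), []

    def run(name):
        if name in done:
            return
        for upstream in deps.get(name, []):
            run(upstream)                      # run dependencies first
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break
            except Exception:
                if attempt == max_retries:
                    raise                      # alerting would hook in here
        done.add(name)
        order.append(name)

    for name in tasks:
        run(name)
    return order

attempts = {"n": 0}
def flaky_extract():
    # Fails once, then succeeds -- simulating a transient source outage.
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise RuntimeError("transient failure")

tasks = {"load": lambda: None, "extract": flaky_extract, "transform": lambda: None}
deps = {"transform": ["extract"], "load": ["transform"]}
order = run_dag(tasks, deps)
print(order)   # extract runs (after one retry) before transform and load
```

Airflow adds scheduling, persistence, a UI, and alerting on top of exactly this dependency-plus-retry core.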
Weeks 13–14: Big Data Processing
Topics
- Distributed systems fundamentals
- Introduction to Apache Spark
- Spark DataFrames
- Spark SQL
- Partitioning strategies
- Performance tuning
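Spark needs a cluster to demonstrate properly, but the partitioning strategies above can be previewed in plain Python: hash-partitioning records by key is the same idea Spark uses to co-locate data for joins and aggregations (function and field names are illustrative):

```python
def hash_partition(records, key, num_partitions):
    """Assign each record to a partition by hashing its key."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        idx = hash(record[key]) % num_partitions
        partitions[idx].append(record)
    return partitions

records = [{"user": u, "event": e}
           for u, e in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]]
parts = hash_partition(records, "user", num_partitions=4)

# All events for the same user land in the same partition,
# so a per-user aggregation never needs a cross-partition shuffle.
for p in parts:
    print(p)
```

Skewed keys (one user with most of the data) break this balance — the starting point for the performance-tuning discussion.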
Projects
- Process a large log dataset using Spark
- Compare Pandas vs Spark performance
- Build a distributed transformation pipeline
Week 15: Mid-Program Industry Simulation
Team-based sprint simulation:
- Design system architecture
- Build ingestion pipelines
- Transform and model data
- Deliver analytics-ready datasets
- Present solution to a technical review panel
PHASE 4 (Weeks 16–20): Streaming and Cloud Data Engineering
Weeks 16–17: Real-Time Data Engineering
Topics
- Event-driven architecture
- Message brokers and stream processing
- Introduction to Apache Kafka
- Producers and consumers
- Real-time ETL patterns
Project
- Simulate a fintech transaction stream
- Process and store streaming data into a database
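Kafka needs a running broker, so this is a hedged in-process stand-in for the project above: a producer/consumer pair over a queue, mimicking the streaming-ETL pattern of consume, validate, transform, store (event fields are illustrative):

```python
import queue

broker = queue.Queue()          # stand-in for a Kafka topic

def produce(events):
    for event in events:
        broker.put(event)
    broker.put(None)            # sentinel: end of stream

def consume(store):
    # Read until the sentinel, validating and transforming each event.
    while True:
        event = broker.get()
        if event is None:
            break
        if event["amount"] > 0:           # simple stream-side validation
            store.append({**event, "amount_cents": round(event["amount"] * 100)})

store = []
produce([{"txn": 1, "amount": 9.99}, {"txn": 2, "amount": -1.0}])
consume(store)
print(store)
```

With a real broker, `produce` and `consume` would run as separate long-lived processes and the topic would buffer events durably between them.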
Weeks 18–20: Cloud Data Engineering (AWS Track Example)
Topics
- Cloud computing fundamentals
- Identity and Access Management (IAM)
- Object storage systems
- Data lakes and warehouse integration
- Serverless architecture concepts
Tools
- Amazon S3
- Amazon Redshift
- AWS Lambda
Projects
- Deploy a full data pipeline to the cloud
- Build a cloud-based data lake
- Automate ingestion using serverless triggers
- Load and optimize warehouse queries
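AWS itself can't run outside the cloud, so here is a hedged sketch of a serverless ingestion trigger: the handler shape AWS Lambda invokes for an S3 "object created" event (bucket and key names are illustrative, and real code would read the object via boto3 and load it onward):

```python
def handler(event, context):
    """Entry point AWS Lambda would invoke for each S3 event batch."""
    ingested = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Real code would fetch the object and load it into the warehouse;
        # here we just record what arrived.
        ingested.append(f"s3://{bucket}/{key}")
    return {"ingested": ingested}

# Local dry run with a minimal fake S3 event.
fake_event = {"Records": [
    {"s3": {"bucket": {"name": "raw-zone"}, "object": {"key": "sales/2024.csv"}}}
]}
result = handler(fake_event, context=None)
print(result)
```

Testing handlers locally with fake events like this, before wiring up the S3 trigger, is a common serverless workflow.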
PHASE 5 (Weeks 21–24): Production Engineering and Capstone
Week 21: Data Architecture and Governance
Topics
- Medallion architecture (Bronze, Silver, Gold layers)
- Data quality frameworks
- Observability and monitoring
- Security and compliance fundamentals
- Cost optimization strategies
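The medallion layers above can be sketched as successive transformations over the same records — Bronze keeps raw input, Silver cleans and types it, Gold aggregates for consumers (field names are illustrative):

```python
def to_silver(bronze):
    # Silver: drop rows with missing amounts, cast types.
    silver = []
    for row in bronze:
        if row.get("amount") not in (None, ""):
            silver.append({"region": row["region"], "amount": float(row["amount"])})
    return silver

def to_gold(silver):
    # Gold: revenue per region, ready for BI tools.
    gold = {}
    for row in silver:
        gold[row["region"]] = gold.get(row["region"], 0.0) + row["amount"]
    return gold

bronze = [{"region": "north", "amount": "100.0"},
          {"region": "north", "amount": ""},
          {"region": "south", "amount": "80.0"}]
gold = to_gold(to_silver(bronze))
print(gold)  # {'north': 100.0, 'south': 80.0}
```

In a real lakehouse each layer would be persisted (e.g. as Parquet or Delta tables) so downstream consumers can read from the layer matching their quality needs.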
Week 22: DevOps for Data Engineers
Topics
- CI/CD for data pipelines
- Advanced Docker usage
- Deployment strategies
- Monitoring tools
- Infrastructure fundamentals
Weeks 23–24: Capstone Project
Students select one industry scenario:
1. Fintech Data Platform
- Batch and streaming ingestion
- Fraud analytics dataset preparation
- Warehouse modeling
- Cloud deployment
- Full technical documentation
2. E-commerce Analytics Platform
- Clickstream ingestion
- Real-time order processing
- Data warehouse design
- BI-ready data marts
3. Telecom Data Platform
- Process call detail records (CDRs)
- Large-scale Spark transformations
- Usage analytics warehouse
Graduate Profile
By the end of the program, students will be able to:
- Write advanced SQL queries
- Design and implement data warehouses
- Build automated ETL pipelines
- Orchestrate workflows using Airflow
- Process large-scale data using Spark
- Implement streaming systems
- Deploy data pipelines to cloud platforms
- Apply DevOps and production best practices