Data breed Africa

Data Engineering Course Outline (Guide)

PHASE 1 (Weeks 1–4): Onboarding and SQL Mastery

Week 1: Onboarding and Engineering Foundations

Onboarding

  • Program roadmap and expectations
  • Tools installation (VS Code, Git, PostgreSQL, Docker)
  • GitHub setup and repository management
  • Agile fundamentals and sprint workflows
  • Documentation standards
  • Portfolio strategy and industry positioning

Foundations

  • What is Data Engineering?
  • Data Engineer vs Data Analyst vs Data Scientist
  • Modern Data Stack overview
  • Data lifecycle and architecture fundamentals
  • Batch vs Streaming systems
  • OLTP vs OLAP systems
  • Introduction to relational databases

Practical

  • Install and configure a local database
  • Create tables and insert records
  • Write basic SQL queries

Weeks 2–4: SQL for Data Engineering (Deep Dive)

Core SQL

  • SELECT, WHERE, GROUP BY, HAVING
  • Joins (inner, left, right, full)
  • Subqueries
  • Common Table Expressions (CTEs)
  • Window functions
  • Aggregations
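The core SQL topics above can be tried locally without installing a database server. A minimal sketch using Python's built-in sqlite3 module (the `sales` table and its data are made up for illustration; window functions need SQLite 3.25+, which ships with modern Python):

```python
import sqlite3

# In-memory database so the example is fully self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO sales (region, amount) VALUES
        ('east', 100), ('east', 200), ('west', 50), ('west', 300);
""")

# A CTE, an aggregation, and a window function in one query:
# per-region totals, ranked from highest to lowest.
query = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
)
SELECT region,
       total,
       RANK() OVER (ORDER BY total DESC) AS sales_rank
FROM region_totals
ORDER BY sales_rank;
"""
for row in conn.execute(query):
    print(row)
```

The same query text runs unchanged against PostgreSQL once it is installed in Week 1.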

Performance and Engineering

  • Indexing strategies
  • Query optimization
  • Understanding execution plans
  • Constraints and keys
  • Transactions
  • Introduction to stored procedures

Data Modeling

  • Normalization principles
  • Dimensional modeling
  • Star schema
  • Snowflake schema
  • Slowly Changing Dimensions (SCD Type 1 and 2)
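An SCD Type 2 change can also be prototyped in SQLite. A hedged sketch: the column names (`valid_from`, `valid_to`, `is_current`) are common conventions, not a standard, and the customer data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        sk INTEGER PRIMARY KEY,   -- surrogate key
        customer_id INTEGER,      -- natural/business key
        city TEXT,
        valid_from TEXT,
        valid_to TEXT,            -- NULL while the row is current
        is_current INTEGER
    )
""")
conn.execute(
    "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
    "VALUES (1, 'Lagos', '2024-01-01', NULL, 1)"
)

def scd2_update(conn, customer_id, new_city, change_date):
    """SCD Type 2: close the current row, then insert a new version."""
    conn.execute(
        "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_city, change_date),
    )

scd2_update(conn, 1, "Nairobi", "2024-06-01")
for row in conn.execute(
    "SELECT customer_id, city, valid_from, valid_to, is_current "
    "FROM dim_customer ORDER BY sk"
):
    print(row)
```

Type 1 would simply overwrite `city` in place; Type 2 preserves the full history, which is why it is the default choice for warehouse dimensions.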

Projects

  • Build and populate a retail transactional database
  • Write advanced analytical queries
  • Design and implement a mini data warehouse
  • Optimize poorly performing queries

PHASE 2 (Weeks 5–9): Python for Data Engineering

Weeks 5–6: Python Fundamentals for Engineers

Topics

  • Python fundamentals refresher
  • File handling and OS operations
  • Working with CSV, JSON, and Parquet files
  • Exception handling
  • Writing modular and reusable scripts
  • Logging best practices
  • Virtual environments
  • Packaging basics

Projects

  • Build a reusable data ingestion script
  • Convert raw CSV files into structured, cleaned output
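The ingestion project above might start from a sketch like this, using only the standard library; the field names and cleaning rules are hypothetical, and the inline sample stands in for a real file path:

```python
import csv
import io
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingest")

def clean_rows(raw_file):
    """Yield cleaned records from a raw CSV file object, skipping bad rows."""
    reader = csv.DictReader(raw_file)
    for lineno, row in enumerate(reader, start=2):  # header is line 1
        try:
            yield {
                "name": row["name"].strip().title(),
                "amount": float(row["amount"]),
            }
        except (KeyError, ValueError) as exc:
            log.warning("skipping line %d: %s", lineno, exc)

# Inline sample data so the sketch runs as-is; real input would be open(path).
raw = io.StringIO("name,amount\n alice ,10.5\nbob,not-a-number\ncarol,3\n")
cleaned = list(clean_rows(raw))
print(cleaned)
```

Writing the cleaner as a generator keeps memory flat on large files, and routing bad rows through `logging` rather than `print` is the habit the Logging best practices topic builds on.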

Weeks 7–8: Python and Databases

Topics

  • Connecting Python to SQL databases
  • Performing CRUD operations programmatically
  • Building ETL scripts
  • Data validation and cleaning
  • Writing scalable transformation scripts

Project

  • Build an automated ETL pipeline (Python to SQL database)
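The ETL project above reduces to three functions. A minimal sketch with sqlite3 standing in for PostgreSQL; the table layout, sample records, and validation rule are illustrative only:

```python
import sqlite3

def extract():
    # Stand-in for reading from files or an API.
    return [
        ("2024-01-01", "SKU-1", 3),
        ("2024-01-02", "SKU-2", -5),   # invalid: negative quantity
        ("2024-01-03", "SKU-1", 7),
    ]

def transform(rows):
    # Validation step: drop records with non-positive quantities.
    return [r for r in rows if r[2] > 0]

def load(conn, rows):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_date TEXT, sku TEXT, qty INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
print(conn.execute("SELECT COUNT(*), SUM(qty) FROM orders").fetchone())
```

Keeping extract, transform, and load as separate functions is what later lets the same logic be lifted into Airflow tasks in Week 12.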

Week 9: Engineering Best Practices

Topics

  • Code structure and project organization
  • Git workflows and branching strategies
  • Introduction to testing
  • Documentation standards
  • Introduction to Docker

Project

  • Containerize the ETL pipeline

PHASE 3 (Weeks 10–15): ETL, Orchestration and Big Data

Weeks 10–11: ETL and ELT Pipeline Engineering

Topics

  • ETL vs ELT architecture
  • Designing pipeline workflows
  • Data validation frameworks
  • Error handling and monitoring
  • Data quality checks
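Data quality checks can start as plain functions before adopting a framework. A sketch; the rules and field names are examples only (tools such as Great Expectations express the same idea declaratively):

```python
def run_checks(rows):
    """Return the names of failed data-quality checks for a batch of records."""
    failures = []
    if not rows:
        failures.append("batch is empty")
    if any(r.get("amount") is None for r in rows):
        failures.append("null amount")
    if any(r.get("amount", 0) < 0 for r in rows):
        failures.append("negative amount")
    return failures

batch = [{"amount": 10}, {"amount": -2}]
print(run_checks(batch))
```

A pipeline would typically run checks like these between layers and halt (or alert) on a non-empty result rather than load bad data downstream.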

Project

  • Build a complete pipeline: Raw layer to Clean layer to Warehouse layer

Week 12: Workflow Orchestration

Topics

  • Introduction to Apache Airflow
  • DAG design
  • Scheduling and automation
  • Monitoring and logging
  • Retry logic and alerting

Project

  • Convert the ETL pipeline into an Airflow DAG
  • Implement failure handling and notifications
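The conversion project above might take a shape like this. A sketch only, assuming Apache Airflow 2.4+ and its TaskFlow API; the task bodies and schedule are illustrative:

```python
# Sketch: requires Apache Airflow 2.4+ installed; not runnable standalone.
from datetime import datetime, timedelta
from airflow.decorators import dag, task

@dag(
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    # Retry logic from the topics above: two retries, five minutes apart.
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
)
def etl_pipeline():
    @task
    def extract():
        return [{"sku": "SKU-1", "qty": 3}]

    @task
    def transform(rows):
        return [r for r in rows if r["qty"] > 0]

    @task
    def load(rows):
        print(f"loading {len(rows)} rows")

    # Calling the tasks wires up the dependency graph: extract -> transform -> load.
    load(transform(extract()))

etl_pipeline()
```

Failure notifications would hang off the same `default_args` (for example an `on_failure_callback`), which is the natural home for the alerting requirement in the project.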

Weeks 13–14: Big Data Processing

Topics

  • Distributed systems fundamentals
  • Introduction to Apache Spark
  • Spark DataFrames
  • Spark SQL
  • Partitioning strategies
  • Performance tuning

Projects

  • Process a large log dataset using Spark
  • Compare Pandas vs Spark performance
  • Build a distributed transformation pipeline
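The log-processing project could be sketched as below, assuming pyspark is installed; the input path, column names, and output location are hypothetical:

```python
# Sketch: requires pyspark; paths and column names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("log-processing").getOrCreate()

# Read raw JSON logs, then aggregate requests per day and status code.
logs = spark.read.json("s3a://example-bucket/logs/")
daily = (
    logs.withColumn("day", F.to_date("timestamp"))
        .groupBy("day", "status")
        .agg(F.count("*").alias("requests"))
        .repartition("day")  # partitioning strategy: group work by day
)
daily.write.mode("overwrite").partitionBy("day").parquet("output/daily_stats")
```

The DataFrame API reads almost like the SQL from Phase 1, which is the point of the Spark SQL topic; the Pandas comparison project shows where that similarity ends, since Spark plans lazily and executes across a cluster.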

Week 15: Mid-Program Industry Simulation

Team-based sprint simulation:

  • Design system architecture
  • Build ingestion pipelines
  • Transform and model data
  • Deliver analytics-ready datasets
  • Present solution to a technical review panel

PHASE 4 (Weeks 16–20): Streaming and Cloud Data Engineering

Weeks 16–17: Real-Time Data Engineering

Topics

  • Event-driven architecture
  • Message brokers and stream processing
  • Introduction to Apache Kafka
  • Producers and consumers
  • Real-time ETL patterns

Project

  • Simulate a fintech transaction stream
  • Process and store streaming data into a database
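Before introducing Kafka itself, the producer/consumer pattern can be simulated with the standard library. A sketch where an in-process queue stands in for a Kafka topic; the transaction fields are made up:

```python
import queue
import random
import threading

broker = queue.Queue()   # stands in for a Kafka topic
SENTINEL = object()      # signals end of stream

def producer(n):
    """Emit n fake fintech transactions onto the 'topic'."""
    for i in range(n):
        broker.put({"tx_id": i, "amount": round(random.uniform(1, 500), 2)})
    broker.put(SENTINEL)

def consumer(results):
    """Consume events one at a time, as a Kafka consumer would."""
    while True:
        event = broker.get()
        if event is SENTINEL:
            break
        results.append(event)

results = []
t1 = threading.Thread(target=producer, args=(100,))
t2 = threading.Thread(target=consumer, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()
print(f"consumed {len(results)} events")
```

Swapping the queue for a real broker changes the transport, not the shape: the producer thread becomes a Kafka producer client and the loop becomes a poll over a consumer group.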

Weeks 18–20: Cloud Data Engineering (AWS Track Example)

Topics

  • Cloud computing fundamentals
  • Identity and Access Management (IAM)
  • Object storage systems
  • Data lakes and warehouse integration
  • Serverless architecture concepts

Tools

  • Amazon S3
  • Amazon Redshift
  • AWS Lambda

Projects

  • Deploy a full data pipeline to the cloud
  • Build a cloud-based data lake
  • Automate ingestion using serverless triggers
  • Load and optimize warehouse queries
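The serverless ingestion project above could follow this shape. A sketch only, assuming boto3 with configured AWS credentials; the bucket names and cleaning rule are hypothetical:

```python
# Sketch: requires boto3 and AWS credentials; bucket names are hypothetical.
import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """AWS Lambda handler, triggered by an S3 object-created event."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        rows = json.loads(body)
        # Write a cleaned copy to the curated zone of the data lake.
        s3.put_object(
            Bucket="curated-zone-example",
            Key=key,
            Body=json.dumps([r for r in rows if r.get("amount") is not None]),
        )
```

From the curated zone, a scheduled `COPY` into Redshift completes the lake-to-warehouse path listed in the topics.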

PHASE 5 (Weeks 21–24): Production Engineering and Capstone

Week 21: Data Architecture and Governance

Topics

  • Medallion architecture (Bronze, Silver, Gold layers)
  • Data quality frameworks
  • Observability and monitoring
  • Security and compliance fundamentals
  • Cost optimization strategies

Week 22: DevOps for Data Engineers

Topics

  • CI/CD for data pipelines
  • Advanced Docker usage
  • Deployment strategies
  • Monitoring tools
  • Infrastructure fundamentals

Weeks 23–24: Capstone Project

Students select one industry scenario:

1. Fintech Data Platform

  • Batch and streaming ingestion
  • Fraud analytics dataset preparation
  • Warehouse modeling
  • Cloud deployment
  • Full technical documentation

2. E-commerce Analytics Platform

  • Clickstream ingestion
  • Real-time order processing
  • Data warehouse design
  • BI-ready data marts

3. Telecom Data Platform

  • Process call detail records (CDRs)
  • Large-scale Spark transformations
  • Usage analytics warehouse

Graduate Profile

By the end of the program, students will be able to:

  • Write advanced SQL queries
  • Design and implement data warehouses
  • Build automated ETL pipelines
  • Orchestrate workflows using Airflow
  • Process large-scale data using Spark
  • Implement streaming systems
  • Deploy data pipelines to cloud platforms
  • Apply DevOps and production best practices