Day 26/30 SQL, Python, ETL, Data Modelling Challenge FREE Solutions

March 31st CHALLENGE – unlock solutions + reasoning

Avantika_Penumarty

Apr 01, 2025

👋 Hey Data Engineers!

Welcome to Day 26 of the 30-Day Data Engineering Challenge.
Today’s topics will level up your confidence with:

SQL Recursive CTEs
Python Decorators
Incremental ETL Logic
Slowly Changing Dimensions in Data Modeling

SQL Challenge – Recursive CTEs for Hierarchical Data

Challenge Recap:
❓ What does this SQL recursive query return?

WITH RECURSIVE employee_path AS (
  SELECT id, name, manager_id FROM employees WHERE id = 1
  UNION ALL
  SELECT e.id, e.name, e.manager_id
  FROM employees e
  JOIN employee_path ep ON e.manager_id = ep.id
)
SELECT COUNT(*) FROM employee_path;

✅ Answer: C - Number of employees under employee 1

Explanation:
This recursive CTE starts from a single employee (ID = 1) and navigates all direct and indirect reports. The final count reflects how many employees fall under that node in the hierarchy.

Best Practices:

Use recursive CTEs for org charts and category hierarchies
Always include a base case (starting point) and an exit condition
Add a level column to control recursion depth if needed

Python Challenge – Understanding Decorators

Challenge Recap:

def decorator(func):
    def wrapper():
        print("Before")
        func()
        print("After")
    return wrapper

@decorator
def greet():
    print("Hello")

greet()

✅ Answer: B - Before Hello After

Explanation:
The @decorator syntax wraps greet() with additional logic—printing "Before" and "After" around the original function. It's a clean way to add behavior without modifying the function itself.

Best Practices:

Use decorators for logging, timing, retries, and security
Wrap the original function using functools.wraps() to preserve metadata
Avoid overusing decorators when a regular function suffices

ETL Challenge – When to Use Incremental Loads

Challenge Recap:
❓ When is incremental loading preferred in an ETL pipeline?

✅ Answer: C - When only new/updated records need to be processed

Explanation:
Incremental loads reduce resource usage by pulling only changes since the last job run. Ideal for growing datasets, especially when most records remain unchanged.

Best Practices:

Use last_updated timestamps or change flags
Store job metadata to track state between runs
Design idempotent loads to avoid duplicates during retries

Data Modeling Challenge – SCD Type 2

Challenge Recap:
❓ Which SCD Type stores historical versions of changing records?

✅ Answer: C - Type 2

Explanation:
SCD Type 2 tracks changes by inserting new rows with updated attributes and valid date ranges. Useful when you need a full timeline of how dimensions changed.

Best Practices:

Use effective_from / effective_to columns
Keep current rows open (effective_to = NULL)
Add a surrogate key for uniqueness across versions

📌 Want full access to Deep Dive breakdowns, OneCompiler test cases, and real-world interview prep?

Upgrade to Annual or Elite Plan at
zero2dataengineer.substack.com and unlock exclusive walkthroughs + runnable code + advanced best practices!

💬 Drop your answers in the comments.
🎯 Top scorers get a LinkedIn shoutout and recruiter visibility next week!

Discussion about this post

Ready for more?