Day 24/30 SQL, Python, ETL, Data Modelling Challenge FREE Solutions π
March 27th, 2025 CHALLENGE β unlock solutions + reasoning
π Hey Data Engineers!
Difficulty Level: Intermediate β Advanced
Today weβre diving into indexing, conditional list comprehension, Airflow orchestration, and tracking historical changes in dimensional models. These topics are must-know for production-grade systems.
Understand, Donβt Memorize:
β
Clear, practical explanations
β
Production-ready tips
β
Interview-first mindset
Want runnable code + exclusive deep dives? Upgrade to the Annual Plan and fast-track your DE skills.
π SQL Challenge β Indexing for Query Optimization
β Which query would benefit MOST from adding an index on the email column?
π A) SELECT * FROM users WHERE email = 'x@example.com'
π B) SELECT COUNT(*) FROM users
π C) SELECT * FROM users ORDER BY id DESC
π D) SELECT DISTINCT department FROM users
β Answer: A - WHERE email = 'x@example.com'
Explanation:
Indexing helps when filtering or joining on specific column values. An index on email speeds up lookups drastically.
Best Practices:
βοΈ Add indexes to high-cardinality, frequently filtered columns
βοΈ Avoid over-indexing (it slows writes)
βοΈ Use EXPLAIN to verify performance gains
π Python Challenge β List Comprehension with Conditionals
nums = [1, 2, 3, 4, 5, 6]
result = [x * 2 for x in nums if x % 2 == 0]
print(result)β What will be the output?
π A) [2, 4, 6, 8, 10, 12]
π B) [4, 8, 12]
π C) [2, 4, 6]
π D) [4, 8]
β Answer: B - [4, 8, 12]
Explanation:
It filters even numbers (x % 2 == 0) β [2, 4, 6], then doubles them: [4, 8, 12].
Best Practices:
βοΈ Use list comprehensions for clean, readable filtering + transformation
βοΈ Avoid unnecessary temp lists
βοΈ Replace for-loops where possible
β‘ ETL Challenge β Orchestrating Pipelines
β Which of these tools is most commonly used for task orchestration in ETL?
π A) dbt
π B) Apache Airflow
π C) Kafka
π D) PostgreSQL
β Answer: B - Apache Airflow
Explanation:
Airflow is the industry standard for defining and managing ETL workflows via DAGs (Directed Acyclic Graphs).
Best Practices:
βοΈ Use task dependencies to control execution order
βοΈ Monitor and retry failed tasks via Airflow UI
βοΈ Combine with sensors for event-driven pipelines
π§± Data Modeling Challenge β Slowly Changing Dimensions
β Which SCD type preserves full history of changes to a dimension?
π A) Type 0
π B) Type 1
π C) Type 2
π D) Type 3
β Answer: C - Type 2
Explanation:
SCD Type 2 stores a new row for each historical change, ensuring you can always see what the dimension looked like at any point in time.
Best Practices:
βοΈ Add effective_date, end_date, and is_current flags
βοΈ Use surrogate keys to join facts to correct version
βοΈ Automate inserts via merge logic
π Final Stretch β Letβs Finish Strong!
Youβre just a few days from completing this 30-day transformation.
β
Join for hands-on SQL/Python projects
β
Get exclusive mock interview questions
β
Deep-dive with working code examples
π Subscribe here: zero2dataengineer.substack.com
π¬ Drop your answers in the comments β best responses = shoutouts tomorrow


