📢 Day 3/30 - SQL, PYTHON, ETL, DATA MODELLING CHALLENGE SOLUTIONS🔥

Solutions for Feb 26th, 2025 CHALLENGE – Unlock Key Concepts + Reasoning

Feb 27, 2025

👋 Hey Data Engineers!
Welcome to Day 3 of the 30-Day Data Engineering Challenge 🚀. Today, we’re diving into SQL Window Functions, Python Dictionary Mutability, ETL Deduplication, and Data Modeling.💡 What stood out to you today? Drop your thoughts in the comments!👇
✅ FREE Solutions + Reasoning → Now Available!
🔗 Subscribe & access here:

🚀 SQL Challenge - Indexing & Query Performance

🔹 Problem Statement

Which SQL query benefits the most from an index on the email column in the users table?

✅ Solution Explanation

Indexes speed up searches by allowing the database to quickly locate rows instead of scanning the entire table.

WHERE conditions on indexed columns (email = 'test@email.com') execute the fastest.
ORDER BY email can benefit if the index is sorted in the same order.
Aggregate functions like COUNT(*) don't benefit much from indexing unless there is a filter.

🔹 Correct Answer:

✔ A) SELECT * FROM users WHERE email = 'test@email.com'

🔹 Why This Works?

✔ Indexes optimize direct lookups (= conditions).
✔ Full table scans are avoided, improving query performance.
✔ Useful in large datasets where filtering is frequent.

🐍 Python Challenge - Lambda Functions & Map

🔹 Problem Statement

What does this Python statement return?

✅ Solution Explanation

map() applies a function to each element of a list.
Lambda functions provide an anonymous, inline function.
If lambda x: x*2 is used, it squares each element.

🔹 Correct Answer:

✔ A) [2, 4, 6, 8]

🔹 Why This Works?

✔ Applies x * 2 to each number in [1, 2, 3, 4].
✔ More concise than a for loop.
✔ Common in functional programming and data transformations.

⚡ ETL Challenge - Airflow DAG Dependencies

🔹 Problem Statement

How do you ensure a task runs only after all upstream tasks are complete in Airflow?

✅ Solution Explanation

Airflow manages dependencies between tasks using explicit task dependencies.
>> (bitwise operator) defines dependencies, ensuring Task A runs before Task B.
Manual triggers (sleep(), etc.) don’t enforce dependencies safely.

🔹 Correct Answer:

✔ A) Set task dependencies explicitly

🔹 Why This Works?

✔ Prevents race conditions & ensures correct DAG execution.
✔ Used in real-world ETL pipelines to maintain order.
✔ Scales better than manually triggering tasks.

📌 Data Modeling - Slowly Changing Dimensions (SCDs)

🔹 Problem Statement

Which technique is best for tracking historical changes in a dimensional table?

✅ Solution Explanation

SCD Type 1: Overwrites old data (no history tracking).
SCD Type 2: Adds a new row for each change (best for tracking full history).
SCD Type 3: Stores old + new values in the same row (limited history tracking).

🔹 Correct Answer:

B) Type 2: Adding a new row with a timestamp

🔹 Why This Works?

✔ Type 2 is best for maintaining full history.
✔ Type 1 is simple but doesn’t retain history.
✔ Type 3 is a hybrid approach for recent changes.

🔥 Key Takeaways from Day 3

✔ SQL Indexes improve query performance for lookups.
✔ Lambda functions & map() provide concise data transformations.
✔ Airflow dependencies ensure correct ETL task execution.
✔ Slowly Changing Dimensions track historical changes in tables.

🚀 Upgrade for Full Access – Learn in Detail, Never Forget!

If you’re enjoying these challenges, it’s time to go beyond just knowing the answers.
To truly master SQL, Python, ETL, and Data Modeling, you need deep conceptual clarity, hands-on practice, and real-world applications—not just solutions.
🚀 Want More? Unlock full breakdowns, runnable code, and real-world strategies!
UPGRADE
Now that you know the answers, let’s break them down with step-by-step deep dives, real-world applications, runnable code, and performance optimizations.
🚀 Follow these instructions to test & run live code:
1️⃣ Open onecompiler.com
2️⃣ Select the relevant language (PL/SQL/PostgreSQL or Python)
3️⃣ Copy and paste the code
4️⃣ Click Run and analyze the results

Rohish Zade

Feb 28, 2025

Answer for python and data modeling is incorrect.

In python, lambda x: x*2 return

[2, 4, 6, 8] not [1, 4, 9, 16]

In Data modeling section, the correct answer is Type 2: Adding a new row with a timestamp not all of the above as type 1 does not maintain the historical data

3 replies by Avantika_Penumarty and others

Adi H

Both of the post has direct solution and explanation. It does not have 4 options for a quiz, not sure if it's way you want to put

1 reply by Avantika_Penumarty

4 more comments...

Discussion about this post

Ready for more?