Day 27/30 - DEEP DIVE SOLUTIONS: SQL, PYTHON, ETL, DATA MODELING

Solutions for April 1st, 2025 Challenge – Full Breakdown + Live Runnable Code

Avantikka_Penumarty

Apr 02, 2025

∙ Paid

Hello Data Engineers,

You’re just a few steps away from completing the 30-Day Challenge. Today’s deep dive covers:

SQL NULL handling with COALESCE
Python list comprehensions
Airflow’s depends_on_past logic
Data normalization and 3NF

Let’s dive into today’s practical breakdowns + testing steps.

SQL Deep Dive: COALESCE and NULL Aggregates

Challenge Recap:
❓ What will this query return?

SELECT COALESCE(SUM(salary), 0)
FROM employees
WHERE department = 'Marketing';

🔘 A) NULL if there are no Marketing employees
🔘 B) 0 if there are no Marketing employees
🔘 C) Error
🔘 D) The string "No Marketing Employees"

✅ Answer: Option B - 0 if there are no Marketing employees

Why This Happens:

When SUM() has no matching rows, it returns NULL.
Wrapping it with COALESCE() ensures a default value (in this case, 0) is returned instead—ideal for reports or dashboards where NULLs are undesirable.

Where It’s Used in Real-World Applications:

Defaulting missing totals in finance dashboards
Handling sparse data with fallback values
Avoiding broken charts or empty aggregates

Run & Test on OneCompiler:

1️⃣ Open OneCompiler → Select PostgreSQL
2️⃣ Paste this code:

CREATE TABLE employees (
    id SERIAL,
    name TEXT,
    salary INT,
    department TEXT
);

INSERT INTO employees (name, salary, department) VALUES
('Alice', 80000, 'Engineering'),
('Bob', 90000, 'Engineering'),
('Charlie', 85000, 'HR');

-- Query for non-existent department
SELECT COALESCE(SUM(salary), 0)
FROM employees
WHERE department = 'Marketing';

3️⃣ Click Run and confirm that result is 0.

Best Practices:

✔️ Always wrap SUM(), AVG(), or MAX() with COALESCE() in filtered aggregates
✔️ Use 0 for numeric defaults, '' for text defaults
✔️ Ensure dashboards and data consumers don’t misinterpret NULLs as errors

Python Deep Dive: Filtering with List Comprehensions

Challenge Recap:

nums = [1, 2, 3, 4, 5]
evens = [x for x in nums if x % 2 == 0]
print(evens)

🔘 A) [1, 3, 5]
🔘 B) [2, 4]
🔘 C) [1, 2, 3, 4, 5]
🔘 D) Error

✅ Answer: Option B - [2, 4]

Why This Happens:

This list comprehension filters the list and includes only even numbers.
x % 2 == 0 evaluates to True only for 2 and 4.

Where It’s Used in Real-World Applications:

Filtering numeric lists for condition matches
Cleaning JSON or CSV fields in ETL
Writing one-liners for quick data processing

Run & Test on OneCompiler:

1️⃣ Open OneCompiler → Select Python 3
2️⃣ Paste this code:

nums = [1, 2, 3, 4, 5]
evens = [x for x in nums if x % 2 == 0]
print(evens)

3️⃣ Click Run to confirm output is [2, 4]

Best Practices:

✔️ Use comprehensions for clean, readable filtering
✔️ Avoid embedding complex logic—use functions for readability
✔️ Prefer list comprehensions over filter() unless lazy evaluation is needed

ETL Deep Dive: Airflow `depends_on_past=True`

Challenge Recap:
❓ What does setting depends_on_past=True in a task do?

🔘 A) Ensures task waits for all parallel tasks
🔘 B) Ensures task runs after the next scheduled DAG
🔘 C) Forces re-run of all past failed DAGs
🔘 D) Prevents task from running if it failed in the previous DAG run

✅ Answer: Option D - Prevents task from running if it failed in the previous DAG run

Why This Happens:

In Airflow, depends_on_past=True ensures that a task will only run if the same task succeeded in the previous run. This is critical for incremental ETL jobs that depend on successful state tracking.

Zero2Dataengineer

Day 27/30 - DEEP DIVE SOLUTIONS: SQL, PYTHON, ETL, DATA MODELING

Solutions for April 1st, 2025 Challenge – Full Breakdown + Live Runnable Code

SQL Deep Dive: COALESCE and NULL Aggregates

Why This Happens:

Where It’s Used in Real-World Applications:

Run & Test on OneCompiler:

Best Practices:

Python Deep Dive: Filtering with List Comprehensions

Why This Happens:

Where It’s Used in Real-World Applications:

Run & Test on OneCompiler:

Best Practices:

ETL Deep Dive: Airflow depends_on_past=True

Why This Happens:

Where It’s Used in Real-World Applications:

This post is for paid subscribers

ETL Deep Dive: Airflow `depends_on_past=True`