Day 27/30 - DEEP DIVE SOLUTIONS: SQL, PYTHON, ETL, DATA MODELING
Solutions for April 1st, 2025 Challenge – Full Breakdown + Live Runnable Code
Hello Data Engineers,
You’re just a few steps away from completing the 30-Day Challenge. Today’s deep dive covers:
SQL NULL handling with
COALESCE
Python list comprehensions
Airflow’s
depends_on_past
logicData normalization and 3NF
Let’s dive into today’s practical breakdowns + testing steps.
SQL Deep Dive: COALESCE and NULL Aggregates
Challenge Recap:
❓ What will this query return?
SELECT COALESCE(SUM(salary), 0)
FROM employees
WHERE department = 'Marketing';
🔘 A) NULL if there are no Marketing employees
🔘 B) 0 if there are no Marketing employees
🔘 C) Error
🔘 D) The string "No Marketing Employees"
✅ Answer: Option B - 0 if there are no Marketing employees
Why This Happens:
When SUM()
has no matching rows, it returns NULL
.
Wrapping it with COALESCE()
ensures a default value (in this case, 0
) is returned instead—ideal for reports or dashboards where NULLs are undesirable.
Where It’s Used in Real-World Applications:
Defaulting missing totals in finance dashboards
Handling sparse data with fallback values
Avoiding broken charts or empty aggregates
Run & Test on OneCompiler:
1️⃣ Open OneCompiler → Select PostgreSQL
2️⃣ Paste this code:
CREATE TABLE employees (
id SERIAL,
name TEXT,
salary INT,
department TEXT
);
INSERT INTO employees (name, salary, department) VALUES
('Alice', 80000, 'Engineering'),
('Bob', 90000, 'Engineering'),
('Charlie', 85000, 'HR');
-- Query for non-existent department
SELECT COALESCE(SUM(salary), 0)
FROM employees
WHERE department = 'Marketing';
3️⃣ Click Run and confirm that result is 0
.
Best Practices:
✔️ Always wrap SUM()
, AVG()
, or MAX()
with COALESCE()
in filtered aggregates
✔️ Use 0
for numeric defaults, ''
for text defaults
✔️ Ensure dashboards and data consumers don’t misinterpret NULLs as errors
Python Deep Dive: Filtering with List Comprehensions
Challenge Recap:
nums = [1, 2, 3, 4, 5]
evens = [x for x in nums if x % 2 == 0]
print(evens)
🔘 A) [1, 3, 5]
🔘 B) [2, 4]
🔘 C) [1, 2, 3, 4, 5]
🔘 D) Error
✅ Answer: Option B - [2, 4]
Why This Happens:
This list comprehension filters the list and includes only even numbers.x % 2 == 0
evaluates to True
only for 2
and 4
.
Where It’s Used in Real-World Applications:
Filtering numeric lists for condition matches
Cleaning JSON or CSV fields in ETL
Writing one-liners for quick data processing
Run & Test on OneCompiler:
1️⃣ Open OneCompiler → Select Python 3
2️⃣ Paste this code:
nums = [1, 2, 3, 4, 5]
evens = [x for x in nums if x % 2 == 0]
print(evens)
3️⃣ Click Run to confirm output is [2, 4]
Best Practices:
✔️ Use comprehensions for clean, readable filtering
✔️ Avoid embedding complex logic—use functions for readability
✔️ Prefer list comprehensions over filter()
unless lazy evaluation is needed
ETL Deep Dive: Airflow depends_on_past=True
Challenge Recap:
❓ What does setting depends_on_past=True
in a task do?
🔘 A) Ensures task waits for all parallel tasks
🔘 B) Ensures task runs after the next scheduled DAG
🔘 C) Forces re-run of all past failed DAGs
🔘 D) Prevents task from running if it failed in the previous DAG run
✅ Answer: Option D - Prevents task from running if it failed in the previous DAG run
Why This Happens:
In Airflow, depends_on_past=True
ensures that a task will only run if the same task succeeded in the previous run. This is critical for incremental ETL jobs that depend on successful state tracking.