Zero2Dataengineer

Zero2Dataengineer

Break It. Build It.

Day 19/30 - DEEP DIVE SOLUTIONS: SQL, PYTHON, ETL, DATA MODELING

Solutions for March 20th, 2025 CHALLENGE – Unlock the Full Breakdown + Live Runnable Code!

Avantika_Penumarty's avatar
Avantika_Penumarty
Mar 21, 2025
∙ Paid

👋 Hello Data Engineers!

Today, we’re diving deep into Self Joins in SQL, Dictionary Performance in Python, Handling Late Arriving Data in ETL, and Primary Key Selection in Data Modeling. If you've upgraded to Deep Dive, you're already ahead of 1000+ data engineers mastering these concepts with expert breakdowns and runnable code.

If you haven’t upgraded yet, you're missing out on real-world optimizations, performance tuning techniques, and job-ready problem-solving skills.

💡 Upgrade now and stay ahead of the competition!

📌 SQL Deep Dive: Self Joins & Hierarchical Data

Challenge Recap:

❓ Which SQL operation is best for finding employees who earn more than their manager?

🔘 A) INNER JOIN
🔘 B) SELF JOIN
🔘 C) FULL OUTER JOIN
🔘 D) UNION

✅ Answer: Option B - SELF JOIN

Why This Happens:

A SELF JOIN allows us to compare rows within the same table by treating it as two separate tables. To find employees earning more than their manager, we join the employees table to itself on the manager_id column.

Where It's Used in Real-World Applications:

  • HR Systems: Identifying employees whose salaries exceed their managers.

  • Company Hierarchy Analysis: Analyzing organizational structures in reporting tools.

  • Fraud Detection: Comparing transactions within the same dataset.

Run & Test on OneCompiler:

1️⃣ Open OneCompiler and select PostgreSQL.
2️⃣ Create a sample employees table:

CREATE TABLE employees (
    emp_id SERIAL PRIMARY KEY,
    emp_name VARCHAR(50),
    salary DECIMAL(10,2),
    manager_id INT
);

INSERT INTO employees (emp_name, salary, manager_id) VALUES
('Alice', 50000, NULL),
('Bob', 70000, 1),
('Charlie', 60000, 1),
('David', 90000, 2),
('Eve', 65000, 2);

3️⃣ Run a Self Join to find employees earning more than their managers:

SELECT e1.emp_name AS employee, e1.salary, e2.emp_name AS manager, e2.salary AS manager_salary
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.emp_id
WHERE e1.salary > e2.salary;

4️⃣ Click Run and analyze the output.

What’s Happening in the Output?

  • The query compares employees with their managers and filters results where the employee earns more than their manager.

Best Practices:

✔️ Always alias tables (e1, e2) to differentiate the same table in a Self Join.
✔️ Index manager_id for optimized performance.
✔️ Ensure NULL values are handled for employees without managers.


🐍 Python Deep Dive: Dictionary Performance Optimization

Challenge Recap:

❓ Which data structure provides the fastest lookup time for key-value pairs?

🔘 A) List
🔘 B) Tuple
🔘 C) Dictionary
🔘 D) Set

✅ Answer: Option C - Dictionary

Why This Happens:

Dictionaries in Python use a hash table under the hood, allowing for O(1) average time complexity for key lookups, making them significantly faster than lists or tuples.

Where It's Used in Real-World Applications:

  • Caching Systems: Fast retrieval of previously computed results.

  • API Response Parsing: Storing JSON responses for quick access.

  • Database Indexing: Implementing key-value storage for fast searches.

Run & Test on OneCompiler:

1️⃣ Open OneCompiler and select Python 3.
2️⃣ Run this performance comparison:

import time

# Create test data
list_data = list(range(10**6))
tuple_data = tuple(range(10**6))
dict_data = {i: f"value_{i}" for i in range(10**6)}

# Lookup times
start_time = time.time()
_ = 999999 in list_data
print("List lookup time:", time.time() - start_time)

start_time = time.time()
_ = 999999 in tuple_data
print("Tuple lookup time:", time.time() - start_time)

start_time = time.time()
_ = dict_data.get(999999)
print("Dictionary lookup time:", time.time() - start_time)

3️⃣ Click Run and analyze the output.

User's avatar

Continue reading this post for free, courtesy of Avantika_Penumarty.

Or purchase a paid subscription.
© 2026 Avantika · Privacy ∙ Terms ∙ Collection notice
Start your SubstackGet the app
Substack is the home for great culture