📢 Day 2/30 - SQL, PYTHON, ETL, DATA MODELLING CHALENGE SOLUTIONS🔥

🔓 Solutions for Feb 25th, 2025 Challenge

Feb 25, 2025

👋 Hey Data Engineers!

Welcome to Day 2 of the 30-Day Data Engineering Challenge 🚀. Today, we’re diving into SQL Window Functions, Python Dictionary Mutability, ETL Deduplication, and Data Modeling.💡 What stood out to you today? Drop your thoughts in the comments!👇

✅ FREE Solutions + Reasoning → Now Available!
🔗 Subscribe & access here:

🚀 SQL Challenge - Window Functions & Ranking

Question Recap:

What will be the output of this SQL query?

sql

SELECT employee_id, department_id, 
       ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank 
FROM employees;

❓ Options:
🔘 A) Assigns a rank to each employee within a department
🔘 B) Counts total employees
🔘 C) Generates unique numbers across all employees
🔘 D) SQL Error

✅ Answer: A) Assigns a rank to each employee within a department

💡 Explanation:

ROW_NUMBER() is a window function that assigns a unique rank to each row within a partition.
PARTITION BY department_id ensures ranking restarts for each department.
ORDER BY salary DESC ranks employees within their department based on salary (highest first).

This function is particularly useful in scenarios like ranking products, employees, or leaderboard standings within a category.

🐍 Python Challenge - Dictionary Mutability

Question Recap:

What will be the output of the following Python snippet?

python

dict_1 = {'a': 1, 'b': 2}
dict_2 = dict_1
dict_2['c'] = 3
print(dict_1)

❓ Options:
🔘 A) {'a': 1, 'b': 2}
🔘 B) {'a': 1, 'b': 2, 'c': 3}
🔘 C) {'c': 3}
🔘 D) Error

✅ Answer: B) {'a': 1, 'b': 2, 'c': 3}

💡 Explanation:

Dictionaries are mutable in Python, meaning changes to one reference affect all references to the same object.
dict_2 = dict_1 does not create a new dictionary, but rather creates a new reference to the same object in memory.
Any modification to dict_2 modifies dict_1 as well.

To avoid this, use dict.copy() for a shallow copy or copy.deepcopy() for a deep copy when working with nested structures.

⚡ ETL Challenge - Data Deduplication

Question Recap:

What is the best way to remove duplicates from a dataset before loading it into a data warehouse?

❓ Options:
🔘 A) Use DISTINCT in SQL queries
🔘 B) Apply a GROUP BY operation with aggregation
🔘 C) Deduplicate at the source system before ingestion
🔘 D) All of the above

✅ Answer: D) All of the above

💡 Explanation:

Deduplication should happen at multiple stages of the data pipeline.
DISTINCT in SQL removes duplicates in queries but doesn’t prevent duplicate ingestion.
GROUP BY with aggregation helps in scenarios where duplicates must be merged based on business logic.
The best practice is to deduplicate at the source (if possible) to prevent storing unnecessary duplicates.

📌 Data Modeling - Many-to-Many Relationships

Question Recap:

How do you properly model a many-to-many relationship between students and courses?

❓ Options:
🔘 A) Add a course_id column in the students table
🔘 B) Add a student_id column in the courses table
🔘 C) Create a student_courses junction table
🔘 D) Use only foreign keys without a separate table

✅ Answer: C) Create a student_courses junction table

💡 Explanation:

Many-to-many relationships require an intermediate (junction) table to avoid redundancy and maintain normalized data.
The correct schema:

sql

CREATE TABLE student_courses (
    student_id INT REFERENCES students(student_id),
    course_id INT REFERENCES courses(course_id),
    PRIMARY KEY (student_id, course_id)
);

This approach ensures scalability and allows a single student to enroll in multiple courses while multiple students enroll in the same course.

🚀 Upgrade for Full Access – Learn in Detail, Never Forget!

If you’re enjoying these challenges, it’s time to go beyond just knowing the answers.

To truly master SQL, Python, ETL, and Data Modeling, you need deep conceptual clarity, hands-on practice, and real-world applications—not just solutions.

🚀 Want More? Unlock full breakdowns, runnable code, and real-world strategies!

Annual Membership

Now that you know the answers, let’s break them down with step-by-step deep dives, real-world applications, runnable code, and performance optimizations.

🚀 Follow these instructions to test & run live code:
1️⃣ Open onecompiler.com
2️⃣ Select the relevant language (PL/SQL/PostgreSQL or Python)
3️⃣ Copy and paste the code
4️⃣ Click Run and analyze the results

Discussion about this post

Ready for more?