<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Zero2Dataengineer: Break It. Build It.]]></title><description><![CDATA[Real DE and AI problems, real solutions. SQL, Python, ETL, Data Modeling and AI concepts broken down so you actually understand what went wrong and exactly how to fix it.]]></description><link>https://zero2dataengineer.substack.com/s/break-it-build-it</link><image><url>https://substackcdn.com/image/fetch/$s_!P4V8!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F480087e2-d585-43e3-8076-9e1282f0eb2d_200x200.png</url><title>Zero2Dataengineer: Break It. Build It.</title><link>https://zero2dataengineer.substack.com/s/break-it-build-it</link></image><generator>Substack</generator><lastBuildDate>Fri, 12 Jun 2026 09:25:02 GMT</lastBuildDate><atom:link href="https://zero2dataengineer.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Avantika]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[zero2dataengineer@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[zero2dataengineer@substack.com]]></itunes:email><itunes:name><![CDATA[Avantika_Penumarty]]></itunes:name></itunes:owner><itunes:author><![CDATA[Avantika_Penumarty]]></itunes:author><googleplay:owner><![CDATA[zero2dataengineer@substack.com]]></googleplay:owner><googleplay:email><![CDATA[zero2dataengineer@substack.com]]></googleplay:email><googleplay:author><![CDATA[Avantika_Penumarty]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Scale Does Not Break Your Code. 
It Breaks Your Assumptions.]]></title><description><![CDATA[I was wrong about retries. It cost a million users.]]></description><link>https://zero2dataengineer.substack.com/p/scale-does-not-break-your-code-it</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/scale-does-not-break-your-code-it</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Wed, 01 Apr 2026 00:31:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!dAIQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36d99c0-f515-46e6-951d-ba028293430a_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h4 style="text-align: justify;">I remember the exact moment I realized I had no idea what I was doing.</h4><p style="text-align: justify;">It was my first week at Meta. I had just been handed access to one of the most complex data pipelines I had ever seen. A DAG running on 40 trillion events a day. Likes. Messages. Video views. Sensor pings from devices most people don&#8217;t even know exist.</p><p style="text-align: justify;">I sat there thinking, I have built pipelines before. Real ones. For Fortune 500 clients. I know Spark. I know SQL. I know how to ship. I&#8217;ve got this.</p><p style="text-align: justify;">I did not have this.</p><p style="text-align: justify;">The first thing that humbled me was not the complexity of the code. It was how wrong my assumptions were.</p><p style="text-align: justify;">Before Meta, I assumed deduplication was a solved problem. You write the logic once. It works. Done. At 10 million rows that is true. At 40 trillion events, I was generating duplicates that lived quietly in production for weeks before anyone noticed. 
And by the time we caught it, half the company was downstream of that bad data.</p><p style="text-align: justify;"><strong>The code was not wrong. My assumption was wrong.</strong></p><p style="text-align: justify;">The second assumption that broke me was around retries. In most systems, if a job fails you retry it. Simple. Safe. Standard practice. At Meta scale, a retry meant potentially processing the same event twice. Which meant potentially double charging a million users. Which meant a P0 incident at 2am with half the engineering org on a call.</p><p style="text-align: justify;">I had never once thought about idempotency as a design requirement. At scale it is not a nice to have. It is the difference between a working system and a crisis.</p><p style="text-align: justify;">The third assumption was around SLAs. I assumed if a job had a 4 hour SLA and ran in 2 hours in staging, we were fine. Until the cluster was hot. Until three other high priority jobs were competing for the same resources. 
Until my 2 hour job was at hour 6 and my SLA was breached and I was explaining to my manager why downstream dashboards were empty.</p><p style="text-align: justify;">Here is what I learned from all of this. The engineers who survive at that level are not the smartest ones in the room. They are the ones who documented every assumption their system made and then intentionally tried to break each one. Not in production. Not after an incident. Before it ever went live.</p><p style="text-align: justify;">That is the skill nobody teaches you. Not in bootcamps. Not in certifications. Not in any course I have ever taken. You learn it by being on the wrong side of an incident and spending 6 hours in a war room tracing back to the assumption you made three months ago that seemed totally reasonable at the time.</p><p style="text-align: justify;">Or you learn it here. Before it costs you a night of sleep.</p><p style="text-align: justify;">The framework below is the exact checklist I run before any pipeline goes to production. The same one I wish someone handed me in my first week at Meta.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;Unlock the Full Framework&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>Unlock the Full Framework</span></a></p><h4><strong>THE ASSUMPTION AUDIT: HOW TO BREAK YOUR PIPELINE BEFORE IT BREAKS YOU</strong></h4><p>After years of building at scale, I now run every pipeline through five assumption categories before it goes live. Not because I am paranoid. 
Because every incident I have ever been part of traced back to exactly one of these five.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dAIQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36d99c0-f515-46e6-951d-ba028293430a_1408x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dAIQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36d99c0-f515-46e6-951d-ba028293430a_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!dAIQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36d99c0-f515-46e6-951d-ba028293430a_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!dAIQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36d99c0-f515-46e6-951d-ba028293430a_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!dAIQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36d99c0-f515-46e6-951d-ba028293430a_1408x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dAIQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36d99c0-f515-46e6-951d-ba028293430a_1408x768.png" width="1408" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f36d99c0-f515-46e6-951d-ba028293430a_1408x768.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7744e48b-0890-4904-9952-fcd2e865afe2_1408x768.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1629403,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/192665197?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7744e48b-0890-4904-9952-fcd2e865afe2_1408x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dAIQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36d99c0-f515-46e6-951d-ba028293430a_1408x768.png 424w, https://substackcdn.com/image/fetch/$s_!dAIQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36d99c0-f515-46e6-951d-ba028293430a_1408x768.png 848w, https://substackcdn.com/image/fetch/$s_!dAIQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36d99c0-f515-46e6-951d-ba028293430a_1408x768.png 1272w, https://substackcdn.com/image/fetch/$s_!dAIQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36d99c0-f515-46e6-951d-ba028293430a_1408x768.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
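<p>To make the retry assumption concrete, here is a minimal Python sketch. It is not the pipeline described in this post. It assumes every event carries a unique <code>event_id</code>, and the names <code>charge_once</code> and <code>processed_ids</code> are hypothetical stand-ins for whatever durable store a real system would use.</p><pre><code>def charge_once(events, processed_ids, balances):
    """Apply each charge at most once, keyed by event_id (assumed unique per event)."""
    for event in events:
        event_id = event["event_id"]
        if event_id in processed_ids:
            continue  # already applied on an earlier attempt, so skip the duplicate
        user = event["user"]
        balances[user] = balances.get(user, 0) + event["amount"]
        processed_ids.add(event_id)
    return balances


# Hypothetical batch of two charge events.
batch = [
    {"event_id": "e1", "user": "u1", "amount": 5},
    {"event_id": "e2", "user": "u2", "amount": 7},
]
processed_ids = set()
balances = {}
charge_once(batch, processed_ids, balances)
charge_once(batch, processed_ids, balances)  # simulated retry of the same batch
print(balances)
</code></pre><p>Running the batch twice leaves each user charged once, which is the property a retry needs. In a real pipeline the processed IDs would have to live somewhere durable and be updated atomically with the charge. That part is out of scope for this sketch.</p>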
      <p>
          <a href="https://zero2dataengineer.substack.com/p/scale-does-not-break-your-code-it">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Day 30/30 SQL, Python, ETL, Data Modelling Challenge FREE Solutions ]]></title><description><![CDATA[April 4th CHALLENGE &#8211; unlock solutions + reasoning]]></description><link>https://zero2dataengineer.substack.com/p/day-3030-sql-python-etl-data-modelling</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-3030-sql-python-etl-data-modelling</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Sat, 05 Apr 2025 01:11:40 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4c726d4e-d10a-4b00-be3c-34c38be56819_1472x832.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><strong>SQL Challenge &#8211; RANK vs DENSE_RANK</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; Which statement is TRUE about <code>RANK()</code> and <code>DENSE_RANK()</code>?</p><p>&#9989; <strong>Answer: B - </strong><code>RANK()</code><strong> skips ranks after ties, but </strong><code>DENSE_RANK()</code><strong> does not</strong></p><p><strong>Explanation:</strong></p><ul><li><p><code>RANK()</code> gives the same rank to tied rows and skips the next rank(s).</p></li><li><p><code>DENSE_RANK()</code> gives the same rank to tied rows but continues with the next consecutive rank.</p></li></ul><p><strong>Best Practices:</strong></p><ul><li><p>Use <code>RANK()</code> when gaps in rank are important (e.g., competitions)</p></li><li><p>Use <code>DENSE_RANK()</code> for continuous, gap-free ranks in reports</p></li><li><p>Combine with <code>PARTITION BY</code> for group-wise ranking</p></li></ul><div><hr></div><h3><strong>Python Challenge &#8211; List Comprehensions with Filtering</strong></h3><p><strong>Challenge Recap:</strong></p><pre><code>nums = [1, 2, 3, 4, 5] 
evens = [x for x in nums if x % 2 == 0] 
print(evens)</code></pre><p>&#9989; <strong>Answer: B - [2, 4]</strong></p><p><strong>Explanation:</strong><br>This list comprehension filters even numbers from the list using <code>x % 2 == 0</code>.</p><p><strong>Best Practices:</strong></p><ul><li><p>Use inline conditions for clean, readable filtering</p></li><li><p>Avoid complex expressions&#8212;offload to helper functions if needed</p></li><li><p>Prefer comprehensions over loops for simple transformations</p></li></ul><div><hr></div><h3><strong>ETL Challenge &#8211; Benefit of Incremental Loads</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; What is the primary benefit of incremental loading in ETL?</p><p>&#9989; <strong>Answer: C - It reduces data volume and speeds up processing</strong></p><p><strong>Explanation:</strong><br>Incremental loads process only new or updated records, improving efficiency, reducing costs, and minimizing processing time.</p><p><strong>Best Practices:</strong></p><ul><li><p>Track changes using <code>last_updated</code> or CDC fields</p></li><li><p>Store pipeline state (e.g., last run timestamp)</p></li><li><p>Use idempotent merge/upsert logic to avoid duplication</p></li></ul><div><hr></div><h3><strong>Data Modeling Challenge &#8211; OLAP Systems</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; Which of the following best describes OLAP systems?</p><p>&#9989; <strong>Answer: B - Focused on complex analytical queries and reporting</strong></p><p><strong>Explanation:</strong><br>OLAP systems are designed for read-heavy, analytical workloads using dimensional models. They power dashboards, insights, and aggregated reporting.</p><p><strong>Best Practices:</strong></p><ul><li><p>Use star/snowflake schema for dimensional modeling</p></li><li><p>Leverage OLAP for BI tools (Power BI, Tableau, Looker)</p></li><li><p>Apply indexing, partitioning, and surrogate keys for performance</p></li></ul><div><hr></div><p>&#127881; Congrats! 
You&#8217;ve completed all <strong>30 days</strong> of this challenge.</p><p><br>If you want to access all the <strong>Deep Dive versions</strong>, <strong>OneCompiler walkthroughs</strong>, <strong>mock interview packs</strong>, and <strong>exclusive case studies</strong> &#8212;</p><p>&#128073; Upgrade to Annual or Elite at <a href="https://zero2dataengineer.substack.com">zero2dataengineer.substack.com</a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share Zero2Dataengineer&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share Zero2Dataengineer</span></a></p>]]></content:encoded></item><item><title><![CDATA[Day 30/30 - DEEP DIVE SOLUTIONS: SQL, PYTHON, ETL, DATA MODELING]]></title><description><![CDATA[Solutions for April 4th, 2025 Challenge &#8211; Final Day Breakdown + Live Runnable Code]]></description><link>https://zero2dataengineer.substack.com/p/day-3030-deep-dive-solutions-sql</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-3030-deep-dive-solutions-sql</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Sat, 05 Apr 2025 00:58:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RANP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff07e6779-3896-406e-8905-0105245dd147_3584x1004.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello Data Engineers,</p><p>You've made it to the FINAL DAY of our 30-Day Data Engineering Challenge!<br>We&#8217;re closing strong with advanced-level questions on:</p><ul><li><p>SQL Window Functions (RANK vs. 
DENSE_RANK)</p></li><li><p>Python Filtering with List Comprehensions</p></li><li><p>Incremental Loads in ETL</p></li><li><p>OLAP vs. OLTP System Design</p></li></ul><p>Let&#8217;s break them down in deep-dive detail.</p><div><hr></div><h3><strong>SQL Deep Dive: RANK() vs. DENSE_RANK()</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; Which statement is TRUE about <code>RANK()</code> and <code>DENSE_RANK()</code>?</p><p>&#128280; A) Both assign the same rank to tied rows and skip the next rank<br>&#128280; B) <code>RANK()</code> skips ranks after ties, but <code>DENSE_RANK()</code> does not<br>&#128280; C) <code>DENSE_RANK()</code> skips ranks after ties<br>&#128280; D) <code>RANK()</code> resets for each row</p><p>&#9989; <strong>Answer: B - </strong><code>RANK()</code><strong> skips ranks after ties, but </strong><code>DENSE_RANK()</code><strong> does not</strong></p><div><hr></div><h3><strong>Why This Happens:</strong></h3><ul><li><p><code>RANK()</code> assigns the same rank to tied rows, <strong>but skips</strong> the next rank(s).</p></li><li><p><code>DENSE_RANK()</code> assigns the same rank to tied rows but <strong>does not skip</strong>&#8212;it continues sequentially.</p></li></ul><div><hr></div><h3><strong>Where It&#8217;s Used in Real-World Applications:</strong></h3><ul><li><p><code>RANK()</code> is used when <strong>gaps in rank</strong> are important (e.g., competition-style scores).</p></li><li><p><code>DENSE_RANK()</code> is used when you want <strong>continuous ranking</strong> with no skipped numbers.</p></li></ul><div><hr></div><h3><strong>Run &amp; Test on OneCompiler:</strong></h3><p>1&#65039;&#8419; Open OneCompiler SQL &#8594; Choose <strong>PostgreSQL</strong><br>2&#65039;&#8419; Paste this:</p><pre><code>CREATE TABLE employees (
    name TEXT,
    department TEXT,
    salary INT
);

INSERT INTO employees (name, department, salary) VALUES
('Alice', 'Sales', 80000),
('Bob', 'Sales', 80000),
('Charlie', 'Sales', 75000),
('Dave', 'Sales', 70000);

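-- Alice and Bob tie at 80000, so both functions give them rank 1.
-- RANK() then jumps to 3 for Charlie; DENSE_RANK() continues with 2.
SELECT name, salary,
    RANK() OVER (ORDER BY salary DESC) AS rank_val,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank_val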
FROM employees;</code></pre><p>3&#65039;&#8419; Click <strong>Run</strong> &#8594; Observe how RANK vs DENSE_RANK behaves with ties.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RANP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff07e6779-3896-406e-8905-0105245dd147_3584x1004.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RANP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff07e6779-3896-406e-8905-0105245dd147_3584x1004.png 424w, https://substackcdn.com/image/fetch/$s_!RANP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff07e6779-3896-406e-8905-0105245dd147_3584x1004.png 848w, https://substackcdn.com/image/fetch/$s_!RANP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff07e6779-3896-406e-8905-0105245dd147_3584x1004.png 1272w, https://substackcdn.com/image/fetch/$s_!RANP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff07e6779-3896-406e-8905-0105245dd147_3584x1004.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RANP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff07e6779-3896-406e-8905-0105245dd147_3584x1004.png" width="1456" height="408" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f07e6779-3896-406e-8905-0105245dd147_3584x1004.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:408,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:304788,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/160622251?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff07e6779-3896-406e-8905-0105245dd147_3584x1004.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RANP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff07e6779-3896-406e-8905-0105245dd147_3584x1004.png 424w, https://substackcdn.com/image/fetch/$s_!RANP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff07e6779-3896-406e-8905-0105245dd147_3584x1004.png 848w, https://substackcdn.com/image/fetch/$s_!RANP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff07e6779-3896-406e-8905-0105245dd147_3584x1004.png 1272w, https://substackcdn.com/image/fetch/$s_!RANP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff07e6779-3896-406e-8905-0105245dd147_3584x1004.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
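<p>If you want to see the tie logic without a database, here is a rough Python sketch that mimics the two window functions on the same four salaries. The helper name <code>rank_and_dense_rank</code> is hypothetical, and this only illustrates the ranking rules. It is not how PostgreSQL computes them.</p><pre><code>def rank_and_dense_rank(rows):
    """Return (name, salary, rank, dense_rank) tuples ordered by salary, highest first."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    results = []
    previous_salary = None
    rank = 0
    dense_rank = 0
    for position, (name, salary) in enumerate(ordered, start=1):
        if salary != previous_salary:
            rank = position   # RANK jumps to the row position, leaving gaps after ties
            dense_rank += 1   # DENSE_RANK moves up by exactly one per distinct salary
            previous_salary = salary
        results.append((name, salary, rank, dense_rank))
    return results


# Same rows as the employees table above, without the department column.
employees = [("Alice", 80000), ("Bob", 80000), ("Charlie", 75000), ("Dave", 70000)]
for row in rank_and_dense_rank(employees):
    print(row)
</code></pre><p>Alice and Bob both come out as rank 1 under either rule. Charlie is 3 under the RANK rule and 2 under the DENSE_RANK rule, and Dave is 4 versus 3.</p>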
      <p>
          <a href="https://zero2dataengineer.substack.com/p/day-3030-deep-dive-solutions-sql">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Day 29/30 SQL, Python, ETL, Data Modelling Challenge FREE Solutions ]]></title><description><![CDATA[April 3rd CHALLENGE &#8211; unlock solutions + reasoning]]></description><link>https://zero2dataengineer.substack.com/p/day-2930-sql-python-etl-data-modelling</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2930-sql-python-etl-data-modelling</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Fri, 04 Apr 2025 00:31:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d366aa5-7187-4a5a-809a-e975617b4568_832x832.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128075; Hey Data Engineers!</p><p>You&#8217;ve made it to <strong>Day 29</strong> &#8211; and today&#8217;s lineup is a strong one. We&#8217;re diving into:</p><ul><li><p>Recursive CTEs in SQL</p></li><li><p>Python&#8217;s zip() for iterable pairing</p></li><li><p>Task dependencies in Airflow</p></li><li><p>Type 2 Slowly Changing Dimensions in Data Warehousing</p></li></ul><div><hr></div><h3><strong>SQL Challenge &#8211; Recursive CTEs</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; What is a key use case for a recursive CTE in SQL?</p><p>&#9989; <strong>Answer: C - Generating a sequence or hierarchy</strong></p><p><strong>Explanation:</strong><br>Recursive CTEs allow SQL queries to loop through hierarchical data structures like employee-manager relationships or category trees. They include a base case and a recursive clause to build up results row-by-row.</p><p><strong>Best Practices:</strong></p><ul><li><p>Always define a clear base condition</p></li><li><p>Use a <code>LEVEL</code> or <code>DEPTH</code> column to track recursion</p></li><li><p>Limit recursion to avoid infinite loops</p></li></ul><div><hr></div><h3><strong>Python Challenge &#8211; zip() Function</strong></h3><p><strong>Challenge Recap:</strong></p><pre><code>a = [1, 2, 3]  
b = ['x', 'y', 'z']  
print(list(zip(a, b)))</code></pre><p>&#9989; <strong>Answer: A - [(1, 'x'), (2, 'y'), (3, 'z')]</strong></p><p><strong>Explanation:</strong><br>The <code>zip()</code> function pairs items from multiple iterables into tuples and stops at the shortest input. It&#8217;s especially useful when merging values from two separate lists into one structured format.</p><p><strong>Best Practices:</strong></p><ul><li><p>Use <code>zip()</code> to merge lists of equal length</p></li><li><p>Convert zipped results to dicts with <code>dict(zip(keys, values))</code></p></li><li><p>Use <code>itertools.zip_longest()</code> if lists are uneven and you need to keep the extra items</p></li></ul><div><hr></div><h3><strong>ETL Challenge &#8211; Task Dependencies in Airflow</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; In Apache Airflow, which operator sets task execution order?</p><p>&#9989; <strong>Answer: C - &gt;&gt; or set_downstream()</strong></p><p><strong>Explanation:</strong><br>In Airflow, <code>&gt;&gt;</code> and <code>&lt;&lt;</code> are Python&#8217;s bitshift operators, overloaded on tasks to define execution order: <code>task_a &gt;&gt; task_b</code> means <code>task_a</code> runs first, the same as <code>task_a.set_downstream(task_b)</code>. 
They form the dependency structure that determines how tasks run within a DAG.</p><p><strong>Best Practices:</strong></p><ul><li><p>Use <code>&gt;&gt;</code> and <code>&lt;&lt;</code> for clean and readable DAGs</p></li><li><p>Avoid circular dependencies</p></li><li><p>Use TaskGroups to organize complex DAGs</p></li></ul><div><hr></div><h3><strong>Data Modeling Challenge &#8211; SCD Type 2</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; Which SCD type maintains both current and historical data?</p><p>&#9989; <strong>Answer: C - Type 2</strong></p><p><strong>Explanation:</strong><br>SCD Type 2 keeps a full history of data changes by creating a new record for each update, typically using <code>effective_from</code> and <code>effective_to</code> date columns.</p><p><strong>Best Practices:</strong></p><ul><li><p>Use surrogate keys (not business keys) as primary keys, since one business key now spans several rows</p></li><li><p>Ensure each business key has exactly one current record, the row where <code>effective_to IS NULL</code></p></li><li><p>Use SCD2 for slowly changing attributes whose history matters, such as location or job title</p></li></ul><div><hr></div><p>&#128204; Want access to full deep dives, live runnable code, interview patterns, and bonus mock prep?</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO DEEP DIVE&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO DEEP DIVE</span></a></p><p>&#128172; Drop your score in the comments. 
Leaderboard ends tomorrow!</p>]]></content:encoded></item><item><title><![CDATA[Day 29/30 - DEEP DIVE SOLUTIONS: SQL, PYTHON, ETL, DATA MODELING]]></title><description><![CDATA[Solutions for April 3rd, 2025 Challenge &#8211; Full Breakdown + Live Runnable Code]]></description><link>https://zero2dataengineer.substack.com/p/day-2930-deep-dive-solutions-sql</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2930-deep-dive-solutions-sql</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Fri, 04 Apr 2025 00:31:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RtqG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb91e9644-a981-4420-9a16-b3937342a3b9_3584x1042.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello Data Engineers,</p><p>You&#8217;ve made it to Day 29&#8212;nearing the finish line of our 30-Day Data Engineering Challenge. 
Today&#8217;s focus is all about mastering <strong>recursive CTEs</strong>, <strong>Python&#8217;s zip()</strong>, <strong>Airflow task dependencies</strong>, and <strong>Slowly Changing Dimensions (SCD Type 2)</strong>.</p><p>Let&#8217;s dive into today&#8217;s applied concepts.</p><div><hr></div><h3><strong>SQL Deep Dive: Recursive CTEs for Hierarchies</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; What is a key use case for a recursive CTE in SQL?</p><p>&#128280; A) Sorting data alphabetically<br>&#128280; B) Creating backup tables<br>&#128280; C) Generating a sequence or hierarchy<br>&#128280; D) Dropping temporary tables</p><p>&#9989; <strong>Answer: Option C - Generating a sequence or hierarchy</strong></p><div><hr></div><h3><strong>Why This Happens:</strong></h3><p>Recursive CTEs are used when a query needs to <strong>loop through hierarchical relationships</strong>, like employees and managers or category/subcategory trees.</p><div><hr></div><h3><strong>Where It&#8217;s Used in Real-World Applications:</strong></h3><ul><li><p>Org chart traversal</p></li><li><p>Product hierarchy roll-ups</p></li><li><p>File system tree 
expansion</p></li></ul><div><hr></div><h3><strong>Run &amp; Test on OneCompiler:</strong></h3><p>1&#65039;&#8419; Open OneCompiler &#8594; Choose <strong>PostgreSQL</strong><br>2&#65039;&#8419; Paste this SQL code:</p>
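<p>Independently of the snippet this step refers to, here is a minimal sketch of the anchor-plus-recursive-member pattern behind org chart traversal. It runs through Python&#8217;s standard <code>sqlite3</code> module rather than PostgreSQL, and the <code>staff</code> table, its rows, and the <code>org_chart</code> CTE name are all hypothetical.</p>

```python
import sqlite3

# Hypothetical org chart: each row points at its manager (NULL for the root).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(1, "Ava", None), (2, "Ben", 1), (3, "Cy", 1), (4, "Dee", 2)],
)

query = """
WITH RECURSIVE org_chart(id, name, depth) AS (
    -- Anchor member: start at the root (the row with no manager).
    SELECT id, name, 0 FROM staff WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: attach direct reports of the rows found so far.
    SELECT s.id, s.name, oc.depth + 1
    FROM staff AS s
    JOIN org_chart AS oc ON s.manager_id = oc.id
)
SELECT name, depth FROM org_chart ORDER BY depth, id;
"""

rows = conn.execute(query).fetchall()
conn.close()

# Indent each name by its depth to show the hierarchy.
for name, depth in rows:
    print("  " * depth + name)
```

<p>The recursion stops on its own once the recursive member finds no new reports to attach.</p>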
      <p>
          <a href="https://zero2dataengineer.substack.com/p/day-2930-deep-dive-solutions-sql">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Day 28/30 SQL, Python, ETL, Data Modelling Challenge FREE Solutions]]></title><description><![CDATA[April 2nd CHALLENGE &#8211; unlock solutions + reasoning]]></description><link>https://zero2dataengineer.substack.com/p/day-2830-sql-python-etl-data-modelling</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2830-sql-python-etl-data-modelling</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Thu, 03 Apr 2025 00:55:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d366aa5-7187-4a5a-809a-e975617b4568_832x832.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128075; Hey Data Engineers!</p><p>Welcome to Day 28 of the 30-Day Data Engineering Challenge.<br>Today&#8217;s challenge hits four high-impact concepts you&#8217;ll see often in real-world pipelines and interviews:</p><ul><li><p>SQL Window Functions (RANK)</p></li><li><p>Python Dictionary Comprehensions</p></li><li><p>Incremental Load Logic in ETL</p></li><li><p>Surrogate Keys in Data Warehousing</p></li></ul><p>&#129504; <strong>Don&#8217;t just memorize&#8212;understand.</strong> Every challenge solution includes:<br>&#9989; <strong>Clear explanation &amp; reasoning</strong><br>&#9989; <strong>Why this solution works</strong><br>&#9989; <strong>Key optimizations &amp; best practices</strong></p><p>If you want <strong>deep dives + runnable code</strong> to test these solutions, <strong><a href="https://zero2dataengineer.substack.com/subscribe">upgrade to the annual plan</a> and master these concepts like a pro!</strong></p><div><hr></div><h3><strong>SQL Challenge &#8211; Ranking Within Groups</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; What does this SQL query return?</p><pre><code>SELECT name, department, RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank FROM employees;</code></pre><p>&#9989; <strong>Answer: B - Assigns rank within each department by salary</strong></p><p><strong>Explanation:</strong><br>The <code>RANK()</code> window function assigns a rank to each employee <strong>within their department</strong>, ordered by salary in descending order. 
Employees with equal salaries get the same rank, and the ranks that follow a tie are skipped (two employees tied at rank 1 are followed by rank 3).</p><p><strong>Best Practices:</strong></p><ul><li><p>Use <code>RANK()</code> for handling ties in rankings</p></li><li><p>Combine <code>PARTITION BY</code> with <code>ORDER BY</code> for grouped logic</p></li><li><p>Prefer <code>DENSE_RANK()</code> if you don&#8217;t want gaps in rank numbers</p></li></ul><div><hr></div><h3><strong>Python Challenge &#8211; Dictionary Comprehensions</strong></h3><p><strong>Challenge Recap:</strong></p><pre><code>squares = {x: x**2 for x in range(3)}
print(squares)</code></pre><p>&#9989; <strong>Answer: B - {0: 0, 1: 1, 2: 4}</strong></p><p><strong>Explanation:</strong><br>This is a dictionary comprehension that creates key-value pairs where the key is <code>x</code> and the value is <code>x**2</code>. The result is a compact dictionary of squares from 0 to 2.</p><p><strong>Best Practices:</strong></p><ul><li><p>Use dictionary comprehensions for quick mapping</p></li><li><p>Avoid nesting too many expressions&#8212;offload to functions</p></li><li><p>Great for lookups, configs, and lightweight transformations</p></li></ul><div><hr></div><h3><strong>ETL Challenge &#8211; Incremental Load Efficiency</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; What&#8217;s the primary benefit of incremental loads?</p><p>&#9989; <strong>Answer: B - Loads only changed/new data since last run</strong></p><p><strong>Explanation:</strong><br>Incremental loading processes only new or modified records. It reduces resource usage, speeds up pipelines, and avoids reloading rows that are already in the target. 
You typically rely on a <code>last_updated</code> field or change flag.</p><p><strong>Best Practices:</strong></p><ul><li><p>Track <code>last_updated</code> or <code>ingestion_time</code></p></li><li><p>Store pipeline state between runs</p></li><li><p>Avoid full loads unless schema has changed</p></li></ul><div><hr></div><h3><strong>Data Modeling Challenge &#8211; Surrogate Keys</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; Why are surrogate keys used in data warehouses?</p><p>&#9989; <strong>Answer: C - They reduce complexity and changes</strong></p><p><strong>Explanation:</strong><br>Surrogate keys are internal IDs that uniquely identify rows without relying on change-prone business values. They support versioning, history tracking, and cleaner joins in star schemas.</p><p><strong>Best Practices:</strong></p><ul><li><p>Use surrogate keys in all dimension tables</p></li><li><p>Keep natural keys for reference, not as primary keys</p></li><li><p>Use them to enable Slowly Changing Dimensions (SCD Type 2)</p></li></ul><div><hr></div><p>&#128204; Want all the <strong>deep dive breakdowns</strong>, <strong>live OneCompiler testing</strong>, and <strong>career prep bonuses</strong>?</p><p><strong><a href="https://zero2dataengineer.substack.com/subscribe">Upgrade to the Annual or Elite Plan</a></strong></p><p>&#128172; Share your answers in the comments.<br>&#127919; Get recognized in our leaderboard and win recruiter shoutouts!</p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/p/day-2830-sql-python-etl-data-modelling?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading Zero2Dataengineer! 
This post is public so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/p/day-2830-sql-python-etl-data-modelling?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/p/day-2830-sql-python-etl-data-modelling?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><p></p>]]></content:encoded></item><item><title><![CDATA[Day 28/30 - DEEP DIVE SOLUTIONS: SQL, PYTHON, ETL, DATA MODELING]]></title><description><![CDATA[Solutions for April 2nd, 2025 Challenge &#8211; Full Breakdown + Live Runnable Code]]></description><link>https://zero2dataengineer.substack.com/p/day-2830-deep-dive-solutions-sql</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2830-deep-dive-solutions-sql</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Thu, 03 Apr 2025 00:39:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4984!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c9cc8-bb95-48fb-80bf-e7c162afc4a1_3580x918.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello Data Engineers,</p><p>Today is for the high-performers. You&#8217;ll deep dive into advanced <strong>window functions</strong>, <strong>Python dictionary comprehensions</strong>, <strong>incremental load strategies</strong>, and <strong>surrogate keys in data modeling</strong>. 
These are crucial for interviews, production code, and scalable data systems.</p><div><hr></div><h3><strong>SQL Deep Dive: Window Function with RANK()</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; What does this query return?</p><pre><code>SELECT name, department, RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;</code></pre><p>&#128280; A) Assigns unique rank globally across all employees<br>&#128280; B) Assigns rank within each department by salary<br>&#128280; C) Returns error due to missing GROUP BY<br>&#128280; D) Gives same rank to all employees</p><p>&#9989; <strong>Answer: Option B - Assigns rank within each department by salary</strong></p><div><hr></div><h3><strong>Why This Happens:</strong></h3><p><code>RANK()</code> is a window function that assigns a rank within each <strong>partition</strong> (<code>department</code> here). It ranks employees by salary (descending). Ties get the same rank, and it skips the next rank(s) accordingly.</p><div><hr></div><h3><strong>Where It&#8217;s Used in Real-World Applications:</strong></h3><ul><li><p>Departmental bonus allocations</p></li><li><p>Identifying top N performers per team</p></li><li><p>Generating leaderboards by category</p></li></ul><div><hr></div><h3><strong>Run &amp; Test on OneCompiler:</strong></h3><p>1&#65039;&#8419; Open OneCompiler &#8594; Choose <strong>PostgreSQL</strong><br>2&#65039;&#8419; Paste the following:</p><pre><code>CREATE TABLE employees (
    name TEXT,
    department TEXT,
    salary INT
);

INSERT INTO employees VALUES
('Alice', 'Sales', 70000),
('Bob', 'Sales', 70000),
('Charlie', 'Sales', 60000),
('Dave', 'Engineering', 90000),
('Eve', 'Engineering', 85000);

SELECT name, department,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;</code></pre><p>3&#65039;&#8419; Click <strong>Run</strong> and observe how rank is reset within each department and ties share the same rank.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4984!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c9cc8-bb95-48fb-80bf-e7c162afc4a1_3580x918.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4984!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c9cc8-bb95-48fb-80bf-e7c162afc4a1_3580x918.png 424w, https://substackcdn.com/image/fetch/$s_!4984!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c9cc8-bb95-48fb-80bf-e7c162afc4a1_3580x918.png 848w, https://substackcdn.com/image/fetch/$s_!4984!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c9cc8-bb95-48fb-80bf-e7c162afc4a1_3580x918.png 1272w, https://substackcdn.com/image/fetch/$s_!4984!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c9cc8-bb95-48fb-80bf-e7c162afc4a1_3580x918.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4984!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c9cc8-bb95-48fb-80bf-e7c162afc4a1_3580x918.png" width="1456" height="373" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/434c9cc8-bb95-48fb-80bf-e7c162afc4a1_3580x918.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:373,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:312797,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/160463756?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c9cc8-bb95-48fb-80bf-e7c162afc4a1_3580x918.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4984!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c9cc8-bb95-48fb-80bf-e7c162afc4a1_3580x918.png 424w, https://substackcdn.com/image/fetch/$s_!4984!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c9cc8-bb95-48fb-80bf-e7c162afc4a1_3580x918.png 848w, https://substackcdn.com/image/fetch/$s_!4984!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c9cc8-bb95-48fb-80bf-e7c162afc4a1_3580x918.png 1272w, https://substackcdn.com/image/fetch/$s_!4984!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c9cc8-bb95-48fb-80bf-e7c162afc4a1_3580x918.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h3><strong>Best Practices:</strong></h3><p>&#10004;&#65039; Use <code>RANK()</code> when ties should be reflected in ranking<br>&#10004;&#65039; Use <code>DENSE_RANK()</code> to avoid skipped numbers<br>&#10004;&#65039; Combine <code>PARTITION BY</code> + <code>ORDER BY</code> for grouped ranking</p><div><hr></div><h3><strong>Python Deep Dive: Dictionary Comprehensions</strong></h3><p><strong>Challenge Recap:</strong></p><pre><code>squares = {x: x**2 for x in range(3)}
print(squares)</code></pre><p>&#128280; A) {1: 1, 2: 4, 3: 9}<br>&#128280; B) {0: 0, 1: 1, 2: 4}<br>&#128280; C) [0, 1, 4]<br>&#128280; D) Syntax Error</p><p>&#9989; <strong>Answer: Option B - {0: 0, 1: 1, 2: 4}</strong></p><div><hr></div><h3><strong>Why This Happens:</strong></h3><p>This is a <strong>dictionary comprehension</strong>: it constructs a dictionary where each <code>x</code> in <code>range(3)</code> becomes a key, and <code>x**2</code> becomes the value. That results in: <code>{0: 0, 1: 1, 2: 4}</code>.</p>
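<p>To see how the same pattern carries over to everyday pipeline code, here is a minimal sketch. The <code>raw_rows</code> list and its field names are made up for illustration and are not part of the challenge.</p>

```python
# The challenge expression, followed by the loop it is shorthand for.
squares = {x: x**2 for x in range(3)}

squares_loop = {}
for x in range(3):
    # Each x becomes a key and x**2 becomes its value.
    squares_loop[x] = x**2

# A trailing `if` filters which items make it into the dictionary.
even_squares = {x: x**2 for x in range(6) if x % 2 == 0}

# ETL-style use: build an id -> name lookup from a list of row dicts.
# raw_rows is hypothetical sample data.
raw_rows = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]
name_by_id = {row["id"]: row["name"] for row in raw_rows}

print(squares)       # {0: 0, 1: 1, 2: 4}
print(even_squares)  # {0: 0, 2: 4, 4: 16}
print(name_by_id)    # {1: 'Alice', 2: 'Bob'}
```

<p>The comprehension and the explicit loop build the same dictionary; the comprehension just states it in one expression.</p>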
      <p>
          <a href="https://zero2dataengineer.substack.com/p/day-2830-deep-dive-solutions-sql">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Day 27/30 SQL, Python, ETL, Data Modelling Challenge FREE Solutions ]]></title><description><![CDATA[April 1st CHALLENGE &#8211; unlock solutions + reasoning]]></description><link>https://zero2dataengineer.substack.com/p/day-2730-sql-python-etl-data-modelling</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2730-sql-python-etl-data-modelling</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Wed, 02 Apr 2025 00:01:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d366aa5-7187-4a5a-809a-e975617b4568_832x832.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128075; Hey Data Engineers!</p><p>Welcome to Day 27 of the 30-Day Data Engineering Challenge.<br>Today&#8217;s challenge will sharpen your core decision-making on:</p><ul><li><p>Handling NULLs with COALESCE in SQL</p></li><li><p>Python list comprehensions</p></li><li><p>Airflow task dependencies</p></li><li><p>Normalization principles (3NF)</p></li></ul><h3><strong>SQL Challenge &#8211; NULL Handling with COALESCE</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; What will this query return?</p><pre><code>SELECT COALESCE(SUM(salary), 0)
FROM employees
WHERE department = 'Marketing';</code></pre><p>&#9989; <strong>Answer: B - 0 if there are no marketing employees</strong></p><p><strong>Explanation:</strong><br>If no employees exist in the 'Marketing' department, <code>SUM(salary)</code> returns <code>NULL</code>. <code>COALESCE()</code> replaces that with 0, making the query return a non-null default.</p><p><strong>Best Practices:</strong></p><ul><li><p>Always wrap aggregations in <code>COALESCE()</code> when querying filtered data</p></li><li><p>Prevents unexpected NULLs in reports or dashboards</p></li><li><p>Use <code>COALESCE(col, default)</code> for safe transformations</p></li></ul><div><hr></div><h3><strong>Python Challenge &#8211; List Comprehensions with Conditionals</strong></h3><p><strong>Challenge Recap:</strong></p><pre><code>nums = [1, 2, 3, 4, 5]
evens = [x for x in nums if x % 2 == 0]
print(evens)</code></pre><p>&#9989; <strong>Answer: B - [2, 4]</strong></p><p><strong>Explanation:</strong><br>The list comprehension keeps only the numbers divisible by 2 and collects them into a new list. <code>x % 2 == 0</code> checks for even numbers.</p><p><strong>Best Practices:</strong></p><ul><li><p>Use list comprehensions for compact filtering</p></li><li><p>Combine <code>if</code> with <code>for</code> for readable, clean logic</p></li><li><p>Avoid putting complex logic inside list comprehensions; use functions instead</p></li></ul><div><hr></div><h3><strong>ETL Challenge &#8211; Airflow Task Dependencies</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; What does <code>depends_on_past=True</code> do in Airflow?</p><p>&#9989; <strong>Answer: D - Ensures task only runs if it succeeded in the previous DAG run</strong></p><p><strong>Explanation:</strong><br>This setting links a task&#8217;s runs across DAG executions: the task runs only if the same task succeeded (or was skipped) in the previous DAG run. For a daily DAG, a task that failed yesterday won&#8217;t run today. This matters for time-based pipelines where each run builds on the one before it.</p><p><strong>Best Practices:</strong></p><ul><li><p>Use <code>depends_on_past</code> for incremental jobs</p></li><li><p>Set <code>wait_for_downstream=True</code> when a run must also wait for the tasks immediately downstream of its previous instance to finish</p></li><li><p>Use cautiously: one failed run blocks every later run of that task until it is fixed or cleared</p></li></ul><div><hr></div><h3><strong>Data Modeling Challenge &#8211; Understanding 3NF</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; Which of the following is true about Third Normal Form (3NF)?</p><p>&#9989; <strong>Answer: B - Eliminates all transitive dependencies</strong></p><p><strong>Explanation:</strong><br>3NF ensures that <strong>non-key attributes depend only on the primary key</strong>. 
It eliminates dependencies that chain through intermediate columns, helping reduce redundancy and update anomalies.</p><p><strong>Best Practices:</strong></p><ul><li><p>Normalize transactional systems up to 3NF</p></li><li><p>Avoid over-normalizing for reporting systems (OLAP)</p></li><li><p>Keep dimension tables clean and historical (use surrogate keys)</p></li></ul><div><hr></div><p>&#128204; Want Deep Dive walkthroughs + OneCompiler test cases + mock interview prep?</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO DEEP DIVE&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO DEEP DIVE</span></a></p><p><strong>Upgrade to Annual or Elite Plan</strong> now at &#128073; <a href="https://zero2dataengineer.substack.com">zero2dataengineer.substack.com</a></p><p>&#128172; Drop your answers in the comments.<br>&#127919; Monthly winners get featured + recruiter visibility.<br>&#128226; Tag us on LinkedIn @zero2dataengineer if you're loving the challenge!</p>]]></content:encoded></item><item><title><![CDATA[Day 27/30 - DEEP DIVE SOLUTIONS: SQL, PYTHON, ETL, DATA MODELING]]></title><description><![CDATA[Solutions for April 1st, 2025 Challenge &#8211; Full Breakdown + Live Runnable Code]]></description><link>https://zero2dataengineer.substack.com/p/day-2730-deep-dive-solutions-sql</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2730-deep-dive-solutions-sql</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Wed, 02 Apr 2025 00:01:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!9cUG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8cbb1db-c76a-4a11-91fa-91ddbb9aa717_3584x806.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p>Hello Data Engineers,</p><p>You&#8217;re just a few steps away from completing the 30-Day Challenge. Today&#8217;s deep dive covers:</p><ul><li><p>SQL NULL handling with <code>COALESCE</code></p></li><li><p>Python list comprehensions</p></li><li><p>Airflow&#8217;s <code>depends_on_past</code> logic</p></li><li><p>Data normalization and 3NF</p></li></ul><p>Let&#8217;s dive into today&#8217;s practical breakdowns + testing steps.</p><h3><strong>SQL Deep Dive: COALESCE and NULL Aggregates</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; What will this query return?</p><pre><code>SELECT COALESCE(SUM(salary), 0)
FROM employees
WHERE department = 'Marketing';</code></pre><p>&#128280; A) NULL if there are no Marketing employees<br>&#128280; B) 0 if there are no Marketing employees<br>&#128280; C) Error<br>&#128280; D) The string "No Marketing Employees"</p><p>&#9989; <strong>Answer: Option B - 0 if there are no Marketing employees</strong></p><div><hr></div><h3><strong>Why This Happens:</strong></h3><p>When <code>SUM()</code> has no matching rows, it returns <code>NULL</code>.<br>Wrapping it with <code>COALESCE()</code> ensures a default value (in this case, <code>0</code>) is returned instead&#8212;ideal for reports or dashboards where NULLs are undesirable.</p><div><hr></div><h3><strong>Where It&#8217;s Used in Real-World Applications:</strong></h3><ul><li><p>Defaulting missing totals in finance dashboards</p></li><li><p>Handling sparse data with fallback values</p></li><li><p>Avoiding broken charts or empty aggregates</p></li></ul><div><hr></div><h3><strong>Run &amp; Test on OneCompiler:</strong></h3><p>1&#65039;&#8419; Open OneCompiler &#8594; Select <strong>PostgreSQL</strong><br>2&#65039;&#8419; Paste this code:</p><pre><code>CREATE TABLE employees (
    id SERIAL,
    name TEXT,
    salary INT,
    department TEXT
);

INSERT INTO employees (name, salary, department) VALUES
('Alice', 80000, 'Engineering'),
('Bob', 90000, 'Engineering'),
('Charlie', 85000, 'HR');

-- Query for non-existent department
SELECT COALESCE(SUM(salary), 0)
FROM employees
WHERE department = 'Marketing';</code></pre><p>3&#65039;&#8419; Click <strong>Run</strong> and confirm that result is <code>0</code>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9cUG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8cbb1db-c76a-4a11-91fa-91ddbb9aa717_3584x806.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9cUG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8cbb1db-c76a-4a11-91fa-91ddbb9aa717_3584x806.png 424w, https://substackcdn.com/image/fetch/$s_!9cUG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8cbb1db-c76a-4a11-91fa-91ddbb9aa717_3584x806.png 848w, https://substackcdn.com/image/fetch/$s_!9cUG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8cbb1db-c76a-4a11-91fa-91ddbb9aa717_3584x806.png 1272w, https://substackcdn.com/image/fetch/$s_!9cUG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8cbb1db-c76a-4a11-91fa-91ddbb9aa717_3584x806.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9cUG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8cbb1db-c76a-4a11-91fa-91ddbb9aa717_3584x806.png" width="1456" height="327" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b8cbb1db-c76a-4a11-91fa-91ddbb9aa717_3584x806.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:327,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:249153,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/160361094?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8cbb1db-c76a-4a11-91fa-91ddbb9aa717_3584x806.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9cUG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8cbb1db-c76a-4a11-91fa-91ddbb9aa717_3584x806.png 424w, https://substackcdn.com/image/fetch/$s_!9cUG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8cbb1db-c76a-4a11-91fa-91ddbb9aa717_3584x806.png 848w, https://substackcdn.com/image/fetch/$s_!9cUG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8cbb1db-c76a-4a11-91fa-91ddbb9aa717_3584x806.png 1272w, https://substackcdn.com/image/fetch/$s_!9cUG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8cbb1db-c76a-4a11-91fa-91ddbb9aa717_3584x806.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div><hr></div><h3><strong>Best Practices:</strong></h3><p>&#10004;&#65039; Always wrap <code>SUM()</code>, <code>AVG()</code>, or <code>MAX()</code> with <code>COALESCE()</code> 
in filtered aggregates<br>&#10004;&#65039; Use <code>0</code> for numeric defaults, <code>''</code> for text defaults<br>&#10004;&#65039; Ensure dashboards and data consumers don&#8217;t misinterpret NULLs as errors</p><div><hr></div><h3><strong>Python Deep Dive: Filtering with List Comprehensions</strong></h3><p><strong>Challenge Recap:</strong></p><pre><code>nums = [1, 2, 3, 4, 5]
evens = [x for x in nums if x % 2 == 0]
print(evens)</code></pre><p>&#128280; A) [1, 3, 5]<br>&#128280; B) [2, 4]<br>&#128280; C) [1, 2, 3, 4, 5]<br>&#128280; D) Error</p><p>&#9989; <strong>Answer: Option B - [2, 4]</strong></p><div><hr></div><h3><strong>Why This Happens:</strong></h3><p>This list comprehension filters the list and includes only even numbers.<br><code>x % 2 == 0</code> evaluates to <code>True</code> only for <code>2</code> and <code>4</code>.</p><div><hr></div><h3><strong>Where It&#8217;s Used in Real-World Applications:</strong></h3><ul><li><p>Filtering numeric lists for condition matches</p></li><li><p>Cleaning JSON or CSV fields in ETL</p></li><li><p>Writing one-liners for quick data processing</p></li></ul><div><hr></div><h3><strong>Run &amp; Test on OneCompiler:</strong></h3><p>1&#65039;&#8419; Open OneCompiler &#8594; Select <strong>Python 3</strong><br>2&#65039;&#8419; Paste this code:</p><pre><code>nums = [1, 2, 3, 4, 5]
evens = [x for x in nums if x % 2 == 0]
print(evens)</code></pre><p>3&#65039;&#8419; Click <strong>Run</strong> to confirm output is <code>[2, 4]</code></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!I6Uz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82c7e200-0495-4210-857b-b6f343a6cb00_3580x478.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!I6Uz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82c7e200-0495-4210-857b-b6f343a6cb00_3580x478.png 424w, https://substackcdn.com/image/fetch/$s_!I6Uz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82c7e200-0495-4210-857b-b6f343a6cb00_3580x478.png 848w, https://substackcdn.com/image/fetch/$s_!I6Uz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82c7e200-0495-4210-857b-b6f343a6cb00_3580x478.png 1272w, https://substackcdn.com/image/fetch/$s_!I6Uz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82c7e200-0495-4210-857b-b6f343a6cb00_3580x478.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!I6Uz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82c7e200-0495-4210-857b-b6f343a6cb00_3580x478.png" width="1456" height="194" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/82c7e200-0495-4210-857b-b6f343a6cb00_3580x478.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:194,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:102404,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/160361094?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82c7e200-0495-4210-857b-b6f343a6cb00_3580x478.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!I6Uz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82c7e200-0495-4210-857b-b6f343a6cb00_3580x478.png 424w, https://substackcdn.com/image/fetch/$s_!I6Uz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82c7e200-0495-4210-857b-b6f343a6cb00_3580x478.png 848w, https://substackcdn.com/image/fetch/$s_!I6Uz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82c7e200-0495-4210-857b-b6f343a6cb00_3580x478.png 1272w, https://substackcdn.com/image/fetch/$s_!I6Uz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82c7e200-0495-4210-857b-b6f343a6cb00_3580x478.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div><hr></div><h3><strong>Best Practices:</strong></h3><p>&#10004;&#65039; Use comprehensions for clean, readable filtering<br>&#10004;&#65039; Avoid embedding complex 
logic; use functions for readability<br>&#10004;&#65039; Prefer list comprehensions over <code>filter()</code> unless lazy evaluation is needed</p><div><hr></div><h3><strong>ETL Deep Dive: Airflow </strong><code>depends_on_past=True</code></h3><p><strong>Challenge Recap:</strong><br>&#10067; What does setting <code>depends_on_past=True</code> in a task do?</p><p>&#128280; A) Ensures task waits for all parallel tasks<br>&#128280; B) Ensures task runs after the next scheduled DAG<br>&#128280; C) Forces re-run of all past failed DAGs<br>&#128280; D) Prevents task from running if it failed in the previous DAG run</p><p>&#9989; <strong>Answer: Option D - Prevents task from running if it failed in the previous DAG run</strong></p><div><hr></div><h3><strong>Why This Happens:</strong></h3><p>In Airflow, <code>depends_on_past=True</code> means a task instance <strong>runs only if the same task succeeded (or was skipped) in the previous DAG run</strong>. The first run at <code>start_date</code> is exempt because it has no earlier run to check. This matters for incremental ETL jobs, where each run builds on state left behind by the run before it.</p><div><hr></div><h3><strong>Where It&#8217;s Used in Real-World Applications:</strong></h3>
      <p>
          <a href="https://zero2dataengineer.substack.com/p/day-2730-deep-dive-solutions-sql">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Day 26/30 SQL, Python, ETL, Data Modelling Challenge FREE Solutions ]]></title><description><![CDATA[March 31st CHALLENGE &#8211; unlock solutions + reasoning]]></description><link>https://zero2dataengineer.substack.com/p/day-2630-sql-python-etl-data-modelling-c5b</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2630-sql-python-etl-data-modelling-c5b</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Tue, 01 Apr 2025 00:31:01 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/54e0c862-8f6e-4b63-9d5f-2bb22a31c831_1472x832.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128075; Hey Data Engineers!</p><p>Welcome to Day 26 of the 30-Day Data Engineering Challenge.<br>Today&#8217;s topics will level up your confidence with:</p><ul><li><p>SQL Recursive CTEs</p></li><li><p>Python Decorators</p></li><li><p>Incremental ETL Logic</p></li><li><p>Slowly Changing Dimensions in Data Modeling</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h3><strong>SQL Challenge &#8211; Recursive CTEs for Hierarchical Data</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; What does this SQL recursive query return?</p><pre><code>WITH RECURSIVE employee_path AS (
  SELECT id, name, manager_id FROM employees WHERE id = 1
  UNION ALL
  SELECT e.id, e.name, e.manager_id
  FROM employees e
  JOIN employee_path ep ON e.manager_id = ep.id
)
SELECT COUNT(*) FROM employee_path;</code></pre><p>&#9989; <strong>Answer: C - Number of employees under employee 1</strong></p><p><strong>Explanation:</strong><br>This recursive CTE starts from a single employee (ID = 1) and walks down through all direct and indirect reports. Because the starting row is part of the result, the final count covers employee 1 plus everyone beneath that node in the hierarchy; add <code>WHERE id != 1</code> to count the reports only.</p><p><strong>Best Practices:</strong></p><ul><li><p>Use recursive CTEs for org charts and category hierarchies</p></li><li><p>Always include a base case (starting point) and an exit condition</p></li><li><p>Add a <code>level</code> column to control recursion depth if needed</p></li></ul><div><hr></div><h3><strong>Python Challenge &#8211; Understanding Decorators</strong></h3><p><strong>Challenge Recap:</strong></p><pre><code>def decorator(func):
    def wrapper():
        print("Before")
        func()
        print("After")
    return wrapper

@decorator
def greet():
    print("Hello")

greet()</code></pre><p>&#9989; <strong>Answer: B - Before Hello After</strong></p><p><strong>Explanation:</strong><br>The <code>@decorator</code> syntax wraps <code>greet()</code> with additional logic&#8212;printing <code>"Before"</code> and <code>"After"</code> around the original function. It's a clean way to add behavior without modifying the function itself.</p><p><strong>Best Practices:</strong></p><ul><li><p>Use decorators for logging, timing, retries, and security</p></li><li><p>Wrap the original function using <code>functools.wraps()</code> to preserve metadata</p></li><li><p>Avoid overusing decorators when a regular function suffices</p></li></ul><div><hr></div><h3><strong>ETL Challenge &#8211; When to Use Incremental Loads</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; When is incremental loading preferred in an ETL pipeline?</p><p>&#9989; <strong>Answer: C - When only new/updated records need to be processed</strong></p><p><strong>Explanation:</strong><br>Incremental loads reduce resource usage by pulling only changes since the last job run. Ideal for growing datasets, especially when most records remain unchanged.</p><p><strong>Best Practices:</strong></p><ul><li><p>Use <code>last_updated</code> timestamps or change flags</p></li><li><p>Store job metadata to track state between runs</p></li><li><p>Design idempotent loads to avoid duplicates during retries</p></li></ul><div><hr></div><h3><strong>Data Modeling Challenge &#8211; SCD Type 2</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; Which SCD Type stores historical versions of changing records?</p><p>&#9989; <strong>Answer: C - Type 2</strong></p><p><strong>Explanation:</strong><br>SCD Type 2 tracks changes by inserting new rows with updated attributes and valid date ranges. 
Useful when you need a full timeline of how dimensions changed.</p><p><strong>Best Practices:</strong></p><ul><li><p>Use <code>effective_from</code> / <code>effective_to</code> columns</p></li><li><p>Keep current rows open (effective_to = NULL)</p></li><li><p>Add a surrogate key for uniqueness across versions</p></li></ul><div><hr></div><p>&#128204; Want full access to Deep Dive breakdowns, OneCompiler test cases, and real-world interview prep?</p><p><strong>Upgrade to Annual or Elite Plan</strong> at<br><strong><a href="https://zero2dataengineer.substack.com">zero2dataengineer.substack.com</a></strong> and unlock exclusive walkthroughs + runnable code + advanced best practices!</p><p>&#128172; Drop your answers in the comments.<br>&#127919; Top scorers get a LinkedIn shoutout and recruiter visibility next week!</p>]]></content:encoded></item><item><title><![CDATA[Day 26/30 - DEEP DIVE SOLUTIONS: SQL, PYTHON, ETL, DATA MODELING]]></title><description><![CDATA[Solutions for March 31st, 2025 Challenge &#8211; Full Breakdown + Live Runnable Code]]></description><link>https://zero2dataengineer.substack.com/p/day-2630-deep-dive-solutions-sql</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2630-deep-dive-solutions-sql</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Tue, 01 Apr 2025 00:31:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!p4X5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ba9d09-19fe-4bbf-9dc5-1949400ae280_3034x872.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello Data Engineers,</p><p>It&#8217;s Day 26 and we&#8217;re stepping into <strong>recursive queries</strong>, <strong>Python decorators</strong>, <strong>incremental pipelines</strong>, and <strong>SCD tracking</strong>. 
These concepts help you move from intermediate to advanced in both interview and real-world scenarios.</p><h3><strong>SQL Deep Dive: Recursive CTEs to Traverse Hierarchies</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; What does the following recursive query return?</p><pre><code>WITH RECURSIVE employee_path AS (
  SELECT id, name, manager_id FROM employees WHERE id = 1
  UNION ALL
  SELECT e.id, e.name, e.manager_id
  FROM employees e
  JOIN employee_path ep ON e.manager_id = ep.id
)
SELECT COUNT(*) FROM employee_path;</code></pre><p>&#128280; A) Total number of employees<br>&#128280; B) Depth of the org chart<br>&#128280; C) Number of employees under employee 1<br>&#128280; D) SQL Error</p><p>&#9989; <strong>Answer: Option C - Number of employees under employee 1</strong></p><div><hr></div><h3><strong>Why This Happens:</strong></h3><p>This recursive CTE starts from employee <code>id = 1</code> (the anchor row) and repeatedly joins back to itself to fetch every employee who reports, directly or indirectly, to that employee. As written, the final <code>COUNT(*)</code> also counts employee 1&#8217;s own row, so it returns the size of the whole subtree; filter with <code>WHERE id != 1</code> to count the reports only.</p><div><hr></div><h3><strong>Where It&#8217;s Used in Real-World Applications:</strong></h3><ul><li><p>Organizational structure lookups</p></li><li><p>Customer referral trees</p></li><li><p>Category/subcategory relationships</p></li></ul><div><hr></div><h3><strong>Run &amp; Test on OneCompiler:</strong></h3><p>1&#65039;&#8419; Open OneCompiler &#8594; Select <strong>PostgreSQL</strong><br>2&#65039;&#8419; Paste the following code:</p><pre><code>CREATE TABLE employees (
    id INT,
    name TEXT,
    manager_id INT
);

INSERT INTO employees VALUES
(1, 'CEO', NULL),
(2, 'VP', 1),
(3, 'Director', 2),
(4, 'Engineer', 3);

WITH RECURSIVE employee_path AS (
  SELECT id, name, manager_id FROM employees WHERE id = 1
  UNION ALL
  SELECT e.id, e.name, e.manager_id
  FROM employees e
  JOIN employee_path ep ON e.manager_id = ep.id
)
SELECT COUNT(*) FROM employee_path WHERE id != 1;</code></pre><p>3&#65039;&#8419; Click <strong>Run</strong> to view the count of employees under ID 1.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p4X5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ba9d09-19fe-4bbf-9dc5-1949400ae280_3034x872.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p4X5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ba9d09-19fe-4bbf-9dc5-1949400ae280_3034x872.png 424w, https://substackcdn.com/image/fetch/$s_!p4X5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ba9d09-19fe-4bbf-9dc5-1949400ae280_3034x872.png 848w, https://substackcdn.com/image/fetch/$s_!p4X5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ba9d09-19fe-4bbf-9dc5-1949400ae280_3034x872.png 1272w, https://substackcdn.com/image/fetch/$s_!p4X5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ba9d09-19fe-4bbf-9dc5-1949400ae280_3034x872.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p4X5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ba9d09-19fe-4bbf-9dc5-1949400ae280_3034x872.png" width="1456" height="418" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/16ba9d09-19fe-4bbf-9dc5-1949400ae280_3034x872.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:418,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:253508,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/160232528?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ba9d09-19fe-4bbf-9dc5-1949400ae280_3034x872.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p4X5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ba9d09-19fe-4bbf-9dc5-1949400ae280_3034x872.png 424w, https://substackcdn.com/image/fetch/$s_!p4X5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ba9d09-19fe-4bbf-9dc5-1949400ae280_3034x872.png 848w, https://substackcdn.com/image/fetch/$s_!p4X5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ba9d09-19fe-4bbf-9dc5-1949400ae280_3034x872.png 1272w, https://substackcdn.com/image/fetch/$s_!p4X5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16ba9d09-19fe-4bbf-9dc5-1949400ae280_3034x872.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h3><strong>Best Practices:</strong></h3><p>&#10004;&#65039; Use recursive CTEs for trees and graphs<br>&#10004;&#65039; Always define a base case + recursive case<br>&#10004;&#65039; Add safeguards (e.g., depth or level column) to avoid infinite loops</p><div><hr></div><h3><strong>Python Deep Dive: Understanding Decorators</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; What will this code output?</p><pre><code>def decorator(func):
    def wrapper():
        print("Before")
        func()
        print("After")
    return wrapper

@decorator
def greet():
    print("Hello")

greet()</code></pre><p>&#128280; A) Hello<br>&#128280; B) Before Hello After<br>&#128280; C) Syntax Error<br>&#128280; D) After Hello Before</p><p>&#9989; <strong>Answer: Option B - Before Hello After</strong></p><div><hr></div><h3><strong>Why This Happens:</strong></h3><p>The <code>@decorator</code> syntax wraps <code>greet()</code> with the <code>wrapper()</code> function. When <code>greet()</code> is called, it actually runs the wrapper, printing <code>"Before"</code>, calling the original function, then <code>"After"</code>.</p>
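<p>As a hedged illustration of the wrapping described above, the sketch below rebuilds the same pattern with <code>functools.wraps</code>, so the decorated function keeps its own name, and passes arguments and return values through. The names <code>log_calls</code> and <code>events</code> are made up for this example and are not part of the challenge.</p>

```python
import functools

events = []  # hypothetical log, used instead of print() so the order is easy to inspect


def log_calls(func):
    """Record a marker before and after calling func."""
    @functools.wraps(func)  # keeps greet.__name__ as "greet" instead of "wrapper"
    def wrapper(*args, **kwargs):
        events.append("Before")
        result = func(*args, **kwargs)  # run the original function
        events.append("After")
        return result  # pass the original return value through
    return wrapper


@log_calls
def greet():
    events.append("Hello")
    return "done"


greet()
print(events)          # ['Before', 'Hello', 'After']
print(greet.__name__)  # greet
```

<p>Without <code>functools.wraps</code>, <code>greet.__name__</code> would report <code>wrapper</code>, which can make logs and stack traces harder to read.</p>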
      <p>
          <a href="https://zero2dataengineer.substack.com/p/day-2630-deep-dive-solutions-sql">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Day 25/30 - DEEP DIVE SOLUTIONS: SQL, PYTHON, ETL, DATA MODELING]]></title><description><![CDATA[Solutions for March 28th, 2025 Challenge &#8211; Full Breakdown + Live Runnable Code]]></description><link>https://zero2dataengineer.substack.com/p/day-2530-deep-dive-solutions-sql</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2530-deep-dive-solutions-sql</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Sat, 29 Mar 2025 00:30:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-1Be!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F397d6716-b9a6-466c-9847-8f6882e6374b_3584x794.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello Data Engineers,</p><p>You&#8217;re wrapping up Week 5 with foundational concepts that come up constantly: <strong>group filtering with HAVING</strong>, <strong>Python string methods</strong>, <strong>data validation before loading</strong>, and the difference between OLTP and OLAP systems. Let's break each one down into usable, testable knowledge.</p><p>If you haven&#8217;t <strong>upgraded</strong> yet, this is where we go beyond just knowing the answers&#8212;giving you <strong>expert breakdowns, query tuning techniques, and best practices used in production systems</strong>.</p><p><strong>Upgrade now and stay ahead of the competition!</strong></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h3><strong>SQL Deep Dive: Group Filtering with HAVING</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; Which clause is used to filter after an aggregation has been applied?</p><p>&#128280; A) WHERE<br>&#128280; B) FILTER<br>&#128280; C) HAVING<br>&#128280; D) JOIN</p><p>&#9989; <strong>Answer: Option C - HAVING</strong></p><div><hr></div><h3><strong>Why This Happens:</strong></h3><p><code>WHERE</code> filters individual rows <strong>before aggregation</strong>, while <code>HAVING</code> filters <strong>after grouping</strong> has occurred. It&#8217;s used to return only the groups that match your condition.</p><div><hr></div><h3><strong>Where It&#8217;s Used in Real-World Applications:</strong></h3><ul><li><p>Returning only products with high sales volume</p></li><li><p>Filtering customer groups with high average spend</p></li><li><p>Identifying underperforming teams or regions</p><p></p></li></ul>
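<p>A minimal sketch of the <code>WHERE</code> versus <code>HAVING</code> split, using Python&#8217;s built-in <code>sqlite3</code> module. The <code>orders</code> table, its columns, and the thresholds are assumptions chosen only for illustration.</p>

```python
import sqlite3

# In-memory database with a small, made-up orders table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("laptop", 900), ("laptop", 1100), ("mouse", 20), ("mouse", 25), ("desk", 300)],
)

rows = conn.execute(
    """
    SELECT product, SUM(amount) AS total_sales
    FROM orders
    WHERE amount >= 25            -- row filter, applied before grouping
    GROUP BY product
    HAVING SUM(amount) >= 300     -- group filter, applied after aggregation
    ORDER BY product
    """
).fetchall()
conn.close()

print(rows)  # [('desk', 300), ('laptop', 2000)]
```

<p>The 20-unit mouse order is dropped by <code>WHERE</code> before any totals are computed, and the remaining mouse group (total 25) is then dropped by <code>HAVING</code>.</p>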
      <p>
          <a href="https://zero2dataengineer.substack.com/p/day-2530-deep-dive-solutions-sql">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Day 25/30 SQL, Python, ETL, Data Modelling Challenge FREE Solutions 🚀]]></title><description><![CDATA[March 27th, 2025 CHALLENGE &#8211; unlock solutions + reasoning]]></description><link>https://zero2dataengineer.substack.com/p/day-2530-sql-python-etl-data-modelling</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2530-sql-python-etl-data-modelling</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Fri, 28 Mar 2025 00:35:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!P4V8!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F480087e2-d585-43e3-8076-9e1282f0eb2d_200x200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128075; <strong>Hey Data Engineers!</strong></p><p><strong>Difficulty Level: Intermediate &#8594; Advanced</strong></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>You&#8217;re officially at the 25-day mark! 
&#128293; Let&#8217;s sharpen your understanding of <code>HAVING</code> clauses, string manipulation, data validation, and OLTP system design.</p><p>&#128161; <strong>Understand, Don&#8217;t Memorize:</strong><br>&#9989; Real-world clarity<br>&#9989; Best practices<br>&#9989; Interview relevance</p><p>&#128161; Want full solutions + runnable code? Upgrade to the <strong>Annual Plan</strong> now and own your DE journey.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO ANNUAL&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO ANNUAL</span></a></p><div><hr></div><h3>&#128204; SQL Challenge &#8211; Filtering with HAVING</h3><p>&#10067; Which clause is used to filter after an aggregation has been applied?</p><p>&#128280; A) WHERE<br>&#128280; B) FILTER<br>&#128280; C) HAVING<br>&#128280; D) JOIN</p><p>&#9989; <strong>Answer: C - HAVING</strong></p><p><strong>Explanation:</strong><br><code>HAVING</code> filters aggregated results (e.g., <code>GROUP BY dept HAVING COUNT(*) &gt; 10</code>), while <code>WHERE</code> filters rows <em>before</em> aggregation.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use <code>WHERE</code> before aggregation, <code>HAVING</code> after<br>&#10004;&#65039; Avoid <code>HAVING</code> for conditions that don&#8217;t involve an aggregate; put those in <code>WHERE</code> so rows are filtered before grouping<br>&#10004;&#65039; Use descriptive aliases for aggregated columns</p><div><hr></div><h3>&#128013; Python Challenge &#8211; Title Case Conversion</h3><pre><code>sentence = "data engineering is fun"
print(sentence.title())</code></pre><p>&#10067; What will be printed?</p><p>&#128280; A) Data Engineering Is Fun<br>&#128280; B) data Engineering Is Fun<br>&#128280; C) Data engineering is fun<br>&#128280; D) DATA ENGINEERING IS FUN</p><p>&#9989; <strong>Answer: A - Data Engineering Is Fun</strong></p><p><strong>Explanation:</strong><br><code>.title()</code> capitalizes the first letter of every word and lowercases the rest. Great for formatting labels, names, and headings.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use <code>.title()</code> for display, not for storage<br>&#10004;&#65039; Use <code>.capitalize()</code> if you want only the first character of the string uppercased (the rest is lowercased)<br>&#10004;&#65039; Be aware it doesn&#8217;t handle acronyms properly (e.g., &#8220;API&#8221; becomes &#8220;Api&#8221;)</p><div><hr></div><h3>&#9889; ETL Challenge &#8211; Null Value Checks</h3><p>&#10067; Which check would help catch null values before loading data into the warehouse?</p><p>&#128280; A) Type casting<br>&#128280; B) Primary key constraint<br>&#128280; C) NULL check in staging<br>&#128280; D) Denormalization</p><p>&#9989; <strong>Answer: C - NULL check in staging</strong></p><p><strong>Explanation:</strong><br>Pre-load checks in staging tables help catch nulls early and prevent bad data from polluting downstream systems.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use assertions or conditional filters on NULLs<br>&#10004;&#65039; Enforce NOT NULL where required<br>&#10004;&#65039; Log and quarantine invalid records</p><div><hr></div><h3>&#129521; Data Modeling Challenge &#8211; OLTP System Focus</h3><p>&#10067; OLTP systems are optimized for which type of operation?</p><p>&#128280; A) Large analytical queries<br>&#128280; B) Real-time reporting<br>&#128280; C) Frequent inserts and updates<br>&#128280; D) Batch processing</p><p>&#9989; <strong>Answer: C - Frequent inserts and updates</strong></p><p><strong>Explanation:</strong><br>OLTP = Online Transaction Processing &#8594; fast, 
atomic operations (e.g., banking, e-commerce orders).</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use normalized schemas (3NF) for OLTP<br>&#10004;&#65039; Prioritize low latency over analytics<br>&#10004;&#65039; Offload reporting to OLAP systems</p><div><hr></div><h3>&#128640; Final 5 Days Incoming&#8230;</h3><p>You're almost at the finish line. Here's what you unlock with the full subscription:</p><p>&#9989; Advanced deep dives &amp; mock interview sets<br>&#9989; Runnable SQL/Python code for every challenge<br>&#9989; DE system design playbooks</p><p>&#128073; <strong>Subscribe here:</strong> <a href="https://zero2dataengineer.substack.com">zero2dataengineer.substack.com</a></p><p>&#128172; Comment your answers. Tomorrow&#8217;s leaderboard awaits! &#128293;</p>]]></content:encoded></item><item><title><![CDATA[Day 24/30 SQL, Python, ETL, Data Modelling Challenge FREE Solutions 🚀]]></title><description><![CDATA[March 27th, 2025 CHALLENGE &#8211; unlock solutions + reasoning]]></description><link>https://zero2dataengineer.substack.com/p/day-2430-sql-python-etl-data-modelling</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2430-sql-python-etl-data-modelling</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Fri, 28 Mar 2025 00:35:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!P4V8!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F480087e2-d585-43e3-8076-9e1282f0eb2d_200x200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128075; <strong>Hey Data Engineers!</strong></p><p><strong>Difficulty Level: Intermediate &#8594; Advanced</strong></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" 
data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Today we&#8217;re diving into indexing, conditional list comprehension, Airflow orchestration, and tracking historical changes in dimensional models. These topics are <strong>must-know</strong> for production-grade systems.</p><p><strong>Understand, Don&#8217;t Memorize:</strong><br>&#9989; Clear, practical explanations<br>&#9989; Production-ready tips<br>&#9989; Interview-first mindset</p><p>Want runnable code + exclusive deep dives? 
Upgrade to the <strong>Annual Plan</strong> and fast-track your DE skills.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO ANNUAL MEMBERSHIP&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO ANNUAL MEMBERSHIP</span></a></p><div><hr></div><h3>&#128204; SQL Challenge &#8211; Indexing for Query Optimization</h3><p>&#10067; Which query would benefit MOST from adding an index on the <code>email</code> column?</p><p>&#128280; A) <code>SELECT * FROM users WHERE email = 'x@example.com'</code><br>&#128280; B) <code>SELECT COUNT(*) FROM users</code><br>&#128280; C) <code>SELECT * FROM users ORDER BY id DESC</code><br>&#128280; D) <code>SELECT DISTINCT department FROM users</code></p><p>&#9989; <strong>Answer: A - WHERE email = 'x@example.com'</strong></p><p><strong>Explanation:</strong><br>Indexing helps when filtering or joining on specific column values. An index on <code>email</code> speeds up lookups drastically.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Add indexes to high-cardinality, frequently filtered columns<br>&#10004;&#65039; Avoid over-indexing (it slows writes)<br>&#10004;&#65039; Use <code>EXPLAIN</code> to verify performance gains</p><div><hr></div><h3>&#128013; Python Challenge &#8211; List Comprehension with Conditionals</h3><pre><code>nums = [1, 2, 3, 4, 5, 6]  
result = [x * 2 for x in nums if x % 2 == 0]  
print(result)</code></pre><p>&#10067; What will be the output?</p><p>&#128280; A) [2, 4, 6, 8, 10, 12]<br>&#128280; B) [4, 8, 12]<br>&#128280; C) [2, 4, 6]<br>&#128280; D) [4, 8]</p><p>&#9989; <strong>Answer: B - [4, 8, 12]</strong></p><p><strong>Explanation:</strong><br>It filters even numbers (<code>x % 2 == 0</code>) &#8594; [2, 4, 6], then doubles them: [4, 8, 12].</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use list comprehensions for clean, readable filtering + transformation<br>&#10004;&#65039; Avoid unnecessary temp lists<br>&#10004;&#65039; Replace for-loops where possible</p><div><hr></div><h3>&#9889; ETL Challenge &#8211; Orchestrating Pipelines</h3><p>&#10067; Which of these tools is most commonly used for task orchestration in ETL?</p><p>&#128280; A) dbt<br>&#128280; B) Apache Airflow<br>&#128280; C) Kafka<br>&#128280; D) PostgreSQL</p><p>&#9989; <strong>Answer: B - Apache Airflow</strong></p><p><strong>Explanation:</strong><br>Airflow is the industry standard for defining and managing ETL workflows via DAGs (Directed Acyclic Graphs).</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use task dependencies to control execution order<br>&#10004;&#65039; Monitor and retry failed tasks via Airflow UI<br>&#10004;&#65039; Combine with sensors for event-driven pipelines</p><div><hr></div><h3>&#129521; Data Modeling Challenge &#8211; Slowly Changing Dimensions</h3><p>&#10067; Which SCD type preserves <strong>full history</strong> of changes to a dimension?</p><p>&#128280; A) Type 0<br>&#128280; B) Type 1<br>&#128280; C) Type 2<br>&#128280; D) Type 3</p><p>&#9989; <strong>Answer: C - Type 2</strong></p><p><strong>Explanation:</strong><br>SCD Type 2 stores a new row for each historical change, ensuring you can always see what the dimension looked like at any point in time.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Add <code>effective_date</code>, <code>end_date</code>, and <code>is_current</code> flags<br>&#10004;&#65039; 
Use surrogate keys to join facts to correct version<br>&#10004;&#65039; Automate inserts via merge logic</p><div><hr></div><h3>&#128640; Final Stretch &#8212; Let&#8217;s Finish Strong!</h3><p>You&#8217;re just a few days from completing this 30-day transformation.</p><p>&#9989; Join for hands-on SQL/Python projects<br>&#9989; Get exclusive mock interview questions<br>&#9989; Deep-dive with working code examples</p><p>&#128073; <strong>Subscribe here:</strong> <a href="https://zero2dataengineer.substack.com">zero2dataengineer.substack.com</a></p><p>&#128172; Drop your answers in the comments &#8212; best responses = shoutouts tomorrow </p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Day 24/30 - DEEP DIVE SOLUTIONS: SQL, PYTHON, ETL, DATA MODELING]]></title><description><![CDATA[Solutions for March 27th, 2025 Challenge &#8211; Full Breakdown + Live Runnable Code]]></description><link>https://zero2dataengineer.substack.com/p/day-2430-deep-dive-solutions-sql</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2430-deep-dive-solutions-sql</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Fri, 28 Mar 2025 00:30:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uXBU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07792966-d7dc-43b9-93be-8deb3d5ec191_3578x1006.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello Data Engineers,</p><p>Today we&#8217;re diving into <strong>index optimization</strong>, <strong>list comprehensions with filters</strong>, <strong>ETL orchestration tools</strong>, and <strong>Slowly Changing Dimensions (SCDs)</strong>. 
Whether you&#8217;re building scalable data platforms or preparing for a technical interview, this breakdown is a must-study.</p><p>If you haven&#8217;t <strong>upgraded</strong> yet, this is where we go beyond just knowing the answers, giving you <strong>expert breakdowns, query tuning techniques, and best practices used in production systems</strong>.</p><p><strong>Upgrade now and stay ahead of the competition!</strong></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h3><strong>SQL Deep Dive: Indexes and Query Optimization</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; Which query would benefit MOST from adding an index on the <code>email</code> column?</p><p>&#128280; A) <code>SELECT * FROM users WHERE email = 'x@example.com'</code><br>&#128280; B) <code>SELECT COUNT(*) FROM users</code><br>&#128280; C) <code>SELECT * FROM users ORDER BY id DESC</code><br>&#128280; D) <code>SELECT DISTINCT department FROM users</code></p><p>&#9989; <strong>Answer: Option A - </strong><code>SELECT * FROM users WHERE email = 'x@example.com'</code></p><div><hr></div><h3><strong>Why This Happens:</strong></h3><p>Indexes help most <strong>when a query filters, joins, or sorts on the indexed column</strong>. Filtering by <code>email</code> with an equality predicate is a perfect candidate for an index on that column. The other options either scan the entire table or reference columns other than <code>email</code>, so an index on <code>email</code> does not help them.</p><div><hr></div><h3><strong>Where It&#8217;s Used in Real-World Applications:</strong></h3><ul><li><p>User lookup by email or username</p></li><li><p>Customer support systems querying accounts by email</p></li><li><p>Authentication systems</p></li></ul><div><hr></div>
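<p>The indexing point above can be checked on a toy table. This is a minimal sketch, assuming SQLite through Python&#8217;s built-in <code>sqlite3</code> module; the sample rows, the <code>plan()</code> helper, and the index name <code>idx_users_email</code> are illustrative and not part of the original challenge. Other databases word their <code>EXPLAIN</code> output differently.</p>

```python
import sqlite3

# Hypothetical in-memory users table that mirrors the challenge's column names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO users (email, department) VALUES (?, ?)",
    [(f"user{i}@example.com", f"dept{i % 5}") for i in range(1000)],
)

query = "SELECT * FROM users WHERE email = 'user42@example.com'"


def plan(sql):
    # EXPLAIN QUERY PLAN returns rows whose last column is a readable description.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)  # without an index SQLite has to scan the table
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = plan(query)   # with the index SQLite can search it directly

print("before:", plan_before)
print("after: ", plan_after)
```

<p>Comparing the two plans is the same habit as the <code>EXPLAIN</code> check in the best practices: measure the effect of an index instead of assuming it.</p>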
      <p>
          <a href="https://zero2dataengineer.substack.com/p/day-2430-deep-dive-solutions-sql">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Day 23/30 SQL, Python, ETL, Data Modelling Challenge FREE Solutions 🚀]]></title><description><![CDATA[March 26th, 2025 CHALLENGE &#8211; unlock solutions + reasoning]]></description><link>https://zero2dataengineer.substack.com/p/day-2330-sql-python-etl-data-modelling</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2330-sql-python-etl-data-modelling</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Thu, 27 Mar 2025 00:35:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!P4V8!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F480087e2-d585-43e3-8076-9e1282f0eb2d_200x200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128075; <strong>Hey Data Engineers!</strong></p><p><strong>Difficulty Level: Intermediate &#8594; Advanced</strong></p><p>Let&#8217;s build on recursive queries, Python lambda tricks, and core ETL + modeling strategies. These concepts show up <em>all the time</em> in DE interviews.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><strong>Understand, Don&#8217;t Memorize:</strong><br>&#9989; Clear explanations<br>&#9989; Real-world context<br>&#9989; Interview-ready mindset</p><p>Want runnable code + expert breakdowns? Upgrade to the Annual Plan &amp; study like a pro.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO ANNUAL MEMBERSHIP&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO ANNUAL MEMBERSHIP</span></a></p><h3>&#128204; SQL Challenge &#8211; Recursive CTEs</h3><p>&#10067; What will this recursive SQL CTE most likely be used for?</p><p>&#128280; A) Generating running totals<br>&#128280; B) Flattening nested JSON<br>&#128280; C) Traversing hierarchical parent-child relationships<br>&#128280; D) Optimizing join performance</p><p>&#9989; <strong>Answer: C - Traversing hierarchical parent-child relationships</strong></p><p><strong>Explanation:</strong><br>Recursive CTEs are perfect for org charts, folder structures, category trees, etc. 
They allow you to reference the same CTE within itself until a termination condition is met.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use <code>UNION ALL</code> inside recursive CTEs<br>&#10004;&#65039; Always include a base case and recursion limit<br>&#10004;&#65039; Optimize with indexing on ID and parent_ID</p><div><hr></div><h3>&#128013; Python Challenge &#8211; Lambda with <code>map()</code></h3><pre><code>nums = [1, 2, 3, 4]  
squared = list(map(lambda x: x**2, nums))  
print(squared)</code></pre><p>&#10067; What will be the output?</p><p>&#128280; A) [2, 4, 6, 8]<br>&#128280; B) [1, 4, 9, 16]<br>&#128280; C) [1, 2, 3, 4]<br>&#128280; D) Error</p><p>&#9989; <strong>Answer: B - [1, 4, 9, 16]</strong></p><p><strong>Explanation:</strong><br><code>map()</code> applies the lambda to every item in <code>nums</code>, and <code>x**2</code> squares each value. No mutation, just transformation.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use <code>map()</code> for clean, functional transformations<br>&#10004;&#65039; Combine with lambda for quick inline functions<br>&#10004;&#65039; Wrap in <code>list()</code> to evaluate in Python 3</p><div><hr></div><h3>&#9889; ETL Challenge &#8211; Incremental Load Optimization</h3><p>&#10067; What&#8217;s the primary benefit of using an incremental load strategy in ETL?</p><p>&#128280; A) Automatically deletes outdated records<br>&#128280; B) Loads only changed/new records, improving performance<br>&#128280; C) Replaces the entire dataset each time<br>&#128280; D) Eliminates the need for source system backups</p><p>&#9989; <strong>Answer: B - Loads only changed/new records</strong></p><p><strong>Explanation:</strong><br>Incremental loads reduce compute, I/O, and overall load time by processing <em>just the delta</em>. 
Especially useful when less than 10% of data changes daily.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use <code>last_modified</code> or watermark columns<br>&#10004;&#65039; Combine with CDC for real-time ingestion<br>&#10004;&#65039; Audit changes with hashing for data integrity</p><div><hr></div><h3>&#129521; Data Modeling Challenge &#8211; Denormalization Tradeoffs</h3><p>&#10067; Which of the following is a downside of denormalized tables?</p><p>&#128280; A) Faster query speeds<br>&#128280; B) Reduced table joins<br>&#128280; C) Data redundancy and update anomalies<br>&#128280; D) Smaller storage requirements</p><p>&#9989; <strong>Answer: C - Data redundancy and update anomalies</strong></p><p><strong>Explanation:</strong><br>Denormalized models duplicate data across tables for performance. But that comes at the cost of update complexity and potential inconsistency.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use denormalization for read-heavy systems (OLAP)<br>&#10004;&#65039; Avoid in OLTP systems needing transactional consistency<br>&#10004;&#65039; Always weigh performance vs. maintainability</p><div><hr></div><h3>&#128640; Want to Unlock Deep Dives?</h3><p>You&#8217;ve made it 75% through! Upgrade for:</p><p>&#9989; Runnable SQL/Python solutions<br>&#9989; Mock interview scenarios<br>&#9989; Advanced system design deep dives</p><p>&#128073; <strong>Subscribe here:</strong> <a href="https://zero2dataengineer.substack.com">zero2dataengineer.substack.com</a></p><p>&#128172; Comment your answers. 
Top contributors get a shoutout tomorrow &#128293;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/p/day-2330-sql-python-etl-data-modelling/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/p/day-2330-sql-python-etl-data-modelling/comments"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Day 23/30 - DEEP DIVE SOLUTIONS: SQL, PYTHON, ETL, DATA MODELING]]></title><description><![CDATA[Solutions for March 26th, 2025 Challenge &#8211; Full Breakdown + Live Runnable Code]]></description><link>https://zero2dataengineer.substack.com/p/day-2330-deep-dive-solutions-sql</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2330-deep-dive-solutions-sql</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Thu, 27 Mar 2025 00:30:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!BAKg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d38aa39-d5ca-41b1-8e6b-11079156550d_3584x1088.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello Data Engineers,</p><p>You&#8217;ve crossed into the final stretch. Today&#8217;s challenges dive into <strong>recursive SQL CTEs</strong>, <strong>functional programming in Python</strong>, <strong>incremental load efficiency</strong>, and <strong>denormalization trade-offs</strong> in modeling. These are essential tools in both system design and interviews. 
Let&#8217;s break them down.</p><p>If you haven&#8217;t <strong>upgraded</strong> yet, this is where we go beyond just knowing the answers&#8212;giving you <strong>expert breakdowns, query tuning techniques, and best practices used in production systems</strong>.</p><p><strong>Upgrade now and stay ahead of the competition!</strong></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h3><strong>SQL Deep Dive: Recursive CTEs for Hierarchies</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; What will this recursive SQL CTE most likely be used for?</p><p>&#128280; A) Generating running totals<br>&#128280; B) Flattening nested JSON<br>&#128280; C) Traversing hierarchical parent-child relationships<br>&#128280; D) Optimizing join performance</p><p>&#9989; <strong>Answer: Option C - Traversing hierarchical parent-child relationships</strong></p><div><hr></div><h3><strong>Why This Happens:</strong></h3><p>Recursive CTEs repeatedly reference themselves to <strong>navigate hierarchical structures</strong> like org charts, category trees, and directory paths. 
The recursion stops once the recursive step returns no new rows.</p><div><hr></div><h3><strong>Where It&#8217;s Used in Real-World Applications:</strong></h3><ul><li><p>Org chart traversal</p></li><li><p>Folder and file system modeling</p></li><li><p>Multi-level product categories</p></li><li><p>Hierarchical menu rendering</p></li></ul>
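<p>An org chart traversal like the one listed above can be sketched in a few lines. This is only an illustration, assuming SQLite through Python&#8217;s built-in <code>sqlite3</code> module; the <code>employees</code> table, its <code>manager_id</code> column, and the sample names are hypothetical.</p>

```python
import sqlite3

# Hypothetical org chart: each row points at its manager through manager_id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ava", None), (2, "Ben", 1), (3, "Cy", 1), (4, "Dee", 2)],
)

rows = conn.execute(
    """
    WITH RECURSIVE org(id, name, depth) AS (
        -- Anchor member: the root of the hierarchy (no manager).
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: direct reports of the rows found so far.
        SELECT e.id, e.name, org.depth + 1
        FROM employees AS e
        JOIN org ON e.manager_id = org.id
    )
    SELECT name, depth FROM org ORDER BY depth, id
    """
).fetchall()

print(rows)
```

<p>The anchor member seeds the result with the root row, and the recursive member keeps joining back to <code>org</code> until a pass adds no rows, which is what ends the recursion.</p>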
      <p>
          <a href="https://zero2dataengineer.substack.com/p/day-2330-deep-dive-solutions-sql">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Day 22/30 - DEEP DIVE SOLUTIONS: SQL, PYTHON, ETL, DATA MODELING]]></title><description><![CDATA[Solutions for March 25th, 2025 Challenge &#8211; Full Breakdown + Live Runnable Code]]></description><link>https://zero2dataengineer.substack.com/p/day-2230-deep-dive-solutions-sql</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2230-deep-dive-solutions-sql</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Wed, 26 Mar 2025 00:30:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!9Tlx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc6a4088-33fa-40d3-9d30-9c47062881e2_3582x898.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello Data Engineers,</p><p>Today&#8217;s deep dive explores <strong>nested subqueries</strong>, <strong>dictionary comprehensions</strong>, <strong>incremental ETL logic</strong>, and <strong>denormalization strategies</strong>. These concepts show up repeatedly in interviews and production systems. Let&#8217;s break them down clearly and practically.</p><p>If you haven&#8217;t <strong>upgraded</strong> yet, this is where we go beyond just knowing the answers&#8212;giving you <strong>expert breakdowns, query tuning techniques, and best practices used in production systems</strong>.</p><p><strong>Upgrade now and stay ahead of the competition!</strong></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h3><strong>SQL Deep Dive: Filtering with Nested Subqueries</strong></h3><p><strong>Challenge Recap:</strong><br>&#10067; What will be the output of the following SQL query?</p><pre><code>SELECT name  
FROM employees  
WHERE salary &gt; (  
  SELECT AVG(salary)  
  FROM employees  
);</code></pre><p>&#128280; A) Employees with the lowest salary<br>&#128280; B) All employees<br>&#128280; C) Employees earning above average salary<br>&#128280; D) SQL Error</p><p>&#9989; <strong>Answer: Option C - Employees earning above average salary</strong></p><div><hr></div><h3><strong>Why This Happens:</strong></h3><p>The subquery <code>(SELECT AVG(salary) FROM employees)</code> runs <strong>first</strong> and returns a scalar value. The outer query then filters rows where <code>salary &gt; average</code>.</p><div><hr></div><h3><strong>Where It&#8217;s Used in Real-World Applications:</strong></h3><ul><li><p>Showing top performers by comparing against global averages</p></li><li><p>Filtering rows based on rolling metrics</p></li><li><p>Salary band analysis or anomaly detection</p></li></ul>
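<p>To see the scalar subquery in action, here is a minimal sketch, assuming SQLite through Python&#8217;s built-in <code>sqlite3</code> module. The four sample employees and their salaries are made up for illustration.</p>

```python
import sqlite3

# Hypothetical employees table with made-up salaries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ava", 50000), ("Ben", 70000), ("Cy", 90000), ("Dee", 110000)],
)

# The scalar value the subquery produces on its own.
avg_salary = conn.execute("SELECT AVG(salary) FROM employees").fetchone()[0]

# The challenge query: keep only rows above that single average value.
above_avg = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employees "
        "WHERE salary > (SELECT AVG(salary) FROM employees) "
        "ORDER BY name"
    )
]

print(avg_salary, above_avg)
```

<p>With this sample data the average is 80000, so only Cy and Dee pass the filter.</p>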
      <p>
          <a href="https://zero2dataengineer.substack.com/p/day-2230-deep-dive-solutions-sql">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Day 22/30 SQL, Python, ETL, Data Modelling Challenge FREE Solutions 🚀]]></title><description><![CDATA[March 25th, 2025 CHALLENGE &#8211; unlock solutions + reasoning]]></description><link>https://zero2dataengineer.substack.com/p/day-2230-sql-python-etl-data-modelling</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2230-sql-python-etl-data-modelling</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Wed, 26 Mar 2025 00:30:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!P4V8!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F480087e2-d585-43e3-8076-9e1282f0eb2d_200x200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>&#128075; Hey Data Engineers!</h2><p><strong>Difficulty Level: Intermediate &#8594; Advanced</strong></p><p>We&#8217;re officially over 70% through the 30-Day Challenge! Today we&#8217;re working with <strong>subqueries</strong>, <strong>dictionary comprehensions</strong>, <strong>incremental ETL</strong>, and <strong>denormalization</strong> strategies. All &#128293; concepts that come up in real-world pipelines + interviews.</p><p>&#128161; <strong>Understand, Don&#8217;t Memorize:</strong><br>&#9989; Concept clarity<br>&#9989; Interview-worthy tips<br>&#9989; Scalable thinking</p><p>&#128161; Want runnable code + deep dive breakdowns? 
Upgrade to the <strong>Annual Plan</strong> and supercharge your prep.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h3>&#128204; SQL Challenge &#8211; Average Salary Filter</h3><pre><code>SELECT name FROM employees 
WHERE salary &gt; (SELECT AVG(salary) 
FROM employees);</code></pre><p>&#10067; What will be the output?</p><p>&#128280; A) Employees with the lowest salary<br>&#128280; B) All employees<br>&#128280; C) Employees earning above average salary<br>&#128280; D) SQL Error</p><p>&#9989; <strong>Answer: C - Employees earning above average salary</strong></p><p><strong>Explanation:</strong><br>The subquery <code>(SELECT AVG(salary))</code> runs first. Then the main query filters employees whose salary exceeds that average.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use subqueries to calculate dynamic thresholds<br>&#10004;&#65039; Always alias when subqueries get complex<br>&#10004;&#65039; Use CTEs for better readability if reusable</p><div><hr></div><h3>&#128013; Python Challenge &#8211; Dictionary Comprehensions</h3><pre><code>nums = [1, 2, 3, 4, 5]
squares = {x: x*x for x in nums if x % 2 == 0}
print(squares)</code></pre><p>&#10067; Output?</p><p>&#128280; A) {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}<br>&#128280; B) {2: 4, 4: 16}<br>&#128280; C) {1: 1, 3: 9, 5: 25}<br>&#128280; D) Error</p><p>&#9989; <strong>Answer: B - {2: 4, 4: 16}</strong></p><p><strong>Explanation:</strong><br>This comprehension includes only even numbers and creates a key-value pair <code>x: x*x</code>.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Great for filtering + transforming in one step<br>&#10004;&#65039; Prefer dict comprehensions for clean logic<br>&#10004;&#65039; Avoid overcomplicating with nested conditions</p><div><hr></div><h3>&#9889; ETL Challenge &#8211; Incremental Load Efficiency</h3><p>&#10067; What is the biggest benefit of using incremental loading in ETL?</p><p>&#128280; A) Reduces transformation logic<br>&#128280; B) Minimizes data loss<br>&#128280; C) Optimizes performance and reduces load time<br>&#128280; D) Avoids schema changes</p><p>&#9989; <strong>Answer: C - Optimizes performance and reduces load time</strong></p><p><strong>Explanation:</strong><br>Incremental loads process <em>only 
new or updated</em> records&#8212;saving time, resources, and cost in production pipelines.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use timestamps or <code>last_modified</code> columns<br>&#10004;&#65039; Add watermarks or checkpoints<br>&#10004;&#65039; Log every batch for traceability</p><div><hr></div><h3>&#129521; Data Modeling Challenge &#8211; Denormalization in Warehouses</h3><p>&#10067; In which scenario is denormalization most useful?</p><p>&#128280; A) Reducing disk storage cost<br>&#128280; B) Improving read performance for reporting<br>&#128280; C) Ensuring data consistency across OLTP systems<br>&#128280; D) Simplifying index management</p><p>&#9989; <strong>Answer: B - Improving read performance for reporting</strong></p><p><strong>Explanation:</strong><br>Denormalization speeds up read-heavy workloads by reducing joins, which is ideal for BI tools and dashboards.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use for OLAP systems<br>&#10004;&#65039; Monitor redundancy to avoid inconsistencies<br>&#10004;&#65039; Document logic clearly for maintainability</p><div><hr></div><h3>&#128640; Finish Strong &#8211; You&#8217;re Over 70% Done!</h3><p>&#9989; Daily challenges<br>&#9989; DE interview prep<br>&#9989; Project guides + code snippets</p><p>&#128073; <strong>Join 10K+ engineers leveling up at:</strong><br><a href="https://zero2dataengineer.substack.com">zero2dataengineer.substack.com</a></p><p>&#128172; Drop your answers &#8212; best ones get featured tomorrow! 
&#128293;</p>]]></content:encoded></item><item><title><![CDATA[Day 21/30 SQL, Python, ETL, Data Modelling Challenge FREE Solutions 🚀]]></title><description><![CDATA[March 24th, 2025 CHALLENGE &#8211; unlock solutions + reasoning]]></description><link>https://zero2dataengineer.substack.com/p/day-2130-sql-python-etl-data-modelling</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/day-2130-sql-python-etl-data-modelling</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Tue, 25 Mar 2025 00:35:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!P4V8!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F480087e2-d585-43e3-8076-9e1282f0eb2d_200x200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>&#128075; Hey Data Engineers!</h2><p><strong>Difficulty Level: Intermediate &#8594; Advanced</strong></p><p>We&#8217;re officially 70% through the 30-Day Challenge! Let&#8217;s dig deeper into SQL aggregations, Python tricks, efficient ETL loads, and modeling techniques.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Understand, Don&#8217;t Memorize: </p><p>&#9989; Real-world logic behind answers<br>&#9989; Optimization insights<br>&#9989; Interview-aligned learning</p><p>Want runnable code + deep dive breakdowns? Upgrade to the <strong>Annual Plan</strong> and supercharge your prep.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO ANNUAL MEMBERSHIP&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO ANNUAL MEMBERSHIP</span></a></p><h3>&#128204; SQL Challenge &#8211; GROUPING SETS, ROLLUP, and CUBE </h3><p>&#10067; Which SQL clause allows custom combinations of GROUP BY columns for reporting purposes?</p><p>&#128280; A) GROUP BY ROLLUP<br>&#128280; B) GROUP BY CUBE<br>&#128280; C) GROUPING SETS<br>&#128280; D) All of the above</p><p>&#9989; <strong>Answer: D - All of the above</strong></p><p><strong>Explanation:</strong><br>All options extend <code>GROUP BY</code> with richer aggregations:</p><ul><li><p><code>ROLLUP</code>: Hierarchical totals</p></li><li><p><code>CUBE</code>: All possible combinations</p></li><li><p><code>GROUPING SETS</code>: Explicit control of multiple groupings</p></li></ul><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use <code>GROUPING SETS</code> for custom dashboards<br>&#10004;&#65039; Use <code>ROLLUP</code> for drill-down summaries<br>&#10004;&#65039; Analyze aggregation plans 
using <code>EXPLAIN</code></p><div><hr></div><h3>&#128013; Python Challenge &#8211; Running Totals with Itertools</h3><p>&#10067; Which <code>itertools</code> function returns the running totals of values in an iterable?</p><p>&#128280; A) chain()<br>&#128280; B) accumulate()<br>&#128280; C) groupby()<br>&#128280; D) permutations()</p><p>&#9989; <strong>Answer: B - accumulate()</strong></p><p><strong>Explanation:</strong><br><code>accumulate()</code> returns an iterator of cumulative sums, so no manual loop is needed.<br>E.g., <code>list(accumulate([1, 2, 3])) &#8594; [1, 3, 6]</code>.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use for streaming calculations<br>&#10004;&#65039; Pass <code>operator.mul</code>, <code>max</code>, or a custom function for other running aggregates<br>&#10004;&#65039; Avoid unnecessary stateful loops</p><div><hr></div><h3>&#9889; ETL Challenge &#8211; Change Data Capture (CDC)</h3><p>&#10067; Which of the following is a widely used method for real-time CDC?</p><p>&#128280; A) Full table scans<br>&#128280; B) Hash comparison<br>&#128280; C) Log-based CDC<br>&#128280; D) Duplicate audit tables</p><p>&#9989; <strong>Answer: C - Log-based CDC</strong></p><p><strong>Explanation:</strong><br>Log-based CDC reads the database&#8217;s transaction log (for example, the MySQL binlog or the PostgreSQL WAL) instead of querying full tables, enabling near real-time ETL pipelines.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use Debezium or Fivetran for log-based CDC<br>&#10004;&#65039; Avoid table scans for high-velocity systems<br>&#10004;&#65039; Maintain low-latency ingestion</p><div><hr></div><h3>&#129521; Data Modeling Challenge &#8211; Surrogate Keys vs Natural Keys</h3><p>&#10067; Why are surrogate keys preferred in dimensional models?</p><p>&#128280; A) Improve readability<br>&#128280; B) Avoid update issues<br>&#128280; C) Enforce constraints<br>&#128280; D) Reduce joins</p><p>&#9989; <strong>Answer: B - Avoid update issues</strong></p><p><strong>Explanation:</strong><br>Surrogate keys don&#8217;t rely on changing business data (like email/SSN), ensuring 
stability and referential integrity.</p><p><strong>Best Practices:</strong><br>&#10004;&#65039; Use auto-incremented surrogate keys<br>&#10004;&#65039; Avoid natural keys that might change<br>&#10004;&#65039; Ensure consistency in joins across fact/dim tables</p><div><hr></div><h3>&#128640; Ready to Level Up?</h3><p>You&#8217;re almost at the finish line! Upgrade for full access to:</p><p>&#9989; Deep Dives + Live SQL/Python<br>&#9989; Real-world DE interview prep<br>&#9989; Exclusive hands-on project guides</p><p>&#128073; Join now: <strong>zero2dataengineer.substack.com</strong></p><p>&#128172; Drop your answers in the comments &#8211; top responses get a shoutout! &#128293;</p>]]></content:encoded></item></channel></rss>