<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Zero2Dataengineer: The Daily Edge]]></title><description><![CDATA[One AI or Data Engineering concept every day. Clear, simple, and built for people who are just getting started and want to actually understand the technology shaping every industry right now.]]></description><link>https://zero2dataengineer.substack.com/s/zero-to-data-engineering-ai-lessons</link><image><url>https://substackcdn.com/image/fetch/$s_!P4V8!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F480087e2-d585-43e3-8076-9e1282f0eb2d_200x200.png</url><title>Zero2Dataengineer: The Daily Edge</title><link>https://zero2dataengineer.substack.com/s/zero-to-data-engineering-ai-lessons</link></image><generator>Substack</generator><lastBuildDate>Tue, 21 Apr 2026 19:49:25 GMT</lastBuildDate><atom:link href="https://zero2dataengineer.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Avantika]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[zero2dataengineer@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[zero2dataengineer@substack.com]]></itunes:email><itunes:name><![CDATA[Avantika_Penumarty]]></itunes:name></itunes:owner><itunes:author><![CDATA[Avantika_Penumarty]]></itunes:author><googleplay:owner><![CDATA[zero2dataengineer@substack.com]]></googleplay:owner><googleplay:email><![CDATA[zero2dataengineer@substack.com]]></googleplay:email><googleplay:author><![CDATA[Avantika_Penumarty]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Why Facebook Never Goes Down: The Two Systems Behind 3 
Billion Users]]></title><description><![CDATA[Most engineers never study LogDevice and RocksDB. Here is why they should.]]></description><link>https://zero2dataengineer.substack.com/p/why-facebook-never-goes-down-the</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/why-facebook-never-goes-down-the</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Thu, 02 Apr 2026 00:30:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!P4V8!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F480087e2-d585-43e3-8076-9e1282f0eb2d_200x200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p style="text-align: justify;">Every time you open Facebook, something extraordinary happens behind the scenes.</p><p style="text-align: justify;">Around three billion people use it, and at any given moment an enormous number of them are doing the same things at once. Posting. Scrolling. Messaging. Watching. And somehow the whole thing just works. No crashes. No waiting. No downtime.</p><p style="text-align: justify;">The answer is not more servers. It is not a bigger database. It is a fundamental design decision that most engineers never think about until their system is already on fire.</p><p style="text-align: justify;">Meta treats reads and writes as two completely separate problems.</p><p style="text-align: justify;">In most systems, reads and writes share the same path. The same database handles both. That means when one gets busy, the other suffers. Your users are trying to load their feed while your pipeline is ingesting millions of new events at the same time. They are fighting over the same resources. And under load, everybody loses.</p><p style="text-align: justify;">Most engineers respond to this by adding memory, scaling horizontally, or upgrading their database tier. None of that fixes the actual problem. 
Because the actual problem is architectural, not operational.</p><p style="text-align: justify;">Meta solved it by building two completely different systems from scratch.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p style="text-align: justify;"><strong>LOGDEVICE: BUILT FOR WRITES</strong></p><p style="text-align: justify;">LogDevice is Meta&#8217;s distributed log storage system. It was designed with one purpose: ingest data at massive scale without slowing down.</p><p style="text-align: justify;">Likes, messages, video views, backend sensor pings: high-volume event streams like these are exactly what a system like LogDevice is built to take in. It uses a log-structured, append-only design, which means it writes data sequentially instead of jumping around the disk at random. Sequential writes are dramatically faster than random writes, especially on spinning disks. That is not an accident. It is a deliberate design choice made specifically to maximize write throughput.</p><p style="text-align: justify;">LogDevice is not built for random lookups. Readers consume a log in order, tailing new records as they arrive or replaying older ones from a chosen position; there is no way to ask it for one user&#8217;s profile by key. That is the point. 
By giving up on general-purpose reads, it becomes extraordinarily good at the one thing it was built for.</p><p style="text-align: justify;">Most engineers designing their first production system try to pick one database that handles everything. LogDevice shows why this instinct, while understandable, breaks down at scale.</p><p style="text-align: justify;"><strong>ROCKSDB: BUILT FOR READS WITH SURGICAL PRECISION</strong></p><p style="text-align: justify;">RocksDB started as a fork of Google&#8217;s LevelDB. Facebook (now Meta) reworked it for fast server storage and open-sourced it in 2013. Today it powers systems at Facebook, LinkedIn, Yahoo, Twitter, and hundreds of other companies running at scale.</p><p style="text-align: justify;">The reason Meta built RocksDB instead of using an existing solution is the same reason they built LogDevice: nothing on the market gave them the control they needed.</p><p style="text-align: justify;">RocksDB is an embeddable key-value store whose options (memtable size, compaction style, block cache, Bloom filters) are set per instance, so each instance can be tuned for reads or for writes independently. This is the part most engineers miss.</p><p style="text-align: justify;">You can deploy one RocksDB instance configured for fast point lookups, matching the read pattern of a news feed that must retrieve a specific user&#8217;s data in milliseconds, and another configured for high write throughput, matching the ingestion pattern of an analytics pipeline processing billions of events. Same underlying technology. Completely different configurations. Completely different jobs.</p><p style="text-align: justify;">Deployed separately, each instance serves a single workload, so they do not compete for the same resources the way a shared, do-everything database does.</p><p style="text-align: justify;">RocksDB is built on a data structure called a log-structured merge tree (LSM tree). New writes are buffered in an in-memory table called the memtable (and appended to a write-ahead log for durability); when the memtable fills up, it is flushed to disk as an immutable file sorted by key, and background compaction later merges those files. This makes writes fast and keeps keys that sort near each other physically close together on disk. 
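</p><p style="text-align: justify;">To make that write path concrete, here is a toy LSM tree in Python. It is a minimal sketch under simplifying assumptions, not how RocksDB is actually implemented: the <code>TinyLSM</code> class and its <code>memtable_limit</code> parameter are hypothetical, the "on-disk" sorted runs are plain in-memory lists, and there is no write-ahead log, no delete markers (tombstones), and no Bloom filters. What it does show is the core cycle: writes land in memory, a full memtable is flushed as one sorted, immutable run, reads check the newest data first, and compaction merges runs so lookups touch fewer of them.</p>

```python
from bisect import bisect_left


class TinyLSM:
    """Hypothetical toy LSM tree: an in-memory memtable plus sorted, immutable runs.

    Not the RocksDB API. Real engines write runs to SST files on disk and
    keep a write-ahead log; here runs are plain Python lists.
    """

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # flush once this many keys are buffered
        self.memtable = {}  # newest writes, held in memory
        self.runs = []      # flushed runs as (sorted_keys, values), newest first

    @staticmethod
    def _make_run(items):
        # Sort once at flush time so each run can be binary-searched later.
        pairs = sorted(items)
        return [k for k, _ in pairs], [v for _, v in pairs]

    def put(self, key, value):
        # The write path only touches memory: no random disk I/O per write.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Emit the whole memtable as one sorted, immutable run (a sequential write).
        self.runs.insert(0, self._make_run(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Check the freshest data first: memtable, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in self.runs:
            i = bisect_left(keys, key)  # binary search within a sorted run
            if i != len(keys) and keys[i] == key:
                return values[i]
        return None

    def compact(self):
        # Merge every run into one, keeping only the newest value for each key.
        merged = {}
        for keys, values in reversed(self.runs):  # oldest first, so newer values overwrite
            merged.update(zip(keys, values))
        self.runs = [self._make_run(merged.items())] if merged else []
```

<p style="text-align: justify;">For example, with <code>memtable_limit=2</code>, two puts fill the memtable and trigger a flush into one sorted run. A later put of an existing key shadows the older value, because reads check newer data first, until <code>compact()</code> merges the runs and keeps only the newest version of each key.</p><p style="text-align: justify;">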
When you request a range of nearby keys, the disk has to seek less to find them, and less seeking means faster reads. Meta takes this even further by pre-arranging the most frequently accessed bytes so they are physically adjacent on disk. The result is a feed that can load in milliseconds even when enormous numbers of people are using it at the same time.</p><p style="text-align: justify;"><strong>WHY THIS MATTERS FOR YOUR SYSTEM RIGHT NOW</strong></p><p>You are probably not building for three billion users. But the principle applies at every scale.</p><p>If you have a system that slows down under load, the first question to ask is not &#8220;What hardware do I need?&#8221; It is &#8220;Are my reads and writes competing for the same resources?&#8221;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/p/why-facebook-never-goes-down-the/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/p/why-facebook-never-goes-down-the/comments"><span>Leave a comment</span></a></p><p>I have seen this exact problem at companies with 50 engineers and companies with 5,000. A shared database handling both analytical queries and transactional writes. A single Kafka consumer group processing both real-time and batch workloads. One pipeline serving five different use cases with completely different performance requirements.</p><p>The symptom is always the same. Things work fine until load increases. Then everything degrades together because everything is coupled together.</p><p>The fix is always the same too. Separate the concern. Define the job. Build for that job specifically.</p><p>LogDevice does not try to be RocksDB. RocksDB does not try to be LogDevice. And Facebook never goes down.</p><p>Here is the three-step framework I apply before designing any new data system:</p><p>Step one. 
Write down every read pattern your system needs to support. How frequently. What latency is acceptable. What the data shape looks like.</p><p>Step two. Write down every write pattern separately. How much volume. How fast does it need to land. What consistency guarantees do you need.</p><p>Step three. Ask honestly whether one system can serve both patterns without compromising either. If the answer is no, you already know what to do.</p><p>Separate the concern first. Then optimize. That is how you build something that survives contact with real traffic.</p><p>If you found this valuable, Thursday&#8217;s paid newsletter goes even deeper.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;Not subscribed yet? Now is a good time.&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>Not subscribed yet? Now is a good time.</span></a></p><p>I am breaking down the exact career moves that separate engineers who understand systems from engineers who just operate them. The difference in compensation between those two groups at companies like Meta is not small.</p><p>Thursday 5:30pm. Paid subscribers only.</p><p>See you Thursday.</p><p>&#8212; Avantika</p>]]></content:encoded></item><item><title><![CDATA[Everyone Talks About Spark. 
SQL Still Runs the Data World.]]></title><description><![CDATA[How modern data pipelines actually move and scale]]></description><link>https://zero2dataengineer.substack.com/p/sql-in-data-engineering-2026</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/sql-in-data-engineering-2026</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Thu, 15 Jan 2026 13:15:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!fJvW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7396fed-3677-4b5b-9129-93f280a4d81a_2742x1500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hi everyone,</p><p>Before we begin, I want to share a quick, honest note and a sincere apology to all my readers.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>I&#8217;ve been quieter than usual over the last few months. I was dealing with some personal and health-related challenges and needed to step back briefly. 
Thank you for your patience, messages, and continued support; it truly means more than you know.</p><p>I&#8217;m back now, and it felt right to restart with a topic that sits at the very core of data engineering, one that has quietly shaped almost every system I&#8217;ve worked on.</p><h2>Table of Contents</h2><ul><li><p>What is SQL in Data Engineering?</p></li><li><p>Why SQL is Crucial for Data Engineering</p></li><li><p>SQL for ETL vs. ELT Pipelines</p></li><li><p>Essential SQL for Data Engineers</p></li><li><p>SQL in Modern Data Engineering Tools</p></li><li><p>Best Practices for Writing SQL in Data Engineering</p></li><li><p>Future of SQL in Data Engineering</p></li><li><p>Conclusion</p></li><li><p>FAQs</p></li></ul><div><hr></div><h2>Prefer listening over reading?</h2><p>Are you on your way to work or heading back home?<br>Starting a run, folding laundry, or just taking a quiet break?</p><p>If reading feels like too much right now, I&#8217;ve got you.</p><p>I recorded an audio version of this newsletter so you can listen while you move through your day. Same ideas, same depth, just in a format that fits real life.</p><p>Plug in your headphones, press play, and let SQL make sense in the background while you take care of everything else.</p><div class="native-audio-embed" data-component-name="AudioPlaceholder" data-attrs="{&quot;label&quot;:null,&quot;mediaUploadId&quot;:&quot;189282ce-5be2-43b1-b333-eab1075954c4&quot;,&quot;duration&quot;:271.64734,&quot;downloadable&quot;:false,&quot;isEditorNode&quot;:true}"></div><div><hr></div><h2>Introduction</h2><p>Structured Query Language (SQL) remains the foundation of data engineering, enabling data professionals to design, build, and maintain scalable data pipelines. 
Despite the rise of modern technologies like Apache Spark and NoSQL databases, SQL&#8217;s declarative syntax and near-universal adoption make it indispensable in real-world data engineering workflows.</p><p>In this piece, I&#8217;ll walk you through how SQL shows up in real data engineering work, what actually matters in practice, and why it continues to be one of the most valuable skills you can invest in as a data engineer.</p><h2>What is SQL in Data Engineering?</h2><p><em>Meta story:</em> Early in my career, I believed mastering tools would make me a great data engineer. Spark, Airflow, Kafka: I chased them all. What actually made my work reliable wasn&#8217;t a tool. It was the moment I truly understood SQL as a way of thinking: describing <em>what</em> the data should look like, not <em>how</em> to move every row. That shift changed how I designed pipelines forever.</p><p>At its core, SQL (Structured Query Language) is the language we use to talk to data stored in relational systems: to ask questions, shape answers, and turn raw records into something meaningful.</p><p>In data engineering, SQL is used to:</p><ul><li><p>Ingest raw data</p></li><li><p>Clean and validate datasets</p></li><li><p>Transform data into analytics-ready models</p></li><li><p>Load data into warehouses and lakes</p></li></ul><p>SQL acts as the linchpin of both ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) pipelines, making it the backbone of modern data platforms.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fJvW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7396fed-3677-4b5b-9129-93f280a4d81a_2742x1500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!fJvW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7396fed-3677-4b5b-9129-93f280a4d81a_2742x1500.png 424w, https://substackcdn.com/image/fetch/$s_!fJvW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7396fed-3677-4b5b-9129-93f280a4d81a_2742x1500.png 848w, https://substackcdn.com/image/fetch/$s_!fJvW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7396fed-3677-4b5b-9129-93f280a4d81a_2742x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!fJvW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7396fed-3677-4b5b-9129-93f280a4d81a_2742x1500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fJvW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7396fed-3677-4b5b-9129-93f280a4d81a_2742x1500.png" width="2742" height="1500" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c7396fed-3677-4b5b-9129-93f280a4d81a_2742x1500.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0bc234bb-99ac-4f12-9c09-867c104ebce5_2742x1500.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1500,&quot;width&quot;:2742,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8159196,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/184632503?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbec5eb62-8686-48cd-a88f-bbf11fdc3997_2752x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fJvW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7396fed-3677-4b5b-9129-93f280a4d81a_2742x1500.png 424w, https://substackcdn.com/image/fetch/$s_!fJvW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7396fed-3677-4b5b-9129-93f280a4d81a_2742x1500.png 848w, https://substackcdn.com/image/fetch/$s_!fJvW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7396fed-3677-4b5b-9129-93f280a4d81a_2742x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!fJvW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7396fed-3677-4b5b-9129-93f280a4d81a_2742x1500.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h2>Why SQL is Crucial for Data Engineering</h2><p><em>Project moment:</em> On one production pipeline, we had Python transformations, custom logic, and retries everywhere, and we still broke SLAs every week. The fix wasn&#8217;t a rewrite. It was replacing fragmented logic with clear, well-structured SQL. Fewer lines. Fewer bugs. More trust.</p><h3>1. Data Extraction</h3><p>SQL makes it surprisingly easy to pull data from relational databases like PostgreSQL, MySQL, and Oracle, and also from cloud data warehouses such as BigQuery and Redshift, which speak SQL natively.</p><h3>2. Data Transformation</h3><p>Data engineers rely on SQL for cleansing, aggregation, and normalization. 
Features like Common Table Expressions (CTEs), window functions, and subqueries allow complex transformations to remain readable and maintainable.</p><h3>3. Data Loading</h3><p>In practice, statements such as <code>INSERT ... SELECT</code> and <code>MERGE</code> move transformed data into warehouses and lakes, keeping analytics teams productive and downstream tables consistent.</p><h3>4. Data Integration</h3><p>By joining datasets across multiple systems, SQL helps engineers create unified data models that power reliable reporting and decision-making.</p><h3>5. Performance Optimization</h3><p>Modern SQL engines such as Apache Hive, Presto, and Spark SQL provide query optimizers that reduce execution time, improve resource utilization, and scale analytics workloads.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!snTI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58603c05-d225-498b-88d6-c1ac6f9b9754_1340x1782.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!snTI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58603c05-d225-498b-88d6-c1ac6f9b9754_1340x1782.png 424w, https://substackcdn.com/image/fetch/$s_!snTI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58603c05-d225-498b-88d6-c1ac6f9b9754_1340x1782.png 848w, https://substackcdn.com/image/fetch/$s_!snTI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58603c05-d225-498b-88d6-c1ac6f9b9754_1340x1782.png 1272w, 
https://substackcdn.com/image/fetch/$s_!snTI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58603c05-d225-498b-88d6-c1ac6f9b9754_1340x1782.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!snTI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58603c05-d225-498b-88d6-c1ac6f9b9754_1340x1782.png" width="1340" height="1782" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58603c05-d225-498b-88d6-c1ac6f9b9754_1340x1782.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/84e87705-07fa-4fbc-90ee-06144368ba3f_1340x1782.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1782,&quot;width&quot;:1340,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:685127,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/184632503?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84e87705-07fa-4fbc-90ee-06144368ba3f_1340x1782.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!snTI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58603c05-d225-498b-88d6-c1ac6f9b9754_1340x1782.png 424w, https://substackcdn.com/image/fetch/$s_!snTI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58603c05-d225-498b-88d6-c1ac6f9b9754_1340x1782.png 848w, 
https://substackcdn.com/image/fetch/$s_!snTI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58603c05-d225-498b-88d6-c1ac6f9b9754_1340x1782.png 1272w, https://substackcdn.com/image/fetch/$s_!snTI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58603c05-d225-498b-88d6-c1ac6f9b9754_1340x1782.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>SQL for ETL vs. 
ELT Pipelines</h2><p><em>Career insight:</em> I&#8217;ve seen engineers struggle not because they chose ETL or ELT, but because they didn&#8217;t understand <em>where</em> SQL belongs in the system. Once you see SQL as a first-class layer, not an afterthought, architectural decisions become simpler.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vf17!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b19bd-e21c-4664-9f10-6f2aaa88171e_1542x546.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vf17!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b19bd-e21c-4664-9f10-6f2aaa88171e_1542x546.png 424w, https://substackcdn.com/image/fetch/$s_!vf17!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b19bd-e21c-4664-9f10-6f2aaa88171e_1542x546.png 848w, https://substackcdn.com/image/fetch/$s_!vf17!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b19bd-e21c-4664-9f10-6f2aaa88171e_1542x546.png 1272w, https://substackcdn.com/image/fetch/$s_!vf17!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b19bd-e21c-4664-9f10-6f2aaa88171e_1542x546.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vf17!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b19bd-e21c-4664-9f10-6f2aaa88171e_1542x546.png" width="690" height="244.31906614785993" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/980b19bd-e21c-4664-9f10-6f2aaa88171e_1542x546.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/10752722-3ab0-4f88-bc0e-ca32e4681069_1542x546.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:546,&quot;width&quot;:1542,&quot;resizeWidth&quot;:690,&quot;bytes&quot;:147665,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/184632503?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10752722-3ab0-4f88-bc0e-ca32e4681069_1542x546.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vf17!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b19bd-e21c-4664-9f10-6f2aaa88171e_1542x546.png 424w, https://substackcdn.com/image/fetch/$s_!vf17!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b19bd-e21c-4664-9f10-6f2aaa88171e_1542x546.png 848w, https://substackcdn.com/image/fetch/$s_!vf17!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b19bd-e21c-4664-9f10-6f2aaa88171e_1542x546.png 1272w, https://substackcdn.com/image/fetch/$s_!vf17!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980b19bd-e21c-4664-9f10-6f2aaa88171e_1542x546.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>SQL sits at the center of both approaches. In ETL, data is transformed (often with SQL) before it is loaded into the warehouse; in ELT, raw data is loaded first and then transformed with SQL inside the warehouse. In recent years, ELT has become more common because cloud warehouses make large-scale transformations easier and cheaper to run in parallel.</p><h2>Essential SQL for Data Engineers</h2><p><em>Interview reality:</em> Almost every senior data engineering interview I&#8217;ve seen comes down to this section: not syntax trivia, but whether you can express business logic clearly, safely, and efficiently in SQL.</p><h3>1. Window Functions</h3><p>Use window functions when you need running totals, rankings, or comparisons across groups without losing row-level detail.</p><pre><code>SELECT
  customer_id,
  order_date,
  SUM(order_amount) OVER (
    PARTITION BY customer_id
    ORDER BY order_date
  ) AS cumulative_sales
FROM orders;</code></pre><h3>2. Common Table Expressions (CTEs)</h3><p>CTEs make complex logic easier to read, reason about, and safely modify over time.</p><pre><code>WITH recent_orders AS (
  SELECT order_id, customer_id, order_date
  FROM orders
  WHERE order_date &gt; '2026-01-01'
)
SELECT order_id, customer_id, order_date FROM recent_orders;</code></pre><h3>3. Joins</h3><p>Most real-world datasets only make sense once multiple tables are joined together; this is where SQL earns its keep.</p><pre><code>SELECT customers.name, orders.order_id
FROM customers
JOIN orders
  ON customers.customer_id = orders.customer_id;</code></pre><h3>4. Indexes and Query Optimization</h3><p>Indexes speed up reads on the columns they cover (at some cost to write speed and storage), while running EXPLAIN on a query shows the plan the optimizer chose, which helps identify bottlenecks such as full table scans in large-scale systems.</p><h3>5. Data Partitioning</h3><p>Partitioning large tables improves performance in distributed systems such as Hive and BigQuery: when a query filters on the partition key, the engine can skip partitions that cannot match, limiting the amount of data scanned.</p><h2>SQL in Modern Data Engineering Tools</h2><p><em>Meta observation:</em> Tools change faster than job titles. What stays constant is SQL acting as the common language across platforms: the one skill that transfers cleanly when stacks evolve.</p><table><thead><tr><th>Tool</th><th>Purpose</th><th>SQL Role</th></tr></thead><tbody><tr><td>Apache Hive</td><td>Data warehousing on Hadoop</td><td>HiveQL for querying HDFS</td></tr><tr><td>Apache Spark SQL</td><td>Distributed data processing</td><td>SQL on DataFrames</td></tr><tr><td>Google BigQuery</td><td>Serverless data warehouse</td><td>Standard SQL</td></tr><tr><td>Amazon Redshift</td><td>Cloud data warehouse</td><td>PostgreSQL-like SQL</td></tr><tr><td>Snowflake</td><td>Cloud data platform</td><td>ANSI SQL</td></tr><tr><td>dbt</td><td>Data transformation</td><td>SQL-based modeling</td></tr></tbody></table><h2>Best Practices for Writing SQL in Data Engineering</h2><p><em>Production lesson:</em> If someone else can&#8217;t understand your SQL six months later (and that someone may be you), it will eventually cost time, trust, or money. 
Readability is not optional in production systems.</p><ol><li><p><strong>Use CTEs for complex logic</strong><br>Break queries into logical steps to improve readability and maintainability.</p></li><li><p><strong>Avoid </strong><code>SELECT *</code><br>Explicitly select required columns to reduce data scanning and improve performance.</p></li><li><p><strong>Leverage indexes and partitioning</strong><br>Use clustering, partition keys, and indexes to optimize large datasets.</p></li><li><p><strong>Monitor query performance</strong><br>Analyze execution plans using EXPLAIN statements to identify inefficiencies.</p></li><li><p><strong>Follow data governance standards</strong><br>Ensure compliance with organizational policies around data security, privacy, and access control.</p></li></ol><h2>Future of SQL in Data Engineering</h2><p><em>Forward-looking thought:</em> SQL isn&#8217;t competing with new paradigms; it&#8217;s absorbing them. Streaming, federated queries, and data mesh architectures are all bending toward SQL as the shared interface.</p><p>Despite the growth of NoSQL and distributed systems, SQL&#8217;s declarative nature (you describe the result you want and the engine decides how to compute it) keeps it relevant.</p><p>Key trends shaping the future:</p><ul><li><p><strong>SQL on streaming data</strong> using platforms like Apache Flink and ksqlDB</p></li><li><p><strong>Federated queries</strong> that read data across multiple systems in place, without copying it first</p></li><li><p><strong>SQL in data mesh architectures</strong> as a shared querying layer across decentralized domains</p></li></ul><p>SQL is not being replaced; it is evolving alongside modern data architectures.</p><h2>Conclusion</h2><p><em>Closing reflection:</em> When data systems fail, the root cause is rarely &#8220;bad data&#8221;; more often it&#8217;s unclear logic. SQL, when written well, makes intent explicit, and that&#8217;s why it continues to matter.</p><p>SQL is far more than just a querying language. 
It is the backbone of data engineering, powering data ingestion, transformation, integration, and analytics at scale.</p><p>As tools and platforms evolve, SQL&#8217;s clarity, expressiveness, and adaptability ensure it remains an essential skill for data engineers, data scientists, and analytics professionals.</p><h2>FAQs</h2><p><strong>Is SQL used in data engineering?</strong><br>Yes. SQL is fundamental to data engineering and is used extensively for data extraction, transformation, loading, validation, and modeling.</p><p><strong>How do I become a SQL data engineer?</strong><br>Build a strong foundation in SQL and database systems, practice query optimization, learn data modeling, and gain hands-on experience with modern cloud data warehouses. Complement SQL with Python for automation and orchestration.</p><p><strong>Is SQL still relevant in 2026?</strong><br>Absolutely. SQL remains one of the most in-demand skills due to its deep integration with cloud platforms, analytics tools, and modern data stacks.</p><p><strong>Are Python and SQL enough for data engineering?</strong><br>They form a strong foundation, but data engineers also benefit from learning data orchestration tools, distributed systems, and cloud platforms.</p><p><strong>Should data engineers know SQL?</strong><br>Yes. 
SQL is essential for building reliable data pipelines, modeling data, and ensuring data quality.</p><p><strong>What are some of the best SQL courses for data engineers?</strong></p><ul><li><p>PostgreSQL for Everybody (Coursera)</p></li><li><p>SQL Fundamentals (Dataquest)</p></li><li><p>The Ultimate MySQL Bootcamp (Udemy)</p></li><li><p>Complete SQL Mastery (CodeWithMosh)</p></li><li><p>Advanced SQL for Data Engineering (Udemy)</p></li></ul><div><hr></div><p>&#128279; <strong>Follow <a href="https://www.linkedin.com/in/avantikkapenumarty/">Avantikka Penumarty</a> on LinkedIn</strong></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>&#128274; <strong>Want to turn this into real skill?</strong><br>Access interactive SQL flashcards, quizzes, and AI-powered explanations in the paid section below.</p><h2>Take This From Insight to Instinct</h2><p>Reading builds understanding.<br>Practice builds confidence.</p><p>If SQL plays a role in your job, interviews, or long-term growth, passive reading isn&#8217;t enough.</p><p>I&#8217;ve created an <strong>interactive learning extension</strong> to this newsletter designed to help you <em>think in SQL</em>, not just recognize syntax.</p><p><strong>Inside the paid section, you&#8217;ll get:</strong></p><ul><li><p>&#129504; <strong>Flashcards</strong> to reinforce core SQL concepts and mental models</p></li><li><p>&#129514; <strong>Scenario-based quizzes</strong> that mirror real data engineering decisions</p></li><li><p>&#129302; <strong>AI-powered explanations</strong> that walk you through the <em>why</em>, not just the answer</p></li><li><p>&#127959;&#65039; <strong>Applied reasoning</strong> you can reuse in production systems and interviews</p></li></ul><p>This is how you move from:</p><blockquote><p>&#8220;I&#8217;ve read this&#8221;<br>to<br>&#8220;I can apply this under pressure.&#8221;</p></blockquote>
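<p>To try the window function, CTE, and join examples from earlier in this post without a cloud warehouse, here is a minimal sketch using Python&#8217;s built-in <code>sqlite3</code> module. It assumes a SQLite build of version 3.25 or newer (needed for window functions; recent Python releases bundle one), and the sample rows and customer names are made up for illustration. SQLite&#8217;s <code>EXPLAIN QUERY PLAN</code> stands in for the EXPLAIN statements mentioned above; every engine prints plans in its own format.</p>

```python
import sqlite3

# In-memory database with made-up sample rows; column names follow the
# SQL examples in this post. Window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        order_date TEXT,
        order_amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES
        (101, 1, '2025-12-30', 50.0),
        (102, 1, '2026-01-05', 20.0),
        (103, 2, '2026-01-07', 30.0),
        (104, 1, '2026-01-10', 10.0);
""")

# Window function: running total per customer, one output row per order.
running = conn.execute("""
    SELECT customer_id, order_date,
           SUM(order_amount) OVER (
               PARTITION BY customer_id
               ORDER BY order_date
           ) AS cumulative_sales
    FROM orders
    ORDER BY customer_id, order_date
""").fetchall()

# CTE + join: name the filtered step, then join it to customers.
recent = conn.execute("""
    WITH recent_orders AS (
        SELECT order_id, customer_id, order_date
        FROM orders
        WHERE order_date > '2026-01-01'
    )
    SELECT customers.name, recent_orders.order_id
    FROM recent_orders
    JOIN customers
      ON customers.customer_id = recent_orders.customer_id
    ORDER BY recent_orders.order_id
""").fetchall()

# Index + query plan: EXPLAIN QUERY PLAN shows whether the index is used.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT order_id, order_amount FROM orders WHERE customer_id = 1"
).fetchall()

print(running)
print(recent)
print(plan[0][-1])  # the plan's "detail" column
```

<p>Customer 1&#8217;s running total climbs 50.0, 70.0, 80.0 while every order row is kept, which is exactly what a <code>GROUP BY</code> would have collapsed. With the index in place, the plan line should report a SEARCH using <code>idx_orders_customer</code> rather than a full SCAN of <code>orders</code>; dropping the index and re-running the EXPLAIN is an easy way to see the difference.</p>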
      <p>
          <a href="https://zero2dataengineer.substack.com/p/sql-in-data-engineering-2026">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Prefect vs Airflow vs Dagster]]></title><description><![CDATA[What You Pick Tells Me Everything About How You Think]]></description><link>https://zero2dataengineer.substack.com/p/prefect-vs-airflow-vs-dagster</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/prefect-vs-airflow-vs-dagster</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Sat, 24 May 2025 00:30:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5f906a77-1248-4735-a8b1-7acd36c91db3_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most engineers choose orchestration tools like they&#8217;re picking a side in a debate.</p><p>&#8220;Airflow is legacy!&#8221;<br>&#8220;Dagster is the future!&#8221;<br>&#8220;Prefect is so clean!&#8221;</p><p>But you&#8217;re not here for debate. You&#8217;re here to <strong>ship pipelines, learn fast, and get hired</strong>.</p><p>So let&#8217;s get real:<br>Which tool should <em>you</em> pick &#8212; for your project, for your portfolio, or for your resume?</p><p>This guide won&#8217;t compare every feature.<br>It&#8217;ll show you how to <strong>think like a system builder</strong>, not a tool fangirl.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h3>1. When to Use <strong>Airflow</strong></h3><p>Use Airflow when:</p><ul><li><p>You&#8217;re applying to big tech or mature data teams</p></li><li><p>You want to show you understand orchestration, retries, and scheduling</p></li><li><p>You need strong scheduling and visibility (UI, logs, retries)</p></li></ul><p>Why it works for your <strong>resume</strong>:</p><blockquote><p>&#8220;Familiar with industry-standard orchestration using Airflow, including DAG design, sensors, retry logic, and alerting.&#8221;</p></blockquote><p>Why it&#8217;s great for <strong>teaching yourself the fundamentals</strong>:<br>Airflow makes you learn how pipelines <strong>actually</strong> run: tasks, triggers, failures, dependencies. It exposes orchestration in raw form.</p><p>Where it&#8217;s weaker:</p><ul><li><p>Not great for data scientists or notebooks</p></li><li><p>Heavier to deploy without Docker</p></li><li><p>Harder to reason about with dynamic workflows</p></li></ul><div><hr></div><h3>2. 
When to Use <strong>Prefect</strong></h3><p>Use Prefect when:</p><ul><li><p>You&#8217;re building a lightweight project</p></li><li><p>You don&#8217;t want to mess with Airflow configs</p></li><li><p>You like writing clean Python with decorators</p></li></ul><p>Why it works for <strong>solo projects</strong>:</p><ul><li><p>It&#8217;s Pythonic and elegant</p></li><li><p>Easy to get up and running</p></li><li><p>Great docs, fast dev feedback loop</p></li></ul><p>Why it works for <strong>data science/analytics engineers</strong>:</p><ul><li><p>You can orchestrate model training, dbt runs, and API tasks quickly</p></li><li><p>You don&#8217;t need to explain DAGs to non-engineers</p></li></ul><p>Where it shines on a resume:</p><blockquote><p>&#8220;Used Prefect to orchestrate model training and dbt transformations in a low-latency ML workflow.&#8221;</p></blockquote><p>Where it&#8217;s weaker:</p><ul><li><p>Less recognized by recruiters</p></li><li><p>Smaller community than Airflow</p></li><li><p>Not ideal for heavy-duty, multi-team orchestration</p></li></ul><div><hr></div><h3>3. 
When to Use <strong>Dagster</strong></h3><p>Use Dagster when:</p><ul><li><p>You care about type safety, IO contracts, and observability</p></li><li><p>You want to model your data pipeline like software</p></li><li><p>You&#8217;re building a <strong>data platform</strong>, not just a DAG</p></li></ul><p>What makes Dagster interesting:</p><ul><li><p>Built-in concepts like software-defined assets, which describe a pipeline in terms of the data it produces</p></li><li><p>First-class support for data lineage and testing</p></li><li><p>Powerful for collaborative teams that want <em>engineer-level</em> visibility</p></li></ul><p>On your resume:</p><blockquote><p>&#8220;Designed asset-aware DAGs in Dagster to enforce data lineage, retry logic, and schema-aware contracts.&#8221;</p></blockquote><p>Where Dagster may not be right:</p><ul><li><p>Overkill for simple projects</p></li><li><p>Less intuitive for beginners</p></li><li><p>Still evolving rapidly &#8212; may have rough edges</p></li></ul><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!585-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd951a63c-3fec-4309-aaba-70c9cebf6b1a_1666x732.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!585-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd951a63c-3fec-4309-aaba-70c9cebf6b1a_1666x732.png 424w, https://substackcdn.com/image/fetch/$s_!585-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd951a63c-3fec-4309-aaba-70c9cebf6b1a_1666x732.png 848w, 
https://substackcdn.com/image/fetch/$s_!585-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd951a63c-3fec-4309-aaba-70c9cebf6b1a_1666x732.png 1272w, https://substackcdn.com/image/fetch/$s_!585-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd951a63c-3fec-4309-aaba-70c9cebf6b1a_1666x732.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!585-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd951a63c-3fec-4309-aaba-70c9cebf6b1a_1666x732.png" width="1456" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d951a63c-3fec-4309-aaba-70c9cebf6b1a_1666x732.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:97696,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/164423537?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd951a63c-3fec-4309-aaba-70c9cebf6b1a_1666x732.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!585-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd951a63c-3fec-4309-aaba-70c9cebf6b1a_1666x732.png 424w, 
https://substackcdn.com/image/fetch/$s_!585-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd951a63c-3fec-4309-aaba-70c9cebf6b1a_1666x732.png 848w, https://substackcdn.com/image/fetch/$s_!585-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd951a63c-3fec-4309-aaba-70c9cebf6b1a_1666x732.png 1272w, https://substackcdn.com/image/fetch/$s_!585-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd951a63c-3fec-4309-aaba-70c9cebf6b1a_1666x732.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h3>What Your Tool Choice Says About You</h3><table><thead><tr><th>If you picked...</th><th>You&#8217;re signaling...</th></tr></thead><tbody><tr><td><strong>Airflow</strong></td><td>&#8220;I understand production-scale systems.&#8221;</td></tr><tr><td><strong>Prefect</strong></td><td>&#8220;I move fast and iterate cleanly in Python.&#8221;</td></tr><tr><td><strong>Dagster</strong></td><td>&#8220;I think in contracts, lineage, and scale.&#8221;</td></tr></tbody></table><p>There is no &#8220;best&#8221; tool &#8212; only the one that aligns with:</p><ul><li><p><strong>The story you want to tell</strong></p></li><li><p><strong>The type of work you want to do</strong></p></li><li><p><strong>The types of teams you want to join</strong></p></li></ul><div><hr></div><h3>The Hidden Skill Behind Tool Choice: Narrative Alignment</h3><p>The tool you use reflects how you <strong>frame problems</strong>.</p><p>Interviewers aren&#8217;t just looking at which orchestrator you picked &#8212; they&#8217;re evaluating whether your <strong>mental model</strong> matches the role.</p><ul><li><p>Use <strong>Airflow</strong> if you want to tell a story about <strong>scale, resilience, and legacy integration</strong></p></li><li><p>Use <strong>Prefect</strong> if you want to show you&#8217;re <strong>experiment-driven, lean, and fast-moving</strong></p></li><li><p>Use <strong>Dagster</strong> if you want to prove you&#8217;re a <strong>system thinker who designs for complexity and traceability</strong></p></li></ul><p>This is less about syntax, more about <strong>signal</strong>. 
Make your choice reflect your <strong>intended audience</strong>.</p><div><hr></div><h3>What Top Companies Use (And Why It Matters)</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dtvB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14bed34-f7c4-4cd5-bb08-73222295b077_1630x536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dtvB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14bed34-f7c4-4cd5-bb08-73222295b077_1630x536.png 424w, https://substackcdn.com/image/fetch/$s_!dtvB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14bed34-f7c4-4cd5-bb08-73222295b077_1630x536.png 848w, https://substackcdn.com/image/fetch/$s_!dtvB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14bed34-f7c4-4cd5-bb08-73222295b077_1630x536.png 1272w, https://substackcdn.com/image/fetch/$s_!dtvB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14bed34-f7c4-4cd5-bb08-73222295b077_1630x536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dtvB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14bed34-f7c4-4cd5-bb08-73222295b077_1630x536.png" width="1456" height="479" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a14bed34-f7c4-4cd5-bb08-73222295b077_1630x536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:479,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:88305,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/164423537?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14bed34-f7c4-4cd5-bb08-73222295b077_1630x536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dtvB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14bed34-f7c4-4cd5-bb08-73222295b077_1630x536.png 424w, https://substackcdn.com/image/fetch/$s_!dtvB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14bed34-f7c4-4cd5-bb08-73222295b077_1630x536.png 848w, https://substackcdn.com/image/fetch/$s_!dtvB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14bed34-f7c4-4cd5-bb08-73222295b077_1630x536.png 1272w, https://substackcdn.com/image/fetch/$s_!dtvB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14bed34-f7c4-4cd5-bb08-73222295b077_1630x536.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>If you&#8217;re applying somewhere, match your project tool to what that company runs in production (or wants to).</em></p><div><hr></div><h3>How to Frame This on Your Resume</h3><p>Don&#8217;t write:</p><blockquote><p>&#8220;Used Airflow for DAG orchestration.&#8221;</p></blockquote><p>Write:</p><blockquote><p>&#8220;Designed idempotent, retry-aware data pipelines using Airflow for scalable ETL orchestration across S3, Spark, and Snowflake &#8212; with SLA tracking and alerting.&#8221;</p></blockquote><p>Or:</p><blockquote><p>&#8220;Built fast, modular Prefect flows to orchestrate data science pipelines with resume-to-dashboard visibility in under 15 minutes per run.&#8221;</p></blockquote><p>Or:</p><blockquote><p>&#8220;Architected lineage-aware pipelines using Dagster assets and config mapping to reduce failure recovery 
time by 80% across multiple data teams.&#8221;</p></blockquote><p><strong>The tool isn&#8217;t the flex. The system design behind it is.</strong></p><div><hr></div><h3>Thinking Like a Hiring Manager</h3><p>If I&#8217;m hiring:</p><ul><li><p><strong>Airflow</strong> tells me you&#8217;ve worked with mature pipelines and understand operational burden.</p></li><li><p><strong>Prefect</strong> tells me you can build fast, are probably solo or hybrid (data + product), and can ship.</p></li><li><p><strong>Dagster</strong> tells me you write pipelines like software &#8212; clean, testable, typed &#8212; and are comfortable with architecture decisions.</p></li></ul><p>If you&#8217;re junior? Pick one and go deep.<br>If you&#8217;re senior? Know when to choose which &#8212; and explain it.</p><div><hr></div><h3>Final Advice</h3><p>Don&#8217;t just pick the tool that&#8217;s hyped.<br>Pick the one that <strong>matches how you want to think and explain your work</strong>.</p><p>What you build is one thing.<br>How you <em>talk about it</em> is what gets you hired.</p><div><hr></div>]]></content:encoded></item><item><title><![CDATA[Airflow Best Practices — What to Avoid in Projects & Interviews]]></title><description><![CDATA[From &#8216;It Worked on My Machine&#8217; to &#8216;It Survives in Prod&#8217;]]></description><link>https://zero2dataengineer.substack.com/p/airflow-best-practices-what-to-avoid</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/airflow-best-practices-what-to-avoid</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Fri, 23 May 2025 00:30:22 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/cb9d4320-3c31-4338-aa6a-262c6e9a236f_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most people say,</p><blockquote><p>&#8220;Yeah, I&#8217;ve used Airflow.&#8221;</p></blockquote><p>But interviewers can tell <em>within 60 seconds</em> whether you actually 
understand it &#8212; or just ran someone else&#8217;s DAG.</p><p>This post is not a checklist of features.<br>It&#8217;s a breakdown of <strong>the top Airflow mistakes that break pipelines, burn teams, and ruin interviews</strong> &#8212; and how to avoid them.</p><p>Let&#8217;s make you sound like someone who&#8217;s deployed DAGs in production, not just built toy examples.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h3>Mistake #1: Treating Airflow Like a Script Runner</h3><p>If your tasks are running huge pandas transformations, calling 15 APIs, and returning massive objects between tasks, you're doing it wrong.</p><p>Airflow is not Spark. Not dbt. Not a transformation engine. 
It&#8217;s the <strong>orchestrator</strong> &#8212; the conductor of the data workflow.</p><p><strong>Do this instead:</strong></p><ul><li><p>Keep tasks modular</p></li><li><p>Push heavy lifting into external jobs (Spark, SQL, cloud ETL)</p></li><li><p>Use Airflow to manage dependencies, not business logic</p></li></ul><p><strong>How to say this in interviews:</strong></p><blockquote><p>&#8220;I used Airflow to orchestrate pipeline steps, but offloaded heavy data processing to Snowflake and Spark jobs to keep the DAGs lean and observable.&#8221;</p></blockquote><div><hr></div><h3>Mistake #2: Ignoring Retry Logic and Failure Handling</h3><p>Most junior engineers write DAGs that work&#8230; when nothing goes wrong.</p><p>But in production:</p><ul><li><p>APIs time out</p></li><li><p>S3 files get delayed</p></li><li><p>Database connections drop</p></li></ul><p><strong>Best practices:</strong></p><ul><li><p>Always configure <code>retries</code>, <code>retry_delay</code>, and <code>on_failure_callback</code></p></li><li><p>Log why each task failed (don&#8217;t just rely on the default log dump)</p></li><li><p>Design tasks to be idempotent, so a retry overwrites what a failed attempt already wrote instead of duplicating it downstream</p></li></ul><p><strong>Pro-level tip:</strong> Add exponential backoff (<code>retry_exponential_backoff=True</code>) and SLA alerts.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade to Annual&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>Upgrade to Annual</span></a></p>
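<p>The retry and idempotency practices above can be sketched in plain Python, so the example runs without Airflow installed. The keys in <code>default_args</code> are real Airflow <code>BaseOperator</code> arguments (<code>retries</code>, <code>retry_delay</code>, <code>retry_exponential_backoff</code>, <code>max_retry_delay</code>, <code>on_failure_callback</code>); everything else (<code>run_with_retries</code>, <code>load_partition</code>, <code>flaky_task</code>, the <code>daily_sales</code> table) is a hypothetical stand-in. Airflow computes its own backoff delays internally, so the simple doubling schedule here is illustrative, not a copy of its formula.</p>

```python
import sqlite3
import time
from datetime import timedelta


def notify_on_failure(context):
    # Airflow passes the task's context dict to on_failure_callback;
    # a real callback might page someone instead of printing.
    print(f"Task failed: {context.get('task_instance')}")


# Real BaseOperator arguments, shown as a plain dict (no Airflow import needed).
default_args = {
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "retry_exponential_backoff": True,
    "max_retry_delay": timedelta(minutes=30),
    "on_failure_callback": notify_on_failure,
}


def run_with_retries(task, retries, base_delay_s, sleep=time.sleep):
    """Hypothetical helper: call task(), retrying with doubling delays."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception as exc:
            if attempt >= retries:
                raise  # out of retries: surface the failure
            delay = base_delay_s * 2 ** attempt  # 1x, 2x, 4x ... the base delay
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {delay}s")
            sleep(delay)
            attempt += 1


def load_partition(conn, ds, amounts):
    """Hypothetical idempotent load: replace the whole partition for date ds."""
    with conn:  # one transaction: the DELETE and INSERTs commit together
        conn.execute("DELETE FROM daily_sales WHERE ds = ?", (ds,))
        conn.executemany(
            "INSERT INTO daily_sales (ds, amount) VALUES (?, ?)",
            [(ds, amount) for amount in amounts],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (ds TEXT, amount REAL)")
calls = {"n": 0}


def flaky_task():
    # Writes its data, then fails on a later step (e.g. a timed-out API)
    # on the first two attempts.
    calls["n"] += 1
    load_partition(conn, "2025-05-22", [10.0, 20.0])
    if calls["n"] in (1, 2):
        raise ConnectionError("upstream API timed out")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = run_with_retries(flaky_task, retries=3, base_delay_s=1, sleep=delays.append)
row_count = conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone()[0]
print(result, calls["n"], delays, row_count)
```

<p>The final line prints <code>ok 3 [1, 2] 2</code>: the task needed three attempts, but because each attempt replaces the partition rather than appending to it, the table still holds exactly two rows. A non-idempotent <code>INSERT</code>-only load would have left six.</p>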
      <p>
          <a href="https://zero2dataengineer.substack.com/p/airflow-best-practices-what-to-avoid">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Scheduling vs Triggering]]></title><description><![CDATA[How Workflows Actually Run in Production]]></description><link>https://zero2dataengineer.substack.com/p/scheduling-vs-triggering</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/scheduling-vs-triggering</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Thu, 22 May 2025 00:30:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f377859c-0d1d-4a1a-bbaa-99fe54cafc4e_1024x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most people set <strong>schedule_interval ='@daily'</strong> and move on.<br>But in production, nothing is that simple.</p><p>Data arrives late.<br>APIs fail.<br>Files drop into S3 at random.<br>And your pipeline has to <strong>wait</strong>, <strong>trigger</strong>, or <strong>backfill</strong> &#8212; not just run on a timer.</p><p>Today, we&#8217;re digging into how scheduling <em>really</em> works &#8212; and how you should answer when interviewers ask:</p><blockquote><p>&#8220;How do you schedule and trigger your Airflow DAGs?&#8221;</p></blockquote><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div></div></div><div><hr></div><h3>First: Understand the Two Types of Runs</h3><ol><li><p><strong>Scheduled Runs</strong><br>You tell Airflow to run a DAG on a fixed interval:</p><ul><li><p>Every hour, day, week, etc.</p></li><li><p>Use cases: batch ETL, daily reporting, metrics updates</p></li></ul></li><li><p><strong>Triggered Runs</strong><br>Airflow runs the DAG <strong>when something happens</strong>:</p><ul><li><p>A file lands in S3</p></li><li><p>An upstream DAG completes</p></li><li><p>An API returns a signal</p></li></ul></li></ol><div><hr></div><p><strong>Elite Bonus Drop Coming:</strong><br>This Thursday, I&#8217;ll walk through <strong>Sensors, ExternalTask dependency patterns, and DAG chaining</strong> in a real multi-DAG system.</p><p>This is where your interview answers start to sound like a Staff Data Engineer&#8217;s.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO ANNUAL MEMBERSHIP&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO ANNUAL MEMBERSHIP</span></a></p><div><hr></div>
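<p>Triggered runs boil down to "wait until something happens, then go." The sketch below shows that waiting pattern as a plain Python polling loop. It checks a local file path instead of S3 so it runs anywhere. <code>wait_for_file</code> is a hypothetical helper for illustration only. In Airflow you would use a sensor instead, for example the Amazon provider's <code>S3KeySensor</code>, which also checks on an interval and gives up after a timeout.</p>

```python
import time
from pathlib import Path


def wait_for_file(path, poke_interval=1.0, timeout=10.0):
    """Poll until `path` exists, like a sensor 'poking' before downstream work runs.

    Hypothetical helper for illustration. Returns True as soon as the file
    exists, or False once `timeout` seconds pass without it appearing.
    """
    deadline = time.monotonic() + timeout
    while True:
        if Path(path).exists():
            return True            # the event happened: safe to trigger downstream tasks
        if time.monotonic() >= deadline:
            return False           # give up instead of waiting forever
        time.sleep(poke_interval)  # pause before checking again
```

<p>For example, <code>wait_for_file("incoming/orders.csv", poke_interval=30, timeout=3600)</code> checks every 30 seconds for up to an hour. The timeout matters: without it, a file that never arrives leaves the job waiting forever.</p>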
      <p>
          <a href="https://zero2dataengineer.substack.com/p/scheduling-vs-triggering">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Build Your First DAG in Airflow]]></title><description><![CDATA[Hands-on Workflow You Can Actually Run]]></description><link>https://zero2dataengineer.substack.com/p/build-your-first-dag-in-airflow-f5c</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/build-your-first-dag-in-airflow-f5c</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Wed, 21 May 2025 00:30:30 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3620148f-53a9-4728-83c1-5e88acd77497_1024x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You can read 100 articles about DAGs.<br>You can quote what &#8220;Directed Acyclic Graph&#8221; means.<br>But none of it matters until you write one.</p><p>Today&#8217;s goal?<br>You&#8217;ll build your <strong>first working DAG</strong> &#8212; start to finish &#8212; that does something useful and gets you closer to job-ready.</p><p>If you can complete this, you&#8217;ll be able to walk into any interview and say:</p><blockquote><p>&#8220;Yes, I&#8217;ve built production-style pipelines in Airflow.&#8221;</p></blockquote><p>Let&#8217;s do this.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
<strong>Upgrade for Full Access &#8211; Learn in Detail, Never Forget!</strong></p></div></div></div><div><hr></div><h3>Why This Matters</h3><p>Anyone can Google "What is Airflow?"<br>You&#8217;re here to <strong>build</strong> with it &#8212; and <strong>speak about it like an engineer who&#8217;s done it in prod</strong>.</p><p>Today&#8217;s drop isn&#8217;t theory.<br>You&#8217;ll create your <strong>first working DAG</strong> &#8212; a mini pipeline that does something useful, reliable, and interview-worthy.</p><p>You&#8217;ll leave this newsletter with:</p><ul><li><p>A runnable DAG</p></li><li><p>A deployable GitHub project</p></li><li><p>A STAR-based story you can use in interviews</p></li><li><p>Clarity about what makes DAGs production-grade</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade &amp; Learn in Detail!&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>Upgrade &amp; Learn in Detail!</span></a></p><div><hr></div><h3>What You&#8217;re Building</h3><p>The business use case is real:<br>A team receives a messy CSV of order data every few hours. Right now it&#8217;s being cleaned manually. 
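You&#8217;ve been tasked with automating this.</p><p>Here&#8217;s what your DAG will do:</p><ul><li><p><strong>Extract:</strong> Read CSV</p></li><li><p><strong>Transform:</strong> Drop nulls, clean types</p></li><li><p><strong>Load:</strong> Write to Snowflake</p></li><li><p><strong>Retry:</strong> If anything breaks</p></li><li><p><strong>Alert:</strong> Slack notifications</p></li><li><p><strong>Schedule:</strong> Every 6 hours</p></li></ul><div><hr></div><h2>Your Airflow Starter Pack</h2><h3>Step 1: Install Airflow (If You Haven&#8217;t Already)</h3><pre><code># Airflow 2.x commands. The official install docs recommend pinning with a constraints file.
pip install apache-airflow
airflow db migrate   # Airflow 2.7+; replaces the deprecated "airflow db init"
airflow users create --username admin --firstname Admin --lastname User --role Admin --email admin@example.com
airflow webserver --port 8080   # terminal 1 (keeps running)
airflow scheduler               # terminal 2 (keeps running)</code></pre><p>Then open the Airflow UI at <code>localhost:8080</code> and log in with the user you just created (the <code>users create</code> command prompts for a password).</p>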
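<p>Before wiring anything into Airflow, it helps to write the extract and transform steps as plain functions you can test on their own. Here is a minimal sketch. It assumes a CSV with hypothetical columns <code>order_id</code>, <code>customer_id</code>, <code>amount</code>, and <code>order_date</code>. It drops rows with nulls or unparseable values and casts each field to a proper type. The Snowflake load is left out.</p>

```python
import csv
import io
from datetime import date

# Hypothetical column names for the orders CSV.
REQUIRED = ("order_id", "customer_id", "amount", "order_date")


def extract(csv_text):
    """Extract: parse raw CSV text into a list of dicts (every value is a string)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: drop rows with nulls or bad values, then cast each field's type."""
    clean = []
    for row in rows:
        # Missing columns come back as None and empty cells as ""; treat both as null.
        if any(not (row.get(col) or "").strip() for col in REQUIRED):
            continue
        try:
            clean.append({
                "order_id": int(row["order_id"]),
                "customer_id": row["customer_id"].strip(),
                "amount": float(row["amount"]),
                "order_date": date.fromisoformat(row["order_date"].strip()),
            })
        except ValueError:
            # Unparseable numbers or dates are dropped here; a production DAG
            # might quarantine them instead.
            continue
    return clean
```

<p>In the DAG, each function would become its own task, for example via <code>PythonOperator</code> or the <code>@task</code> decorator, and the tasks would run extract, then transform, then load. Data passed between tasks goes through XCom, which is meant for small payloads. For real files, you would typically pass a file path between tasks rather than the rows themselves.</p>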
      <p>
          <a href="https://zero2dataengineer.substack.com/p/build-your-first-dag-in-airflow-f5c">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Airflow Isn’t Scary — It’s a Life Saver]]></title><description><![CDATA[Why Every Data Engineer Needs to Master DAGs]]></description><link>https://zero2dataengineer.substack.com/p/airflow-isnt-scary-its-a-life-saver</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/airflow-isnt-scary-its-a-life-saver</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Tue, 20 May 2025 00:30:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!fDr8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F882e00e9-776a-4cc4-9e31-c5a4d979ec95_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Let&#8217;s Talk About Chaos.</h3><p>Not the kind you can meditate through &#8212;<br>The kind that breaks your data pipeline at 2 AM while your dashboards go blank and your PM starts Slacking you with &#8220;???&#8221;</p><p>That&#8217;s why Airflow exists.</p><p>Airflow doesn&#8217;t just help you schedule things.<br>It&#8217;s the brain of your data pipelines &#8212; the system that says:<br>&#8220;Hey, this job failed, let&#8217;s retry.&#8221;<br>&#8220;Wait, don&#8217;t run until the upstream task finishes.&#8221;<br>&#8220;Log it, alert it, and move on.&#8221;</p><p>And yet, for most people, Airflow seems intimidating.</p><p>Today, we&#8217;ll break it down &#8212; no fluff, no buzzwords.<br>By the end of this, you&#8217;ll understand what Airflow really does, how DAGs work, and how to confidently talk about orchestration in any interview.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a 
reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div></div></div><div><hr></div><h3>What Airflow Actually Does</h3><p>Forget the textbook.</p><p>Here&#8217;s what Airflow <em>really</em> handles:</p><ul><li><p>Running your ETL jobs <strong>in the right order</strong></p></li><li><p>Retrying failed jobs automatically</p></li><li><p>Monitoring the status of every task</p></li><li><p>Sending alerts if something breaks</p></li><li><p>Triggering tasks on a <strong>schedule</strong> or based on <strong>dependencies</strong></p></li></ul><p>That&#8217;s orchestration. It&#8217;s not the &#8220;doing&#8221; &#8212; it&#8217;s the <strong>directing</strong>.</p><p>Airflow isn&#8217;t processing your data. It&#8217;s making sure the <em>process</em> happens correctly.</p><div><hr></div><h3>How DAGs Work (Without the Jargon)</h3><p>A <strong>DAG</strong> = Directed Acyclic Graph = A set of tasks that:</p><ol><li><p>Have <strong>a clear start and end</strong></p></li><li><p>Never loop back on themselves</p></li><li><p>Run in a <strong>specific order</strong></p></li></ol><p>Think of it like a checklist for your pipeline:</p><ul><li><p>Step 1: Pull data from S3</p></li><li><p>Step 2: Transform using Spark</p></li><li><p>Step 3: Load to Snowflake</p></li><li><p>Step 4: Run quality checks</p></li><li><p>Step 5: Refresh dashboard</p></li></ul><p>Each of these is a <strong>task</strong>.<br>The entire thing is a <strong>DAG</strong>.</p><p>In Airflow, these steps are defined in Python. 
You set dependencies like this:</p><pre><code>extract_task &gt;&gt; transform_task &gt;&gt; load_task</code></pre><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO ELITE/ANNUAL MEMBERSHIP&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO ELITE/ANNUAL MEMBERSHIP</span></a></p>
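<p>To see what <code>&gt;&gt;</code> is doing, here is a toy model of the idea. It is not Airflow's actual implementation. Each hypothetical <code>Task</code> records its upstream tasks when you write <code>a &gt;&gt; b</code>, and Python's standard-library <code>graphlib</code> works out a valid run order. If you accidentally create a loop, the sort fails, which is exactly the "acyclic" rule.</p>

```python
from graphlib import CycleError, TopologicalSorter


class Task:
    """Toy stand-in for an Airflow task: a name plus the names of its upstream tasks."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()

    def __rshift__(self, other):
        # `a >> b` records that b runs after a; returning b allows a >> b >> c.
        other.upstream.add(self.task_id)
        return other


def run_order(tasks):
    """Return task_ids in an order that respects every dependency."""
    # graphlib expects {node: its predecessors}, which is exactly `upstream`.
    graph = {t.task_id: t.upstream for t in tasks}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # A loop means the graph is not acyclic, so it is not a valid DAG.
        raise ValueError(f"not a DAG, cycle found: {err.args[1]}") from err
```

<p>In a straight chain like <code>extract &gt;&gt; transform &gt;&gt; load</code>, each task has exactly one upstream dependency, so <code>run_order</code> returns the tasks in the same order you wrote them. Real Airflow DAGs can also branch and fan in, and a topological sort handles those cases too.</p>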
      <p>
          <a href="https://zero2dataengineer.substack.com/p/airflow-isnt-scary-its-a-life-saver">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[ETL Interview Breakdown: How Data Engineers Are Tested]]></title><description><![CDATA[Why "Build a pipeline" isn&#8217;t really what they&#8217;re asking]]></description><link>https://zero2dataengineer.substack.com/p/etl-interview-breakdown-how-data</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/etl-interview-breakdown-how-data</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Mon, 19 May 2025 17:30:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d913f3c-1b77-4736-bb68-3b16e27dd872_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Let&#8217;s Get Real About ETL Interviews</h3><p>You walk into a Data Engineering interview. They ask:</p><blockquote><p><em>&#8220;Can you walk me through an ETL pipeline you&#8217;ve built?&#8221;</em></p></blockquote><p>Seems basic, right?<br>But here&#8217;s the catch: <strong>they&#8217;re not looking for just a tool dump.</strong><br>They&#8217;re trying to reverse-engineer your thinking.</p><p>You&#8217;ve now built ETL pipelines, cleaned real data, and understood how batch vs streaming works.</p><p>Today&#8217;s goal is simple. 
In today&#8217;s Zero2DataEngineer breakdown, we&#8217;ll <strong>decode how ETL interview questions are framed</strong>, what they&#8217;re secretly testing, and how to <strong>structure your answers like a pro</strong> &#8212; even if you&#8217;ve never worked at FAANG.</p><p>Whether you&#8217;re applying for a Data Engineer, Analytics Engineer, or even a backend-heavy role &#8212; <strong>ETL questions will show up</strong>.</p><p>Here&#8217;s how to answer them like someone who&#8217;s done it before.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div></div></div><div><hr></div><h3>How ETL Interview Questions Are Really Framed</h3><p>They won&#8217;t ask:</p><blockquote><p>&#8220;What is ETL?&#8221;</p></blockquote><p>They&#8217;ll ask:</p><ul><li><p>How would you design a pipeline to load millions of rows from an external source?</p></li><li><p>What happens if your load step fails halfway through?</p></li><li><p>How do you make your pipeline idempotent?</p></li><li><p>When do you use batch vs real-time ingestion?</p></li></ul><p>The trick is to answer <strong>like a systems thinker</strong>, not just a coder.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!jC4y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1b9223-edd7-4991-a242-9f6ac8c33536_1672x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jC4y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1b9223-edd7-4991-a242-9f6ac8c33536_1672x800.png 424w, https://substackcdn.com/image/fetch/$s_!jC4y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1b9223-edd7-4991-a242-9f6ac8c33536_1672x800.png 848w, https://substackcdn.com/image/fetch/$s_!jC4y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1b9223-edd7-4991-a242-9f6ac8c33536_1672x800.png 1272w, https://substackcdn.com/image/fetch/$s_!jC4y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1b9223-edd7-4991-a242-9f6ac8c33536_1672x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jC4y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1b9223-edd7-4991-a242-9f6ac8c33536_1672x800.png" width="1456" height="697" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f1b9223-edd7-4991-a242-9f6ac8c33536_1672x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:697,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:126320,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/163926328?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1b9223-edd7-4991-a242-9f6ac8c33536_1672x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jC4y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1b9223-edd7-4991-a242-9f6ac8c33536_1672x800.png 424w, https://substackcdn.com/image/fetch/$s_!jC4y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1b9223-edd7-4991-a242-9f6ac8c33536_1672x800.png 848w, https://substackcdn.com/image/fetch/$s_!jC4y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1b9223-edd7-4991-a242-9f6ac8c33536_1672x800.png 1272w, https://substackcdn.com/image/fetch/$s_!jC4y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1b9223-edd7-4991-a242-9f6ac8c33536_1672x800.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>Interview Answer Formula (Reframe Your Thinking)</h3><p>Use this <strong>ETL STAR + Stack Formula</strong> when answering:</p><p><strong>S</strong>ituation: What was the use case?<br><strong>T</strong>ool stack: Which tools did you choose and why?<br><strong>A</strong>rchitecture: Show the pipeline stages.<br><strong>R</strong>esilience: How did you handle failures, alerts, schema drift?<br><strong>+ Stack Justification</strong>: Why this combo (Airflow + 
S3 + Spark etc.)?</p><p>Pro tip: If you haven&#8217;t built one end-to-end yet, use this:</p><blockquote><p>&#8220;Here&#8217;s how I <em>would</em> design it for a [use case].&#8221;</p></blockquote><p>Then walk them through your design step by step, <strong>as concretely as if you had built it.</strong></p><div><hr></div><h3>Master These Real ETL Questions Before Your Next Interview</h3><div><hr></div><p><strong>Q1. What are some common challenges in ETL pipelines?</strong><br>1. Handling bad records<br>2. Schema changes<br>3. Late-arriving data<br>4. Dependencies &amp; retries</p><p><strong>Sample Answer:</strong></p><blockquote><p>&#8220;One of the biggest challenges I&#8217;ve faced is handling schema drift &#8212; especially when upstream sources silently change a column name or data type. I&#8217;ve built schema validation into the extraction step using Great Expectations and tracked schema versions in the AWS Glue Data Catalog.</p><p>I&#8217;ve also handled bad records by logging and quarantining them into a separate S3 bucket with alerting. For late-arriving data, I design pipelines to be idempotent &#8212; using UPSERT logic or late-data windows. And for dependencies, I always make sure DAGs have proper <code>depends_on_past</code>, retry logic, and failure alerts configured.&#8221;</p></blockquote><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO ANNUAL&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO ANNUAL</span></a></p>
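<p>Two ideas from that sample answer, quarantining bad records and idempotent UPSERT loads, fit in a few lines. This is a minimal sketch with two stand-ins: SQLite, whose <code>ON CONFLICT ... DO UPDATE</code> upsert needs SQLite 3.24 or later, plays the warehouse, and a Python list plays the quarantine bucket. The <code>orders</code> table and <code>load_orders</code> function are hypothetical.</p>

```python
import sqlite3


def load_orders(conn, rows, quarantine):
    """Idempotent load: re-running with the same rows leaves the table unchanged.

    Rows missing an order_id or amount go to `quarantine` instead of failing
    the whole batch (a stand-in for writing them to a separate bucket).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        " order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
    )
    good = []
    for row in rows:
        if row.get("order_id") is None or row.get("amount") is None:
            quarantine.append(row)  # keep bad records for inspection and alerting
        else:
            good.append((row["order_id"], row["amount"]))
    with conn:  # one transaction: the good rows commit together or not at all
        conn.executemany(
            "INSERT INTO orders (order_id, amount) VALUES (?, ?) "
            "ON CONFLICT(order_id) DO UPDATE SET amount = excluded.amount",
            good,
        )
    return len(good)
```

<p>Because each row is keyed on <code>order_id</code>, running the same batch twice (say, after an Airflow retry) still leaves one row per order. A late-arriving correction simply overwrites the earlier amount.</p>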
      <p>
          <a href="https://zero2dataengineer.substack.com/p/etl-interview-breakdown-how-data">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Batch vs Streaming Pipelines]]></title><description><![CDATA[Choosing the right flow for the right kind of data]]></description><link>https://zero2dataengineer.substack.com/p/batch-vs-streaming-pipelines</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/batch-vs-streaming-pipelines</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Fri, 16 May 2025 00:30:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d366aa5-7187-4a5a-809a-e975617b4568_832x832.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You&#8217;ve built an ETL pipeline.<br>You&#8217;ve transformed and loaded data.</p><p>Now the question is:<br><strong>How often should your pipeline run?</strong><br>And more importantly&#8230; <strong>should it run in batches &#8212; or in real time?</strong></p><p>Let&#8217;s break it down.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div></div></div><div><hr></div><h2>What Is a Batch Pipeline?</h2><p>A <strong>batch pipeline</strong> runs on a schedule.<br>It pulls a large chunk of data at once &#8212; typically daily, hourly, or weekly.</p><p>Think:</p><ul><li><p>Nightly revenue reports</p></li><li><p>Weekly customer churn rollups</p></li><li><p>Monthly sales summaries</p></li></ul><p>It&#8217;s like picking up laundry every Sunday.<br>No need to track every sock in real time &#8212; just do one large pickup.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade to Annual&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>Upgrade to Annual</span></a></p><div><hr></div><h3>When to Use Batch</h3><ul><li><p>Your consumers can wait hours for fresh data (e.g., daily payment or order reporting)</p></li><li><p>You&#8217;re running reports, not alerts</p></li><li><p>You want to keep cloud costs low</p></li><li><p>You care more about data completeness than speed</p></li></ul>
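<p>In code, the defining trait of a batch job is that it processes a bounded window of data in one pass. Here is a minimal sketch of a nightly revenue report. It assumes orders arrive as hypothetical <code>(timestamp, amount)</code> pairs. The job only looks at orders from <code>run_date</code>, so re-running it for the same day returns the same numbers.</p>

```python
from datetime import date, datetime


def nightly_revenue(orders: list[tuple[datetime, float]], run_date: date) -> dict:
    """Batch job: aggregate one full day of orders in a single pass.

    Only orders whose timestamp falls on `run_date` are counted, so
    re-running the job for the same day returns the same result.
    """
    count = 0
    total = 0.0
    for ts, amount in orders:
        if ts.date() == run_date:
            count += 1
            total += amount
    return {"date": run_date.isoformat(), "orders": count, "revenue": round(total, 2)}
```

<p>A streaming pipeline would instead update a running total as each order arrives. The batch version gives up that freshness in exchange for simplicity and a complete view of the day.</p>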
      <p>
          <a href="https://zero2dataengineer.substack.com/p/batch-vs-streaming-pipelines">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[What the Hell Is ETL?]]></title><description><![CDATA[Real-world breakdown of a misunderstood concept &#8212; and why every data engineer must master it]]></description><link>https://zero2dataengineer.substack.com/p/what-the-hell-is-etl</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/what-the-hell-is-etl</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Tue, 13 May 2025 00:30:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7c37!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6211054c-ba33-429c-8284-85c474d859c9_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Welcome to Week 5 of Zero2DataEngineer</h2><p>This week, we&#8217;re shifting from data structure to <strong>data movement</strong>.<br>Because understanding SQL is table stakes &#8212; but knowing how raw data becomes trustworthy, usable insights?</p><p>That&#8217;s where <strong>ETL</strong> comes in.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div></div></div><div><hr></div><h2>What the Hell Is ETL?</h2><p>ETL stands for:</p><ul><li><p><strong>Extract</strong> &#8211; pull raw data from source systems</p></li><li><p><strong>Transform</strong> &#8211; clean, reformat, validate, and reshape</p></li><li><p><strong>Load</strong> &#8211; push structured data into a final destination</p></li></ul><p>That&#8217;s it &#8212; in theory.</p><p>But in practice, ETL is <strong>messy, strategic, and essential.</strong></p><div><hr></div><h2>Let&#8217;s Make It Real: The Restaurant Analogy</h2><p>You're running a restaurant.</p><ul><li><p><strong>Extract</strong> = ordering raw ingredients from vendors</p></li><li><p><strong>Transform</strong> = chopping, cooking, seasoning, plating</p></li><li><p><strong>Load</strong> = serving the final dish to the customer</p></li></ul><p>You wouldn&#8217;t send a customer raw flour and onions.<br>That&#8217;s what skipping ETL looks like in a data system.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO ANNUAL&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO ANNUAL</span></a></p>
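<p>The three stages map naturally onto three small functions. Below is a minimal sketch. A hard-coded list stands in for the source system, and an in-memory SQLite database stands in for the warehouse. The <code>weather</code> table and its fields are hypothetical. Transform does most of the work: it drops a record with a missing reading, tidies the city names, and converts Fahrenheit to Celsius.</p>

```python
import sqlite3


def extract():
    """Extract: pull raw records (hard-coded here in place of an API or database)."""
    return [
        {"id": "1", "city": " new york ", "temp_f": "68"},
        {"id": "2", "city": "Chicago", "temp_f": ""},  # missing reading
        {"id": "3", "city": "austin", "temp_f": "91"},
    ]


def transform(records):
    """Transform: validate, clean, and reshape into (id, city, temp_c) tuples."""
    out = []
    for r in records:
        if not r["temp_f"]:
            continue  # validation: skip records with no reading
        temp_c = round((float(r["temp_f"]) - 32) * 5 / 9, 1)
        out.append((int(r["id"]), r["city"].strip().title(), temp_c))
    return out


def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather (id INTEGER PRIMARY KEY, city TEXT, temp_c REAL)"
    )
    with conn:  # commit all rows together
        # INSERT OR REPLACE keyed on id keeps reruns from duplicating rows.
        conn.executemany("INSERT OR REPLACE INTO weather VALUES (?, ?, ?)", rows)
```

<p>Running <code>load(transform(extract()), conn)</code> against a fresh connection leaves two clean rows in the table. The record with no temperature never reaches the destination, just as raw ingredients never reach the customer.</p>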
      <p>
          <a href="https://zero2dataengineer.substack.com/p/what-the-hell-is-etl">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[SQL + RDBMS = Love]]></title><description><![CDATA[Why Structured Query Language Still Rules the World of Data]]></description><link>https://zero2dataengineer.substack.com/p/sql-rdbms-love</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/sql-rdbms-love</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Sat, 10 May 2025 00:30:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zQxA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb79a41ed-d251-4719-b3a0-c334329ee992_1024x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This week, we&#8217;ve explored the foundation of structured data systems: relational databases, indexing, ACID transactions, and schema design.<br>Today, we bring it all together &#8212; and show how SQL is more than a language.<br>It&#8217;s a mindset. A framework. A career unlock.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
Make sure you're subscribed.</p></div></div></div><p>You&#8217;ve learned SQL.<br>You&#8217;ve learned schema design.<br>You now understand indexing, ACID, joins, and denormalization.</p><p>But here&#8217;s the part nobody tells you:</p><blockquote><p>SQL isn&#8217;t just a query language.<br>It&#8217;s a way of <strong>thinking</strong> about data.</p></blockquote><p>The best data engineers? They don&#8217;t &#8220;run SQL.&#8221;<br>They <strong>design systems</strong> that make SQL sing.</p><div><hr></div><h2>The Layer Cake of Modern Data</h2><p>Let&#8217;s pull back the curtain. When you're working at a company like Stripe, Airbnb, or Netflix, your data system isn&#8217;t just tables and dashboards.</p><p>It&#8217;s a <strong>layered architecture</strong>, and SQL flows through it all:</p><ol><li><p><strong>Raw Layer</strong> &#8211; messy, fast, unfiltered (S3, GCS)</p></li><li><p><strong>Staging Layer</strong> &#8211; cleaned, typed, deduped</p></li><li><p><strong>Warehouse Layer</strong> &#8211; modeled, relational, denormalized</p></li><li><p><strong>Semantic Layer</strong> &#8211; Looker, Tableau, Power BI</p></li><li><p><strong>Delivery Layer</strong> &#8211; Dashboards, APIs, notebooks</p></li></ol><p>SQL is the thread that weaves it all together &#8212; whether you write it, generate it with dbt, or visualize it through tools.</p><div><hr></div><h2>So What&#8217;s the Real Job of SQL?</h2><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO ANNUAL&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a 
class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO ANNUAL</span></a></p>
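<p>Here is the layer cake in miniature, sketched with SQLite in place of S3 and a cloud warehouse. The window function used for deduplication needs SQLite 3.25 or later. The table and view names (<code>raw_orders</code>, <code>stg_orders</code>, <code>fct_revenue_by_country</code>) are hypothetical and loosely follow dbt-style naming. The point is that every layer after raw is just SQL reading the layer below it.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw layer: land records exactly as they arrive (strings, duplicates and all).
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT, loaded_at TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [
        ("1", "20.00", "us", "2025-05-09T01:00"),
        ("1", "20.00", "us", "2025-05-09T02:00"),  # duplicate delivery
        ("2", "15.50", "DE", "2025-05-09T01:30"),
        ("3", "7.25", "us", "2025-05-09T03:00"),
    ],
)

# Staging layer: cleaned, typed, deduped (keep the latest copy of each order).
conn.execute("""
    CREATE VIEW stg_orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CAST(amount AS REAL)      AS amount,
           UPPER(country)            AS country
    FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY order_id ORDER BY loaded_at DESC) AS rn
        FROM raw_orders
    )
    WHERE rn = 1
""")

# Warehouse layer: a modeled table that dashboards can query directly.
conn.execute("""
    CREATE TABLE fct_revenue_by_country AS
    SELECT country, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM stg_orders
    GROUP BY country
""")
```

<p>Querying <code>fct_revenue_by_country</code> returns one row per country. The duplicated order is counted once, and the lowercase country code is normalized. A dashboard in the delivery layer would read from this table.</p>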
      <p>
          <a href="https://zero2dataengineer.substack.com/p/sql-rdbms-love">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Transactions & ACID: The Rules That Keep Data Sane]]></title><description><![CDATA[How databases protect your pipelines when everything else fails]]></description><link>https://zero2dataengineer.substack.com/p/transactions-and-acid-the-rules-that</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/transactions-and-acid-the-rules-that</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Fri, 09 May 2025 00:30:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36f15e77-0b2a-45de-8f3a-aba575afca3e_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to Day 4 of Zero2DataEngineer &#8212; this week is all about how real databases behave in production, not just in notebooks.</p><p>In data engineering, pipelines break, APIs fail, jobs time out.<br>But the <em>data itself</em>?</p><p>That still needs to be correct.</p><p>Today we talk about how databases keep your world from falling apart &#8212; even during chaos.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div></div></div><div><hr></div><h2>What is a Transaction?</h2><p>A transaction is one or more database operations treated as a single unit of work.</p><ul><li><p>Transfer money? Transaction.</p></li><li><p>Insert a row? Transaction.</p></li><li><p>Update 3 tables in sequence? One transaction, as long as you wrap them together.</p></li></ul><p>The database either applies the transaction <strong>entirely</strong> or <strong>rolls everything back</strong>: no half-applied updates.</p><div><hr></div><h2>The ACID Model (Explained Like You&#8217;re On-Call)</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UAkK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b030b1-6bbd-4ae8-b7bf-34cec61a66c9_1618x424.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UAkK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b030b1-6bbd-4ae8-b7bf-34cec61a66c9_1618x424.png 424w, https://substackcdn.com/image/fetch/$s_!UAkK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b030b1-6bbd-4ae8-b7bf-34cec61a66c9_1618x424.png 848w,
https://substackcdn.com/image/fetch/$s_!UAkK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b030b1-6bbd-4ae8-b7bf-34cec61a66c9_1618x424.png 1272w, https://substackcdn.com/image/fetch/$s_!UAkK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b030b1-6bbd-4ae8-b7bf-34cec61a66c9_1618x424.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UAkK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b030b1-6bbd-4ae8-b7bf-34cec61a66c9_1618x424.png" width="1456" height="382" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f0b030b1-6bbd-4ae8-b7bf-34cec61a66c9_1618x424.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:382,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72974,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/163148250?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b030b1-6bbd-4ae8-b7bf-34cec61a66c9_1618x424.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UAkK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b030b1-6bbd-4ae8-b7bf-34cec61a66c9_1618x424.png 424w, 
https://substackcdn.com/image/fetch/$s_!UAkK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b030b1-6bbd-4ae8-b7bf-34cec61a66c9_1618x424.png 848w, https://substackcdn.com/image/fetch/$s_!UAkK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b030b1-6bbd-4ae8-b7bf-34cec61a66c9_1618x424.png 1272w, https://substackcdn.com/image/fetch/$s_!UAkK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0b030b1-6bbd-4ae8-b7bf-34cec61a66c9_1618x424.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Why this matters:</strong><br>You don&#8217;t want to charge a customer&#8230; but fail to create the order.<br>You also don&#8217;t want two people editing the same record and overwriting each other&#8217;s changes.</p><div><hr></div><h2>Real-Life Scenario: E-Commerce Checkout</h2><p>Imagine a customer places an order. Here&#8217;s what happens:</p><ol><li><p>Add a row in <code>orders</code></p></li><li><p>Deduct stock from <code>inventory</code></p></li><li><p>Record the charge in <code>payments</code></p></li><li><p>Send a confirmation email</p></li></ol><p>Without a transaction, if step 3 fails after steps 1 and 2 succeed&#8230;</p><ul><li><p>Stock is already deducted</p></li><li><p>An order exists with no payment</p></li><li><p>Nothing ships</p></li><li><p>No email goes out</p></li></ul><p>Now your support team is flooded.</p><p>With a transaction, steps 1 to 3 (the database writes) are wrapped in one block. If the payment insert fails, none of them is saved. Step 4 happens outside the database, so send the email only after <code>COMMIT</code> succeeds: a rollback can&#8217;t unsend an email.</p><div><hr></div><h2>SQL in Action</h2><pre><code>BEGIN;

INSERT INTO orders (...) VALUES (...);
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 123;
INSERT INTO payments (...) VALUES (...);

COMMIT;</code></pre><p>If any statement fails before <code>COMMIT</code>, PostgreSQL marks the whole transaction as aborted, and ending it (with <code>ROLLBACK</code>, or even with <code>COMMIT</code>, which then performs a rollback) discards every change made since <code>BEGIN</code>. No manual cleanup. No orphaned orders. No half-applied updates.</p><div><hr></div><h2>Interview Angle: How to Stand Out</h2><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO ANNUAL MEMBERSHIP&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO ANNUAL MEMBERSHIP</span></a></p>
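<p>One way to see this all-or-nothing behavior without a PostgreSQL server is a minimal sketch of the checkout flow using Python&#8217;s built-in <code>sqlite3</code> module. SQLite stands in for PostgreSQL here. The schema, the <code>checkout</code> function, and the <code>sent_emails</code> list are hypothetical: a <code>NOT NULL</code> violation on the payment row simulates a failed charge, and the &#8220;email&#8221; is only recorded after the commit succeeds.</p><pre><code>import sqlite3

# Hypothetical schema, trimmed down from the checkout example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL);
    CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, quantity INTEGER NOT NULL);
    CREATE TABLE payments (order_id INTEGER NOT NULL, amount_cents INTEGER NOT NULL);
    INSERT INTO inventory VALUES (123, 10);
""")

sent_emails = []  # stand-in for a real email service


def checkout(product_id, amount_cents):
    """Steps 1-3 as one transaction; step 4 (email) only after COMMIT."""
    try:
        # "with conn" commits if the block finishes and rolls back on any exception.
        with conn:
            cur = conn.execute("INSERT INTO orders (product_id) VALUES (?)", (product_id,))
            order_id = cur.lastrowid
            conn.execute(
                "UPDATE inventory SET quantity = quantity - 1 WHERE product_id = ?",
                (product_id,),
            )
            # Passing None violates NOT NULL, which simulates a failed payment.
            conn.execute(
                "INSERT INTO payments (order_id, amount_cents) VALUES (?, ?)",
                (order_id, amount_cents),
            )
    except sqlite3.IntegrityError:
        return False  # rolled back: no order row, no stock change, no payment row
    sent_emails.append(order_id)  # side effect happens only after a successful commit
    return True


failed = checkout(123, None)
succeeded = checkout(123, 1999)
print(failed, succeeded)
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
print(conn.execute("SELECT quantity FROM inventory WHERE product_id = 123").fetchone()[0])
print(sent_emails)
</code></pre><p>The failed call leaves no order, no payment, and untouched stock; only the successful call commits its three writes and then records an email. Keeping the email outside the <code>with</code> block is the design choice that matters: anything the database can&#8217;t roll back should wait until the commit has succeeded.</p>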
      <p>
          <a href="https://zero2dataengineer.substack.com/p/transactions-and-acid-the-rules-that">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Indexing Secrets They Don’t Teach You]]></title><description><![CDATA[How to actually use indexes like an engineer &#8212; not a tutorial bot]]></description><link>https://zero2dataengineer.substack.com/p/indexing-secrets-they-dont-teach</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/indexing-secrets-they-dont-teach</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Thu, 08 May 2025 00:30:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!SEss!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618d8ead-c882-4ef1-be6a-640b71600335_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome back, data minds &#8212;<br>We&#8217;ve talked tables. We&#8217;ve shaped schemas.<br>But today, we talk about <strong>speed</strong>.</p><p>Because all the cleanest SQL in the world means nothing&#8230;<br>If your query takes 17 minutes and your dashboard cries blood.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div></div></div><div><hr></div><h3>Why Indexing Matters</h3><p>Imagine searching for a contact on your phone:</p><ul><li><p><strong>No Index:</strong> Scroll, scroll, scroll&#8230; manually.</p></li><li><p><strong>With Index:</strong> Type &#8220;S&#8221; &#8594; boom, &#8220;Sara&#8221; shows up instantly.</p></li></ul><p>That&#8217;s what indexes do for your database.<br>They <strong>help the engine locate what you need without scanning every row</strong>.</p><div><hr></div><h3>Why Indexing Exists</h3><p>Imagine walking into a bookstore with no signage.<br>No sections, no labels. Just 100,000 books in a pile.</p><p>That&#8217;s your database without an index.<br>Even a simple lookup becomes a <strong>full table scan</strong>: the database flips through every row, like a librarian on Red Bull.</p><p><strong>With an index?</strong><br>You give the DB a map.
It jumps to exactly where the answer lives.</p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SEss!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618d8ead-c882-4ef1-be6a-640b71600335_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SEss!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618d8ead-c882-4ef1-be6a-640b71600335_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!SEss!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618d8ead-c882-4ef1-be6a-640b71600335_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!SEss!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618d8ead-c882-4ef1-be6a-640b71600335_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!SEss!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618d8ead-c882-4ef1-be6a-640b71600335_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SEss!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618d8ead-c882-4ef1-be6a-640b71600335_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/618d8ead-c882-4ef1-be6a-640b71600335_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1767465,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/163065723?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618d8ead-c882-4ef1-be6a-640b71600335_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SEss!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618d8ead-c882-4ef1-be6a-640b71600335_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!SEss!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618d8ead-c882-4ef1-be6a-640b71600335_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!SEss!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618d8ead-c882-4ef1-be6a-640b71600335_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!SEss!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F618d8ead-c882-4ef1-be6a-640b71600335_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h3>The Two Most Common Index Types</h3><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VvAd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e011261-a796-4c9e-8bec-e7b5f808b47b_1680x314.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VvAd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e011261-a796-4c9e-8bec-e7b5f808b47b_1680x314.png 424w,
https://substackcdn.com/image/fetch/$s_!VvAd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e011261-a796-4c9e-8bec-e7b5f808b47b_1680x314.png 848w, https://substackcdn.com/image/fetch/$s_!VvAd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e011261-a796-4c9e-8bec-e7b5f808b47b_1680x314.png 1272w, https://substackcdn.com/image/fetch/$s_!VvAd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e011261-a796-4c9e-8bec-e7b5f808b47b_1680x314.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VvAd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e011261-a796-4c9e-8bec-e7b5f808b47b_1680x314.png" width="1456" height="272" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0e011261-a796-4c9e-8bec-e7b5f808b47b_1680x314.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:272,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:55680,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/163065723?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e011261-a796-4c9e-8bec-e7b5f808b47b_1680x314.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!VvAd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e011261-a796-4c9e-8bec-e7b5f808b47b_1680x314.png 424w, https://substackcdn.com/image/fetch/$s_!VvAd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e011261-a796-4c9e-8bec-e7b5f808b47b_1680x314.png 848w, https://substackcdn.com/image/fetch/$s_!VvAd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e011261-a796-4c9e-8bec-e7b5f808b47b_1680x314.png 1272w, https://substackcdn.com/image/fetch/$s_!VvAd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e011261-a796-4c9e-8bec-e7b5f808b47b_1680x314.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div><hr></div><h3>Deep Dive: B-Tree vs Hash Index</h3><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LZd0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b36fb5-84e5-4cdf-bdbb-058da78c19a7_1660x338.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LZd0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b36fb5-84e5-4cdf-bdbb-058da78c19a7_1660x338.png 424w, https://substackcdn.com/image/fetch/$s_!LZd0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b36fb5-84e5-4cdf-bdbb-058da78c19a7_1660x338.png 848w, 
https://substackcdn.com/image/fetch/$s_!LZd0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b36fb5-84e5-4cdf-bdbb-058da78c19a7_1660x338.png 1272w, https://substackcdn.com/image/fetch/$s_!LZd0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b36fb5-84e5-4cdf-bdbb-058da78c19a7_1660x338.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LZd0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b36fb5-84e5-4cdf-bdbb-058da78c19a7_1660x338.png" width="1456" height="296" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6b36fb5-84e5-4cdf-bdbb-058da78c19a7_1660x338.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:296,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:64050,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/163065723?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b36fb5-84e5-4cdf-bdbb-058da78c19a7_1660x338.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LZd0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b36fb5-84e5-4cdf-bdbb-058da78c19a7_1660x338.png 424w, 
https://substackcdn.com/image/fetch/$s_!LZd0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b36fb5-84e5-4cdf-bdbb-058da78c19a7_1660x338.png 848w, https://substackcdn.com/image/fetch/$s_!LZd0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b36fb5-84e5-4cdf-bdbb-058da78c19a7_1660x338.png 1272w, https://substackcdn.com/image/fetch/$s_!LZd0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b36fb5-84e5-4cdf-bdbb-058da78c19a7_1660x338.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><blockquote><p><strong>Interview tip:</strong> Most relational databases (PostgreSQL, MySQL&#8217;s InnoDB, SQL Server) use a <strong>B-tree</strong> as their default index type, so go with that unless you have a specific reason not to.</p></blockquote><div><hr></div><h3>Real World Example: The Wrong Index Costs Real Money</h3><p>When I worked on a loyalty points table with <strong>30M+ rows</strong>,<br>a teammate created an index on <code>user_id</code>.</p><p>Problem?<br>All queries filtered on <strong>created_at</strong>, so that index was <strong>useless</strong>.</p><p>We swapped it to:</p><pre><code>CREATE INDEX idx_created_at ON loyalty_points (created_at);</code></pre><p>Result: Dashboard load time dropped from 38s to 2s.<br>That tiny change saved $600/month in Snowflake compute credits. (Note that standard Snowflake tables don&#8217;t support secondary indexes; only hybrid tables do. On a standard table, the closest lever is a clustering key on <code>created_at</code>.)</p><div><hr></div><h3>Real-World: How One Index Saved a Failing Job</h3><p>At a logistics startup, we had a <strong>shipment tracking job</strong> that queried a 60M-row <code>tracking_events</code> table every 5 minutes.</p><p>We thought it was clean:</p><pre><code>SELECT * FROM tracking_events WHERE status = 'delivered';</code></pre><p>But it took 45 seconds.</p><p>The problem?
No index on <code>status</code>.</p><p>Worse, <code>status</code> had only 3 values (<code>in_transit</code>, <code>delivered</code>, <code>failed</code>): very low cardinality.</p><p>With so few distinct values, each one matches a large slice of the table, so indexing <code>status</code> wouldn&#8217;t have helped much. Instead, we looked at <code>event_time</code>, which was nearly unique and was also used in the job&#8217;s WHERE clause:</p><pre><code>CREATE INDEX idx_event_time ON tracking_events (event_time);</code></pre><p>The query dropped to 2.1 seconds.</p><p>And that&#8217;s when we learned:</p><p>Index the column that filters out the most rows, i.e. the most <strong>selective</strong> filter in your WHERE clause.</p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO ANNUAL&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO ANNUAL</span></a></p>
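<p>The selectivity lesson can be checked on a small scale. What follows is a minimal sketch on a synthetic, scaled-down <code>tracking_events</code> table, using SQLite through Python&#8217;s <code>sqlite3</code> module rather than the database from the story. It measures selectivity (distinct values divided by rows) for <code>status</code> and <code>event_time</code>, then asks SQLite&#8217;s planner, via <code>EXPLAIN QUERY PLAN</code>, how it would run a time-window query before and after adding <code>idx_event_time</code>. The row count, the integer time values, and the <code>selectivity</code> and <code>plan</code> helpers are assumptions for illustration.</p><pre><code>import random
import sqlite3

# Synthetic, scaled-down stand-in for the 60M-row table in the story.
random.seed(7)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracking_events ("
    "event_id INTEGER PRIMARY KEY, status TEXT, event_time INTEGER)"
)
statuses = ["in_transit", "delivered", "failed"]
conn.executemany(
    "INSERT INTO tracking_events (status, event_time) VALUES (?, ?)",
    [(random.choice(statuses), t) for t in range(50_000)],  # event_time is unique here
)


def selectivity(column):
    """Distinct values / row count: close to 1.0 is selective, close to 0 is not."""
    # The column name is formatted in directly, so only pass trusted names.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM tracking_events"
    ).fetchone()
    return distinct / total


def plan(sql, params):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN rows."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# A recent time window, like a job that polls every few minutes.
query = "SELECT * FROM tracking_events WHERE event_time BETWEEN ? AND ?"
window = (49_700, 49_999)

print("status selectivity:", selectivity("status"))
print("event_time selectivity:", selectivity("event_time"))

before = plan(query, window)
conn.execute("CREATE INDEX idx_event_time ON tracking_events (event_time)")
after = plan(query, window)
print("before:", before)
print("after:", after)
</code></pre><p>Before the index, the plan should report a full <code>SCAN</code> of the table; afterward it should report a <code>SEARCH</code> using <code>idx_event_time</code>. The exact wording of plan rows varies between SQLite versions, and other engines (PostgreSQL&#8217;s <code>EXPLAIN</code>, for example) report plans in their own format.</p>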
      <p>
          <a href="https://zero2dataengineer.substack.com/p/indexing-secrets-they-dont-teach">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Normalization vs Denormalization]]></title><description><![CDATA[Schema Design is a Superpower]]></description><link>https://zero2dataengineer.substack.com/p/normalization-vs-denormalization</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/normalization-vs-denormalization</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Wed, 07 May 2025 00:30:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!c0TM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481784b5-c96a-4d88-9d87-a05d36902358_1600x554.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome back to Zero2DataEngineer!</p><p>Today, we go deeper into how schema design can either streamline your data pipeline&#8230; or turn it into a spaghetti monster. We&#8217;ll unpack <strong>how your database schema shapes everything: speed, cost, logic, and scalability.</strong></p><p>And no, it&#8217;s not just theory.</p><p>Good schema = fast queries, fewer bugs, and easier handovers.<br>Bad schema = JOIN hell, duplication drama, and angry dashboards.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication.
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div></div></div><div><hr></div><h2>What&#8217;s the Difference?</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c0TM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481784b5-c96a-4d88-9d87-a05d36902358_1600x554.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c0TM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481784b5-c96a-4d88-9d87-a05d36902358_1600x554.png 424w, https://substackcdn.com/image/fetch/$s_!c0TM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481784b5-c96a-4d88-9d87-a05d36902358_1600x554.png 848w, https://substackcdn.com/image/fetch/$s_!c0TM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481784b5-c96a-4d88-9d87-a05d36902358_1600x554.png 1272w, https://substackcdn.com/image/fetch/$s_!c0TM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481784b5-c96a-4d88-9d87-a05d36902358_1600x554.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!c0TM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481784b5-c96a-4d88-9d87-a05d36902358_1600x554.png" width="1456" height="504" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/481784b5-c96a-4d88-9d87-a05d36902358_1600x554.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:504,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:102102,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/162975147?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481784b5-c96a-4d88-9d87-a05d36902358_1600x554.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!c0TM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481784b5-c96a-4d88-9d87-a05d36902358_1600x554.png 424w, https://substackcdn.com/image/fetch/$s_!c0TM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481784b5-c96a-4d88-9d87-a05d36902358_1600x554.png 848w, https://substackcdn.com/image/fetch/$s_!c0TM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481784b5-c96a-4d88-9d87-a05d36902358_1600x554.png 1272w, https://substackcdn.com/image/fetch/$s_!c0TM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481784b5-c96a-4d88-9d87-a05d36902358_1600x554.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LpPh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F173ba26f-0f14-4faf-b17d-39ef6ba28398.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp"
srcset="https://substackcdn.com/image/fetch/$s_!LpPh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F173ba26f-0f14-4faf-b17d-39ef6ba28398.png 424w, https://substackcdn.com/image/fetch/$s_!LpPh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F173ba26f-0f14-4faf-b17d-39ef6ba28398.png 848w, https://substackcdn.com/image/fetch/$s_!LpPh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F173ba26f-0f14-4faf-b17d-39ef6ba28398.png 1272w, https://substackcdn.com/image/fetch/$s_!LpPh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F173ba26f-0f14-4faf-b17d-39ef6ba28398.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LpPh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F173ba26f-0f14-4faf-b17d-39ef6ba28398.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/173ba26f-0f14-4faf-b17d-39ef6ba28398.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:489,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/162975147?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F173ba26f-0f14-4faf-b17d-39ef6ba28398.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!LpPh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F173ba26f-0f14-4faf-b17d-39ef6ba28398.png 424w, https://substackcdn.com/image/fetch/$s_!LpPh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F173ba26f-0f14-4faf-b17d-39ef6ba28398.png 848w, https://substackcdn.com/image/fetch/$s_!LpPh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F173ba26f-0f14-4faf-b17d-39ef6ba28398.png 1272w, https://substackcdn.com/image/fetch/$s_!LpPh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F173ba26f-0f14-4faf-b17d-39ef6ba28398.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WSSO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3920ec96-7cf2-4632-8cba-06aaca54c059_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WSSO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3920ec96-7cf2-4632-8cba-06aaca54c059_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!WSSO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3920ec96-7cf2-4632-8cba-06aaca54c059_1536x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!WSSO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3920ec96-7cf2-4632-8cba-06aaca54c059_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!WSSO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3920ec96-7cf2-4632-8cba-06aaca54c059_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WSSO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3920ec96-7cf2-4632-8cba-06aaca54c059_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3920ec96-7cf2-4632-8cba-06aaca54c059_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1824096,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/162975147?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3920ec96-7cf2-4632-8cba-06aaca54c059_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WSSO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3920ec96-7cf2-4632-8cba-06aaca54c059_1536x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!WSSO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3920ec96-7cf2-4632-8cba-06aaca54c059_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!WSSO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3920ec96-7cf2-4632-8cba-06aaca54c059_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!WSSO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3920ec96-7cf2-4632-8cba-06aaca54c059_1536x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<div><hr></div><h2>Normalization Forms (Real-World Style)</h2><h3><strong>1NF &#8211; First Normal Form</strong></h3><ul><li><p><strong>Rule:</strong> Atomic columns (one value per cell), no repeating groups</p></li><li><p><strong>Think of it like:</strong> Every drawer in your closet holds exactly one item, not a bundle</p></li><li><p><strong>Example:</strong><br>Bad: <code>Phone Numbers = 123, 456</code><br>Good: A separate row for each number</p></li></ul><h3><strong>2NF &#8211; Second Normal Form</strong></h3><ul><li><p><strong>Rule:</strong> 1NF + every non-key column depends on the <em>entire</em> primary key, not just part of it</p></li><li><p><strong>Think of it like:</strong> No column gets to care about only half of the key</p></li><li><p><strong>Example:</strong> Don't include <code>Student Name</code> in a table whose primary key is (Student_ID, Course_ID); the name depends only on Student_ID, so it belongs in a <code>Students</code> table</p></li></ul><h3><strong>3NF &#8211; Third Normal Form</strong></h3><ul><li><p><strong>Rule:</strong> 2NF + no transitive dependencies</p></li><li><p><strong>Think of it like:</strong> No gossip. Non-key columns shouldn't depend on other non-key columns.</p></li><li><p><strong>Example:</strong> <code>Department_Location</code> depends on the department, not the employee, so move it to a separate <code>Departments</code> table instead of storing it in <code>Employees</code>.</p></li></ul><div><hr></div><h2>Real-Life Scenarios</h2><h3>Fintech Use Case: Normalized Schema (OLTP)</h3><p>Let&#8217;s say you're working at a fintech startup on a banking system. 
You separate:</p><ul><li><p><strong>Customers</strong> table &#8594; updated rarely</p></li><li><p><strong>Accounts</strong> table &#8594; updated at moderate frequency</p></li><li><p><strong>Transactions</strong> table &#8594; written constantly</p></li></ul><p><strong>Why this works:</strong> Instead of storing everything in one big table, you <strong>normalize</strong> it into three, because each table changes at a different speed and scale. New transactions arrive constantly; customer details rarely change. Splitting them improves <strong>storage</strong>, <strong>indexing</strong>, and <strong>update reliability</strong>, protects data integrity, and keeps each ETL pipeline modular.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO ANNUAL SUBSCRIPTION&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO ANNUAL SUBSCRIPTION</span></a></p>
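<p>To make the normal forms concrete, here is a small pandas sketch on made-up data. The <code>contacts</code> and <code>enrollments</code> tables, their columns, and their values are all hypothetical. For 1NF, it splits a packed phone-number cell into one row per number. For 2NF, it moves <code>student_name</code> out of a table keyed by (student_id, course_id). Treat it as an illustration of the idea, not a production migration.</p><pre><code>import pandas as pd

# 1NF: a hypothetical contacts table that packs several phone numbers into one cell
contacts = pd.DataFrame({
    "customer_id": [1, 2],
    "phone_numbers": ["123, 456", "789"],
})

# Split the packed string into a list, then give each number its own row
contacts_1nf = (
    contacts.assign(phone_number=contacts["phone_numbers"].str.split(", "))
    .explode("phone_number")
    .drop(columns="phone_numbers")
    .reset_index(drop=True)
)

# 2NF: enrollments keyed by (student_id, course_id) also carry student_name,
# which depends on student_id alone (a partial dependency)
enrollments = pd.DataFrame({
    "student_id": [10, 10, 11],
    "course_id": ["DB101", "PY201", "DB101"],
    "student_name": ["Asha", "Asha", "Ravi"],
})

# Move student_name into its own table keyed only by student_id
students = (
    enrollments[["student_id", "student_name"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
# What remains depends on the whole (student_id, course_id) key
enrollments_2nf = enrollments[["student_id", "course_id"]]

print(contacts_1nf)
print(students)
print(enrollments_2nf)</code></pre><p>3NF follows the same pattern. You would pull <code>Department_Location</code> into a <code>Departments</code> table keyed by department and keep only the department key on <code>Employees</code>.</p>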
      <p>
          <a href="https://zero2dataengineer.substack.com/p/normalization-vs-denormalization">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Why Relational Databases Still Run the World]]></title><description><![CDATA[And what every real data engineer should deeply understand before writing a single JOIN.]]></description><link>https://zero2dataengineer.substack.com/p/why-relational-databases-still-run</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/why-relational-databases-still-run</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Tue, 06 May 2025 00:30:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_xKG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03187ac0-3873-43c0-b14a-53a3fd7def67_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128075; Welcome to Week 4 of Zero2DataEngineer</p><p>This week we leave behind raw querying and step into <strong>schema thinking</strong>.</p><p>This is where you stop being a query-runner and start becoming a <strong>real engineer</strong>.</p><p>Here&#8217;s what we&#8217;ll cover:</p><ul><li><p>How relational databases actually work</p></li><li><p>The power of constraints and keys</p></li><li><p>When to choose OLTP vs OLAP</p></li><li><p>How broken schema design breaks pipelines</p></li><li><p>And how to crush RDBMS interview questions</p></li></ul><div><hr></div><h3>What Even Is a Relational Database?</h3><p>It&#8217;s a system that stores data in tables, with clearly defined <strong>relationships</strong> between them.</p><p>Think:</p><ul><li><p><code>users</code></p></li><li><p><code>orders</code></p></li><li><p><code>products</code></p></li><li><p><code>payments</code></p></li></ul><p>Each table is connected using <strong>primary and foreign keys</strong>, ensuring <strong>data consistency</strong>, <strong>referential integrity</strong>, and <strong>scalable querying</strong>.</p><div><hr></div><h3>Real Example: Food Delivery 
App</h3><p>You&#8217;re building for DoorDash, Swiggy, or Zomato.</p><p>Here&#8217;s your schema:</p><ul><li><p><code>users(user_id, name, location)</code></p></li><li><p><code>restaurants(restaurant_id, name, cuisine)</code></p></li><li><p><code>orders(order_id, user_id, restaurant_id, created_at)</code></p></li><li><p><code>order_items(order_item_id, order_id, item_name, price)</code></p></li></ul><p>Now someone deletes a restaurant record.</p><p>If you didn&#8217;t define <strong>foreign key constraints</strong>, your orders table is now pointing to... nothing.<br>Your BI dashboard breaks. Refunds misfire. Customers rage-tweet.</p><p>This is why RDBMS still matters: it protects you from yourself.</p><div><hr></div><h3>Schema Mistakes I&#8217;ve Lived Through</h3><blockquote><p><strong>Mistake #1</strong>: I once stored prices in 3 different places: the product table, the invoice table, and the analytics table.<br>Marketing ran a 10%-off promo.<br>One table got updated. Two didn&#8217;t.<br>Revenue numbers were off. Refunds had to be processed manually. Chaos.</p><p><strong>Fix</strong>: I created a normalized schema where pricing logic sat in one place, and every other table referenced it via a foreign key. Revenue dashboards started matching Stripe. Everyone slept better.</p></blockquote><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO ANNUAL&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO ANNUAL</span></a></p>
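<p>Here is a minimal sketch of the deleted-restaurant scenario. It uses Python's built-in <code>sqlite3</code> module with an in-memory database, with SQLite standing in for your production RDBMS. Only <code>restaurants</code> and <code>orders</code> are modelled, so <code>user_id</code> carries no foreign key here, and the sample rows are invented. SQLite only enforces foreign keys after <code>PRAGMA foreign_keys = ON</code>. PostgreSQL and MySQL's InnoDB engine enforce declared foreign keys without that step.</p><pre><code>import sqlite3

# In-memory database; the schema is a trimmed-down version of the example above
conn = sqlite3.connect(":memory:")
# SQLite ships with foreign key enforcement off; enable it for this connection
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE restaurants (
    restaurant_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    cuisine       TEXT
);
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL,
    restaurant_id INTEGER NOT NULL REFERENCES restaurants (restaurant_id),
    created_at    TEXT NOT NULL
);
""")

# Invented sample rows
conn.execute("INSERT INTO restaurants VALUES (1, 'Spice Route', 'Indian')")
conn.execute("INSERT INTO orders VALUES (100, 42, 1, '2025-05-06 12:00:00')")
conn.commit()

# Try to delete a restaurant that still has orders pointing at it
try:
    conn.execute("DELETE FROM restaurants WHERE restaurant_id = 1")
    conn.commit()
    delete_blocked = False
except sqlite3.IntegrityError as err:
    # The constraint rejects the delete, so undo the failed transaction
    conn.rollback()
    delete_blocked = True
    print("Delete rejected:", err)

# The order still resolves to a real restaurant
remaining = conn.execute(
    "SELECT o.order_id, r.name FROM orders AS o JOIN restaurants AS r USING (restaurant_id)"
).fetchall()
print(remaining)</code></pre><p>The default foreign key action blocks the delete, which is usually what you want for order history. <code>ON DELETE CASCADE</code> would instead remove the restaurant's orders along with it. Treat that as a deliberate design choice, not a quick fix.</p>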
      <p>
          <a href="https://zero2dataengineer.substack.com/p/why-relational-databases-still-run">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Python Wrap: Debugging + Practice Projects]]></title><description><![CDATA[How to turn everything you learned into real portfolio assets and bulletproof scripts.]]></description><link>https://zero2dataengineer.substack.com/p/python-wrap-debugging-practice-projects</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/python-wrap-debugging-practice-projects</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Sat, 03 May 2025 00:35:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d88f407-722d-4bfb-815b-d6e4423d0402_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Welcome to Zero2DataEngineer &#8212; Week 3, Day 5</h3><p>This isn&#8217;t a wrap-up.<br>This is a transformation checkpoint.</p><p>By now, you&#8217;ve learned:</p><ul><li><p>How to clean data with pandas</p></li><li><p>How to call APIs and normalize JSON</p></li><li><p>How to handle files like a production engineer</p></li></ul><p>Today is about how to:</p><ul><li><p><strong>Debug like a pro</strong></p></li><li><p><strong>Structure your folders and logs</strong></p></li><li><p><strong>Turn those learnings into 3 real portfolio projects</strong></p></li></ul><p>Because employers don&#8217;t hire people who <em>know</em> Python &#8212;<br>They hire people who <em>used it to build something real</em>.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h3>What DEs Actually Do with Python</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!USjR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e29a88-c4f3-454b-93fc-901907eea468_1620x582.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!USjR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e29a88-c4f3-454b-93fc-901907eea468_1620x582.png 424w, https://substackcdn.com/image/fetch/$s_!USjR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e29a88-c4f3-454b-93fc-901907eea468_1620x582.png 848w, https://substackcdn.com/image/fetch/$s_!USjR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e29a88-c4f3-454b-93fc-901907eea468_1620x582.png 1272w, https://substackcdn.com/image/fetch/$s_!USjR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e29a88-c4f3-454b-93fc-901907eea468_1620x582.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!USjR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e29a88-c4f3-454b-93fc-901907eea468_1620x582.png" width="1456" height="523" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83e29a88-c4f3-454b-93fc-901907eea468_1620x582.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:523,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90063,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/162694630?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e29a88-c4f3-454b-93fc-901907eea468_1620x582.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!USjR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e29a88-c4f3-454b-93fc-901907eea468_1620x582.png 424w, https://substackcdn.com/image/fetch/$s_!USjR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e29a88-c4f3-454b-93fc-901907eea468_1620x582.png 848w, https://substackcdn.com/image/fetch/$s_!USjR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e29a88-c4f3-454b-93fc-901907eea468_1620x582.png 1272w, https://substackcdn.com/image/fetch/$s_!USjR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e29a88-c4f3-454b-93fc-901907eea468_1620x582.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h3>DE-Style Script Structure</h3><p>Here&#8217;s how I organize every serious Python script:</p><pre><code>project/
&#9500;&#9472;&#9472; data/
&#9474;   &#9500;&#9472;&#9472; raw/
&#9474;   &#9500;&#9472;&#9472; processed/
&#9474;   &#9492;&#9472;&#9472; archive/
&#9500;&#9472;&#9472; logs/
&#9474;   &#9492;&#9472;&#9472; run_2024-04-26.log
&#9500;&#9472;&#9472; scripts/
&#9474;   &#9492;&#9472;&#9472; clean_users.py
&#9492;&#9472;&#9472; config/
    &#9492;&#9472;&#9472; secrets.env</code></pre><p>This structure tells future-you:</p><ul><li><p>What was cleaned (<code>data/raw</code> vs. <code>data/processed</code>)</p></li><li><p>When it ran (<code>logs/</code>)</p></li><li><p>Where it went (<code>data/processed</code>, <code>data/archive</code>)</p></li><li><p>What config powered it (<code>config/</code>)</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe&quot;,&quot;text&quot;:&quot;UPGRADE TO ANNUAL&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://zero2dataengineer.substack.com/subscribe"><span>UPGRADE TO ANNUAL</span></a></p>
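<p>If you want to bootstrap that layout from code, a minimal sketch with <code>pathlib</code> and <code>logging</code> could look like this. <code>setup_project</code> and the logger name <code>pipeline</code> are hypothetical. The function creates the folders and points a logger at a dated file in <code>logs/</code>, following the <code>run_YYYY-MM-DD.log</code> naming in the tree. It does not create <code>secrets.env</code>; you would supply that yourself.</p><pre><code>import logging
from datetime import date
from pathlib import Path


# Hypothetical helper: builds the project layout under root and returns a
# logger that appends to logs/run_YYYY-MM-DD.log for today's run
def setup_project(root):
    root = Path(root)
    for sub in ("data/raw", "data/processed", "data/archive", "logs", "scripts", "config"):
        # parents=True creates data/ on the way; exist_ok=True makes reruns safe
        (root / sub).mkdir(parents=True, exist_ok=True)

    log_path = root / "logs" / f"run_{date.today().isoformat()}.log"
    logger = logging.getLogger("pipeline")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)  # default mode "a" appends to the file
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, log_path


if __name__ == "__main__":
    logger, log_path = setup_project("project")
    logger.info("clean_users started")</code></pre><p>Calling <code>setup_project</code> twice in one process attaches a second file handler to the same logger. In a longer-running job you may want to check <code>logger.handlers</code> before adding one.</p>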
      <p>
          <a href="https://zero2dataengineer.substack.com/p/python-wrap-debugging-practice-projects">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[File Handling Like a Pro]]></title><description><![CDATA[How Data Engineers read, write, and organize files at scale.]]></description><link>https://zero2dataengineer.substack.com/p/file-handling-like-a-pro</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/file-handling-like-a-pro</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Fri, 02 May 2025 00:30:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2996af-ebf5-4006-8ed4-de2d4dbf5386_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Welcome to Zero2DataEngineer &#8212; Week 3, Day 3</h3><p>A real data engineer doesn&#8217;t say:</p><blockquote><p>&#8220;Let me just load this CSV and print the head.&#8221;</p></blockquote><p>They say:</p><blockquote><p>&#8220;Let&#8217;s structure this folder, validate file size, and ingest in chunks with logging.&#8221;</p></blockquote><p>Today&#8217;s lesson:<br>Python for <strong>production-grade file handling</strong>.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h3>Files You&#8217;ll Actually Handle in the Wild</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!clL_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc056145f-6918-4e4e-86c9-4b4f07d22f4a_1606x496.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!clL_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc056145f-6918-4e4e-86c9-4b4f07d22f4a_1606x496.png 424w, https://substackcdn.com/image/fetch/$s_!clL_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc056145f-6918-4e4e-86c9-4b4f07d22f4a_1606x496.png 848w, https://substackcdn.com/image/fetch/$s_!clL_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc056145f-6918-4e4e-86c9-4b4f07d22f4a_1606x496.png 1272w, https://substackcdn.com/image/fetch/$s_!clL_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc056145f-6918-4e4e-86c9-4b4f07d22f4a_1606x496.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!clL_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc056145f-6918-4e4e-86c9-4b4f07d22f4a_1606x496.png" width="1456" height="450" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c056145f-6918-4e4e-86c9-4b4f07d22f4a_1606x496.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:80631,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/162543353?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc056145f-6918-4e4e-86c9-4b4f07d22f4a_1606x496.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!clL_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc056145f-6918-4e4e-86c9-4b4f07d22f4a_1606x496.png 424w, https://substackcdn.com/image/fetch/$s_!clL_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc056145f-6918-4e4e-86c9-4b4f07d22f4a_1606x496.png 848w, https://substackcdn.com/image/fetch/$s_!clL_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc056145f-6918-4e4e-86c9-4b4f07d22f4a_1606x496.png 1272w, https://substackcdn.com/image/fetch/$s_!clL_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc056145f-6918-4e4e-86c9-4b4f07d22f4a_1606x496.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>You&#8217;re not reading files for fun.<br>You&#8217;re turning them into <strong>clean, usable, repeatable pipeline inputs.</strong></p><div><hr></div><h3>Real Example: Chunk Reading a Large CSV</h3><pre><code>import pandas as pd

# Stream the file 50,000 rows at a time instead of loading it all into memory
with pd.read_csv("big_sales.csv", chunksize=50000) as chunks:
    for i, chunk in enumerate(chunks):
        # Drop rows that have no customer_id
        cleaned = chunk.dropna(subset=["customer_id"])
        # Write each cleaned batch to its own numbered file
        cleaned.to_csv(f"cleaned_sales_part_{i}.csv", index=False)</code></pre><p>This is how DEs handle:</p><ul><li><p>Large files without running out of memory</p></li><li><p>Splitting clean output into batches</p></li><li><p>Ensuring pipeline resiliency</p></li></ul>
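<p>The intro to this lesson mentions validating file size and ingesting in chunks with logging. Here is one way to extend the chunked read in that direction. It is a sketch, not a reference implementation: <code>ingest_in_chunks</code>, the 5 GB <code>MAX_BYTES</code> ceiling, and the output folder are assumptions to adapt. It uses <code>pd.read_csv(..., chunksize=...)</code> as a context manager, which needs pandas 1.2 or later.</p><pre><code>import logging
import os

import pandas as pd

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("ingest")

# Assumed size ceiling: refuse files over 5 GB (tune this for your pipeline)
MAX_BYTES = 5 * 1024 ** 3


def ingest_in_chunks(path, out_dir, chunksize=50_000, max_bytes=MAX_BYTES):
    """Validate the file's size, then clean and write it one chunk at a time."""
    size = os.path.getsize(path)  # raises FileNotFoundError if the file is missing
    if size == 0:
        raise ValueError(f"{path} is empty")
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes, over the {max_bytes}-byte limit")
    log.info("validated %s (%d bytes)", path, size)

    os.makedirs(out_dir, exist_ok=True)
    rows_in = rows_out = 0
    with pd.read_csv(path, chunksize=chunksize) as chunks:
        for i, chunk in enumerate(chunks):
            # Cleaning rule: drop rows with no customer_id
            cleaned = chunk.dropna(subset=["customer_id"])
            out_path = os.path.join(out_dir, f"cleaned_sales_part_{i}.csv")
            cleaned.to_csv(out_path, index=False)
            rows_in += len(chunk)
            rows_out += len(cleaned)
            log.info("chunk %d: read %d rows, kept %d, wrote %s", i, len(chunk), len(cleaned), out_path)

    log.info("finished %s: %d rows read, %d rows written", path, rows_in, rows_out)
    return rows_in, rows_out</code></pre><p>For a sales export, a call might look like <code>ingest_in_chunks("big_sales.csv", "data/processed")</code>. Returning the row counts makes it easy to assert in a test, or to log as a data-quality check, that nothing unexpected was dropped.</p>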
      <p>
          <a href="https://zero2dataengineer.substack.com/p/file-handling-like-a-pro">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Working with APIs & JSON]]></title><description><![CDATA[From GET requests to clean tables &#8212; a Data Engineer&#8217;s guide to API data ingestion.]]></description><link>https://zero2dataengineer.substack.com/p/working-with-apis-and-json</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/working-with-apis-and-json</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Thu, 01 May 2025 00:05:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed951837-8843-4a9f-be18-c9ac0dd2c003_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Welcome to Zero2DataEngineer &#8212; Week 3, Day 4</h3><p>Most companies don&#8217;t give you tidy CSVs.<br>They give you access to an <strong>API.</strong></p><p>Your job?</p><ul><li><p>Authenticate</p></li><li><p>Call it</p></li><li><p>Handle errors</p></li><li><p>Parse the JSON</p></li><li><p>Normalize it into something that fits your pipeline</p></li></ul><p>Today&#8217;s lesson will teach you how to go from <strong>&#8220;what&#8217;s an API?&#8221;</strong> to <strong>&#8220;here&#8217;s how I ingest Stripe, Notion, or any public data feed.&#8221;</strong></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zero2dataengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Zero2Dataengineer is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h3>Why APIs Matter for Data Engineers</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ox9Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82bd44c6-a599-45a3-90af-e9eb62d82087_1616x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ox9Z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82bd44c6-a599-45a3-90af-e9eb62d82087_1616x600.png 424w, https://substackcdn.com/image/fetch/$s_!Ox9Z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82bd44c6-a599-45a3-90af-e9eb62d82087_1616x600.png 848w, https://substackcdn.com/image/fetch/$s_!Ox9Z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82bd44c6-a599-45a3-90af-e9eb62d82087_1616x600.png 1272w, https://substackcdn.com/image/fetch/$s_!Ox9Z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82bd44c6-a599-45a3-90af-e9eb62d82087_1616x600.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Ox9Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82bd44c6-a599-45a3-90af-e9eb62d82087_1616x600.png" width="1456" height="541" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/82bd44c6-a599-45a3-90af-e9eb62d82087_1616x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:541,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:99356,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/162702288?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82bd44c6-a599-45a3-90af-e9eb62d82087_1616x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ox9Z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82bd44c6-a599-45a3-90af-e9eb62d82087_1616x600.png 424w, https://substackcdn.com/image/fetch/$s_!Ox9Z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82bd44c6-a599-45a3-90af-e9eb62d82087_1616x600.png 848w, https://substackcdn.com/image/fetch/$s_!Ox9Z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82bd44c6-a599-45a3-90af-e9eb62d82087_1616x600.png 1272w, https://substackcdn.com/image/fetch/$s_!Ox9Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82bd44c6-a599-45a3-90af-e9eb62d82087_1616x600.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
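<p>The five steps listed at the top of this lesson can be sketched with nothing but the Python standard library. This is a minimal illustration under stated assumptions, not the approach from the full post: it assumes a bearer-token API, retries only rate-limit (429) and server (5xx) errors, and flattens nested objects into <code>parent_child</code> column names. The names <code>fetch_json</code> and <code>flatten</code> and the sample payload are hypothetical.</p><pre><code>import json
import time
import urllib.error
import urllib.request


def fetch_json(url, token, retries=3, backoff_seconds=2.0):
    """Authenticate, call the API, handle errors, and parse the JSON body.

    Assumes a bearer-token API (an "Authorization: Bearer ..." header);
    auth schemes vary, so check the docs of the API you are calling.
    """
    request = urllib.request.Request(
        url,
        headers={'Authorization': 'Bearer ' + token, 'Accept': 'application/json'},
    )
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return json.loads(response.read().decode('utf-8'))
        except urllib.error.HTTPError as err:
            # Retry only rate limits (429) and server errors (5xx);
            # a 401 or 404 will not fix itself, so re-raise immediately.
            retryable = err.code == 429 or err.code // 100 == 5
            if not retryable or attempt == retries:
                raise
            time.sleep(backoff_seconds * attempt)  # wait a little longer each time


def flatten(record, parent_key='', sep='_'):
    """Normalize one nested JSON object into a single flat row."""
    row = {}
    for key, value in record.items():
        column = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            row.update(flatten(value, column, sep))  # recurse into nested objects
        else:
            row[column] = value  # scalars and lists are kept as-is
    return row


# Hypothetical payload shaped like many list endpoints: {'data': [...]}
sample_payload = {
    'data': [
        {'id': 'cus_1', 'email': 'a@example.com',
         'address': {'city': 'Austin', 'country': 'US'}},
        {'id': 'cus_2', 'email': None,
         'address': {'city': 'Pune', 'country': 'IN'}},
    ]
}
rows = [flatten(item) for item in sample_payload['data']]</code></pre><p>Against a real endpoint, the pieces would combine as <code>[flatten(item) for item in fetch_json(url, token)['data']]</code>, where <code>url</code>, <code>token</code>, and the <code>data</code> key all depend on the API you call. The standard library keeps the sketch dependency-free; many pipelines use the third-party <code>requests</code> package for the same calls.</p>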
      <p>
          <a href="https://zero2dataengineer.substack.com/p/working-with-apis-and-json">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Data Cleaning with Python]]></title><description><![CDATA[How to clean, standardize, and prep raw data like a real engineer.]]></description><link>https://zero2dataengineer.substack.com/p/data-cleaning-with-python</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/data-cleaning-with-python</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Wed, 30 Apr 2025 00:30:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!EcYZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F183e587b-0162-4763-99ce-02276f32f28c_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Welcome to Zero2DataEngineer &#8212; Week 3, Day 2</h3><p>Data cleaning isn&#8217;t sexy.<br>But neither is debugging a broken dashboard at 3AM.</p><p>This lesson is about:</p><ul><li><p>Detecting messes before they cause problems</p></li><li><p>Structuring your cleanup scripts for clarity</p></li><li><p>Building confidence in your data before it ever hits SQL</p></li></ul><p>Because you can&#8217;t query your way out of a dirty dataset.</p>
<div><hr></div><h3>What Messy Data Actually Looks Like</h3><p>You won&#8217;t just see typos or NULLs.</p><p>You&#8217;ll face:</p><ul><li><p>Extra columns named <code>Unnamed: 12</code></p></li><li><p>Date formats like <code>"2023/01/07"</code> next to <code>"07-01-23"</code></p></li><li><p>Country codes like <code>IN</code>, <code>india</code>, <code>INDIA</code>, and <code>null</code></p></li><li><p>Duplicated rows from multi-system exports</p></li><li><p>Random empty string values (<code>""</code>) that silently break joins</p></li></ul><p>Your job isn&#8217;t just to &#8220;fix it.&#8221;</p><p>Your job is to make the next step downstream <em>bulletproof</em>.</p><div><hr></div><h3>Real Example: Cleaning Before Upload</h3>
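<p>As a minimal sketch, separate from this lesson&#8217;s own example, here is one way pandas could handle each mess listed above. The sample rows, the <code>COUNTRY_CODES</code> lookup, and the <code>clean_customers</code> function are hypothetical, and the sketch assumes <code>07-01-23</code> means day-month-year; with real data, confirm the source system&#8217;s date convention first.</p><pre><code>import numpy as np
import pandas as pd

# Hypothetical messy export mirroring the problems listed above.
raw = pd.DataFrame({
    'customer_id': [1, 2, 2, 3, 4],
    'signup_date': ['2023/01/07', '07-01-23', '07-01-23', '', '2023/02/01'],
    'country': ['IN', 'india', 'INDIA', 'null', ''],
    'Unnamed: 12': [None, None, None, None, None],
})

# Hypothetical lookup: every spelling we expect, mapped to one code.
COUNTRY_CODES = {'in': 'IN', 'india': 'IN'}


def clean_customers(df):
    # 1. Drop stray index columns that leak in from spreadsheet exports.
    df = df.drop(columns=[c for c in df.columns if c.startswith('Unnamed')])

    # 2. Turn empty or whitespace-only strings into real missing values,
    #    so they load into SQL as NULL instead of as ''.
    for col in ['signup_date', 'country']:
        df[col] = df[col].str.strip().replace('', np.nan)

    # 3. Parse each known date format explicitly. Assumption: '07-01-23'
    #    is day-month-year. Anything matching neither format becomes NaT.
    slash_dates = pd.to_datetime(df['signup_date'], format='%Y/%m/%d', errors='coerce')
    dash_dates = pd.to_datetime(df['signup_date'], format='%d-%m-%y', errors='coerce')
    df['signup_date'] = slash_dates.fillna(dash_dates)

    # 4. Map every known spelling to one code; anything not in the lookup,
    #    such as the literal string 'null', becomes NaN.
    df['country'] = df['country'].str.lower().map(COUNTRY_CODES)

    # 5. Deduplicate only after normalizing, so 'india' and 'INDIA' rows collide.
    return df.drop_duplicates().reset_index(drop=True)


clean = clean_customers(raw)</code></pre><p>The order matters: deduplicating before the country and date columns are normalized would keep the <code>india</code> and <code>INDIA</code> rows as two different customers.</p>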
      <p>
          <a href="https://zero2dataengineer.substack.com/p/data-cleaning-with-python">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Python for Data Engineers — Not Developers]]></title><description><![CDATA[Why you don&#8217;t need to build apps &#8212; you need to build pipelines.]]></description><link>https://zero2dataengineer.substack.com/p/python-for-data-engineers-not-developers</link><guid isPermaLink="false">https://zero2dataengineer.substack.com/p/python-for-data-engineers-not-developers</guid><dc:creator><![CDATA[Avantika_Penumarty]]></dc:creator><pubDate>Tue, 29 Apr 2025 00:30:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7831cf41-c0a8-4f6b-9762-67d3160ca88b_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Welcome to Zero2DataEngineer &#8212; Week 3, Day 1</h3><p>Most Python tutorials teach you how to build apps, games, or complicated backend systems.</p><p>That&#8217;s great &#8212; if you want to be a software engineer.</p><p>But as a Data Engineer?</p><p>You need Python to:</p><ul><li><p>Clean messy data</p></li><li><p>Move data between systems</p></li><li><p>Automate boring tasks</p></li><li><p>Talk to APIs, files, and databases</p></li></ul><p>Today, we&#8217;re flipping your mindset:<br><strong>Python isn&#8217;t for building apps. Python is for data movement.</strong></p>
<div><hr></div><h3>&#9989; How Data Engineers Actually Use Python</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PobA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f672b7-e0fd-420f-ac17-a7a70741f98c_1674x532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PobA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f672b7-e0fd-420f-ac17-a7a70741f98c_1674x532.png 424w, https://substackcdn.com/image/fetch/$s_!PobA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f672b7-e0fd-420f-ac17-a7a70741f98c_1674x532.png 848w, https://substackcdn.com/image/fetch/$s_!PobA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f672b7-e0fd-420f-ac17-a7a70741f98c_1674x532.png 1272w, https://substackcdn.com/image/fetch/$s_!PobA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f672b7-e0fd-420f-ac17-a7a70741f98c_1674x532.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!PobA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f672b7-e0fd-420f-ac17-a7a70741f98c_1674x532.png" width="1456" height="463" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25f672b7-e0fd-420f-ac17-a7a70741f98c_1674x532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:463,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:88513,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://zero2dataengineer.substack.com/i/162350860?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f672b7-e0fd-420f-ac17-a7a70741f98c_1674x532.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PobA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f672b7-e0fd-420f-ac17-a7a70741f98c_1674x532.png 424w, https://substackcdn.com/image/fetch/$s_!PobA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f672b7-e0fd-420f-ac17-a7a70741f98c_1674x532.png 848w, https://substackcdn.com/image/fetch/$s_!PobA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f672b7-e0fd-420f-ac17-a7a70741f98c_1674x532.png 1272w, https://substackcdn.com/image/fetch/$s_!PobA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f672b7-e0fd-420f-ac17-a7a70741f98c_1674x532.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>You&#8217;re not building websites.<br>You&#8217;re building bridges between messy raw data and clean, usable data.</p><div><hr></div><h3>Real Example: Simple Python Data Cleaning</h3><p>Imagine you have a messy CSV of customers:</p><pre><code>customer_id, signup_date, country
123, 2024-01-10, us
124, NULL, uk
125, 2025-06-01, ca
126, 2024-11-05, null</code></pre><p>Your job?</p><ul><li><p>Remove NULL signup dates</p></li><li><p>Standardize country codes to uppercase</p></li><li><p>Save the clean output</p></li></ul><pre><code>import pandas as pd

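# skipinitialspace=True drops the space after each comma, so the headers
# parse as 'signup_date' and 'country', and NULL/null become missing values
df = pd.read_csv('customers.csv', skipinitialspace=True)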

# Remove rows where signup_date is NULL
df = df.dropna(subset=['signup_date'])

# Standardize country codes
df['country'] = df['country'].str.upper()

# Save clean version
df.to_csv('customers_clean.csv', index=False)</code></pre><p>5 lines of Python &#8594; pipeline-ready data.</p>
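<p>To see what those lines do, here is essentially the same cleaning run end to end on the sample rows, plus one extra step for the &#8220;move data between systems&#8221; job listed earlier. It is a sketch under stated assumptions: the CSV is held in memory with <code>io.StringIO</code> instead of a file, an in-memory SQLite database stands in for a real warehouse, and the table name <code>customers</code> is hypothetical.</p><pre><code>import io
import sqlite3

import pandas as pd

# The sample CSV from this lesson, held in memory so the sketch needs no files.
raw_csv = """customer_id, signup_date, country
123, 2024-01-10, us
124, NULL, uk
125, 2025-06-01, ca
126, 2024-11-05, null
"""

# skipinitialspace=True drops the space after each comma, so the headers
# parse as 'signup_date' and 'country', and NULL/null become missing values.
df = pd.read_csv(io.StringIO(raw_csv), skipinitialspace=True)
df = df.dropna(subset=['signup_date'])     # removes customer 124
df['country'] = df['country'].str.upper()  # 'us' becomes 'US'; missing stays missing

# "Move data between systems": load the clean rows into a database table.
# An in-memory SQLite database stands in for a real warehouse here.
conn = sqlite3.connect(':memory:')
df.to_sql('customers', conn, index=False)
loaded = conn.execute('SELECT COUNT(*) FROM customers').fetchone()[0]
conn.close()</code></pre><p>Because <code>read_csv</code> already treats <code>null</code> as missing, <code>.str.upper()</code> leaves customer 126&#8217;s country empty instead of turning it into the text <code>NULL</code>, and it lands in the table as a real SQL NULL.</p>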
      <p>
          <a href="https://zero2dataengineer.substack.com/p/python-for-data-engineers-not-developers">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>