Airflow Best Practices — What to Avoid in Projects & Interviews
From ‘It Worked on My Machine’ to ‘It Survives in Prod’
Most people say,
“Yeah, I’ve used Airflow.”
But interviewers can tell within 60 seconds whether you actually understand it — or just ran someone else’s DAG.
This post is not a checklist of features.
It’s a breakdown of the top Airflow mistakes that break pipelines, burn teams, and ruin interviews — and how to avoid them.
Let’s make you sound like someone who’s deployed DAGs in production, not just built toy examples.
Mistake #1: Treating Airflow Like a Script Runner
If your tasks are running huge pandas transformations, calling 15 APIs, and passing massive objects between tasks via XCom, you're doing it wrong.
Airflow is not Spark. Not dbt. Not a transformation engine. It’s the orchestrator — the conductor of the data workflow.
Do this instead:
Keep tasks modular
Push heavy lifting into external jobs (Spark, SQL, cloud ETL)
Use Airflow to manage dependencies, not business logic (see the sketch below)
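To make that concrete, here is a minimal sketch of an orchestration-only DAG. It assumes a recent Airflow 2.x with the Spark and Snowflake provider packages installed; the job path, stored procedure, and connection IDs are placeholders, not a real pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Heavy transformation runs on the Spark cluster, not on the Airflow worker.
    transform_orders = SparkSubmitOperator(
        task_id="transform_orders",
        application="/jobs/transform_orders.py",  # placeholder job path
        conn_id="spark_default",
    )

    # Aggregation is pushed down to the warehouse as plain SQL.
    build_daily_summary = SnowflakeOperator(
        task_id="build_daily_summary",
        sql="CALL build_daily_summary()",  # placeholder stored procedure
        snowflake_conn_id="snowflake_default",
    )

    # Airflow's only job here is the dependency: transform first, then aggregate.
    transform_orders >> build_daily_summary
```

Notice that no data ever passes through the Airflow worker; the DAG stays lean, and the UI shows exactly which external job failed.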
How to say this in interviews:
“I used Airflow to orchestrate pipeline steps, but offloaded heavy data processing to Snowflake and Spark jobs to keep the DAGs lean and observable.”
Mistake #2: Ignoring Retry Logic and Failure Handling
Most junior engineers write DAGs that work… when nothing goes wrong.
But in production:
APIs time out
S3 files get delayed
Database connections drop
Best practices:
Always configure retries, retry_delay, and on_failure_callback
Log why each task failed (don't just rely on the default log dump)
Use idempotent task design — so retries don’t break downstream logic
Pro-level tip: Add exponential backoff and SLA alerts (see the sketch below).
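Here is a minimal sketch of what those defaults can look like, assuming a recent Airflow 2.x. The failure callback just logs; in practice you would swap in your own Slack or PagerDuty hook, and the partition-reload logic is a placeholder for whatever makes your task idempotent.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_on_failure(context):
    # on_failure_callback receives the task context: log the real reason,
    # not just "task failed". Replace the print with a Slack/PagerDuty call.
    ti = context["task_instance"]
    print(f"{ti.task_id} failed on try {ti.try_number}: {context.get('exception')}")


default_args = {
    "retries": 3,                              # ride out transient API/S3/DB issues
    "retry_delay": timedelta(minutes=5),       # base wait between attempts
    "retry_exponential_backoff": True,         # 5m, 10m, 20m, ...
    "max_retry_delay": timedelta(minutes=30),  # cap the backoff
    "on_failure_callback": notify_on_failure,  # alert with context, not silence
}

with DAG(
    dag_id="resilient_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:

    def load_partition(ds):
        # Idempotent by design: overwrite the run date's partition instead of
        # appending, so a retry never double-loads data. (Placeholder logic.)
        print(f"Reloading partition for {ds}")

    load = PythonOperator(
        task_id="load_partition",
        python_callable=load_partition,
        op_kwargs={"ds": "{{ ds }}"},
        sla=timedelta(hours=1),  # SLA misses surface in the UI and can trigger alerts
    )
```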


