Apache Airflow Cheat Sheet

Intermediate · Last updated: 2026-04-09 · 5 sections

Essential Airflow CLI commands, DAG patterns, operators, and best practices for orchestrating data pipelines.

CLI Commands

| Command | Description | Example |
|---|---|---|
| airflow dags list | List all DAGs | airflow dags list -o table |
| airflow dags trigger | Manually trigger a DAG | airflow dags trigger my_dag --conf '{"key":"val"}' |
| airflow dags pause | Pause a DAG | airflow dags pause my_dag |
| airflow dags unpause | Unpause a DAG | airflow dags unpause my_dag |
| airflow tasks test | Test a single task (no state) | airflow tasks test my_dag my_task 2026-01-01 |
| airflow tasks run | Run a task with state tracking | airflow tasks run my_dag my_task 2026-01-01 |
| airflow tasks list | List tasks in a DAG | airflow tasks list my_dag --tree |
| airflow db init | Initialize Airflow metadata DB | airflow db init |
| airflow db upgrade | Upgrade metadata DB schema | airflow db upgrade |
| airflow connections list | List all connections | airflow connections list -o table |
| airflow variables get | Get a variable value | airflow variables get my_variable |
| airflow info | Show Airflow config info | airflow info |

DAG Definition Pattern

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator
from datetime import datetime, timedelta


def extract_from_api():
    """Placeholder callable: pull raw records from the source API."""
    ...


def run_data_quality_checks():
    """Placeholder callable: fail the task if data quality checks don't pass."""
    ...

default_args = {
    'owner': 'data-engineering',
    'depends_on_past': False,
    'email_on_failure': True,
    'email': ['[email protected]'],
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
}

with DAG(
    dag_id='daily_etl_pipeline',
    default_args=default_args,
    description='Daily ETL from sources to warehouse',
    schedule_interval='0 6 * * *',       # 6 AM daily
    start_date=datetime(2026, 1, 1),
    catchup=False,
    tags=['etl', 'daily', 'production'],
    max_active_runs=1,
) as dag:

    extract = PythonOperator(
        task_id='extract_data',
        python_callable=extract_from_api,
    )

    transform = SnowflakeOperator(
        task_id='transform_data',
        snowflake_conn_id='snowflake_prod',
        sql='sql/transform.sql',
    )

    validate = PythonOperator(
        task_id='validate_output',
        python_callable=run_data_quality_checks,
    )

    extract >> transform >> validate

Common Operators

| Operator | Use Case | Key Parameters |
|---|---|---|
| PythonOperator | Run Python functions | python_callable, op_args, op_kwargs |
| BashOperator | Run shell commands | bash_command, env |
| SnowflakeOperator | Execute Snowflake SQL | snowflake_conn_id, sql, warehouse |
| S3ToSnowflakeOperator | Load S3 files into Snowflake | s3_keys, table, schema, file_format |
| EmailOperator | Send email notifications | to, subject, html_content |
| BranchPythonOperator | Conditional branching | python_callable (must return a task_id) |
| TriggerDagRunOperator | Trigger another DAG | trigger_dag_id, conf, wait_for_completion |
| ShortCircuitOperator | Skip downstream if False | python_callable (returns True/False) |
| EmptyOperator (formerly DummyOperator) | No-op for DAG structure | task_id (used for join points) |
| TaskGroup (not an operator) | Visual grouping of tasks | group_id, tooltip |

Schedule Interval Quick Reference

| Schedule | Cron Expression | Preset String |
|---|---|---|
| Run once at start_date | — | @once |
| Every minute | * * * * * | (no preset) |
| Hourly | 0 * * * * | @hourly |
| Daily at midnight | 0 0 * * * | @daily |
| Weekly (Sunday) | 0 0 * * 0 | @weekly |
| Monthly (1st) | 0 0 1 * * | @monthly |
| Yearly (Jan 1) | 0 0 1 1 * | @yearly |
| Weekdays 6 AM | 0 6 * * 1-5 | (custom cron) |
| Every 15 min | */15 * * * * | (custom cron) |
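Cron fields read, left to right: minute, hour, day-of-month, month, day-of-week. To see how entries like */15 or 1-5 expand into concrete values, here is a simplified single-field parser — a sketch for intuition, not Airflow's actual scheduler logic:

```python
def expand_cron_field(field, lo, hi):
    """Expand one cron field ('*', '*/15', '1-5', '0', '1,15') into a set of ints."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:  # step syntax, e.g. */15
            part, step_str = part.split("/")
            step = int(step_str)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:  # range syntax, e.g. 1-5
            a, b = part.split("-")
            start, end = int(a), int(b)
        else:  # single value
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values
```

For example, expand_cron_field("*/15", 0, 59) gives the minutes {0, 15, 30, 45}, and expand_cron_field("1-5", 0, 6) gives the weekday numbers Monday through Friday.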

Best Practices

  • Set catchup=False unless you specifically need historical backfills
  • Use max_active_runs=1 for ETL DAGs to prevent parallel run conflicts
  • Store SQL in separate files (sql/ directory) instead of inline strings
  • Use Airflow Variables and Connections — never hardcode credentials in DAGs
  • Use TaskGroups to visually organize complex DAGs instead of SubDAGs (deprecated)
  • Set meaningful retries and retry_delay — most transient failures resolve in 5 minutes
  • Use depends_on_past=False unless tasks truly depend on their own previous run
  • Tag DAGs (tags=['etl','daily']) for filtering in the Airflow UI

Frequently Asked Questions

What is the difference between schedule_interval and timetable in Airflow?

schedule_interval accepts cron expressions or preset strings (@daily, @hourly) and runs at fixed intervals. Timetables (Airflow 2.2+) are Python classes that allow complex custom schedules such as "business days only" or "last Friday of each month". Use a timetable when a cron expression can't express your schedule; note that since Airflow 2.4, both are passed through the unified schedule parameter.
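The rule a custom timetable encodes, such as "weekdays only at 6 AM", ultimately boils down to date arithmetic. A pure-Python sketch of that rule (the real Timetable API requires subclassing Airflow's Timetable base class, which is omitted here):

```python
from datetime import datetime, timedelta


def next_weekday_run(after, hour=6):
    """Return the next Mon-Fri run time at the given hour strictly after `after`."""
    candidate = after.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= after:
        candidate += timedelta(days=1)
    while candidate.weekday() >= 5:  # skip Saturday (5) and Sunday (6)
        candidate += timedelta(days=1)
    return candidate
```

A timetable class wraps exactly this kind of function inside next_dagrun_info(), so Airflow can ask "given the last run, when is the next one?"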

Should I use catchup=True or catchup=False?

Use catchup=False for most DAGs; it schedules runs only from the current interval forward. Use catchup=True when you need Airflow to backfill historical runs, such as when deploying a new DAG that must process past data. Be careful: a start_date months in the past combined with catchup=True will queue up hundreds of runs at once.
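To see why catchup=True deserves caution, count the intervals Airflow would schedule. A quick back-of-the-envelope sketch for a daily DAG:

```python
from datetime import date


def daily_backfill_count(start_date, today):
    """Number of daily intervals Airflow would backfill with catchup=True."""
    return max((today - start_date).days, 0)


# A DAG with start_date of Jan 1 deployed on Apr 9 of the same year
# would immediately queue 98 backfill runs (31 + 28 + 31 + 8 days).
runs = daily_backfill_count(date(2026, 1, 1), date(2026, 4, 9))
```

An hourly schedule multiplies that by 24, which is where "hundreds of runs" quickly becomes thousands.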

How do I pass data between Airflow tasks?

Use XComs ("cross-communications") to pass small pieces of data between tasks. Tasks can push values explicitly with xcom_push() or simply return them (auto-pushed under the key return_value). Downstream tasks pull with xcom_pull(task_ids='upstream_task'). For large data, write to shared storage (S3, GCS) and pass only the file path via XCom.
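The push/pull flow can be pictured with a toy in-memory store. This mimics the pattern only; real XComs live in the metadata database and are accessed through the task context, not module-level functions:

```python
# Toy stand-in for the XCom table: (task_id, key) -> value.
xcom_store = {}


def xcom_push(task_id, key, value):
    xcom_store[(task_id, key)] = value


def xcom_pull(task_ids, key="return_value"):
    return xcom_store.get((task_ids, key))


# Upstream task returns a value; Airflow auto-pushes it as 'return_value'.
def extract():
    return "s3://bucket/raw/2026-01-01.parquet"  # pass a path, not the data itself


xcom_push("extract_data", "return_value", extract())

# Downstream task pulls the value by the upstream task_id.
path = xcom_pull(task_ids="extract_data")
```

Note the payload here is a file path, following the advice above: XComs carry coordinates to the data, while the data itself stays in object storage.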

Related Cheat Sheets

dbt Commands Cheat Sheet
Snowflake SQL Cheat Sheet