Quick-reference guides for data engineers: 30 cheat sheets across 7 categories, from Snowflake SQL syntax to dbt commands, Airflow DAG patterns, window functions, interview questions, and production best practices. Bookmark these for fast lookups during development and interview prep.
Production-proven best practices for Snowflake covering warehouse management, query optimization, cost control, security, and data modeling. (Advanced)
Battle-tested dbt best practices for project structure, model design, testing, performance, and CI/CD. Based on real-world production deployments. (Intermediate)
Production-hardened Airflow best practices for DAG design, task management, monitoring, and deployment. Avoid common pitfalls that cause pipeline failures. (Intermediate)
Essential AWS services for data engineering: S3, Glue, Lambda, Step Functions, Athena, IAM, and common architecture patterns with cost tips. (Intermediate)
Key Azure services for data engineering: ADLS Gen2, Data Factory, Synapse, Databricks, Functions, Purview, and common architecture patterns. (Intermediate)
Databricks essentials: Delta Lake, Unity Catalog, SQL warehouses, workflows, Auto Loader, and performance tuning. Includes comparisons to Snowflake. (Intermediate)
Prepare for your Snowflake data engineer interview with these commonly asked questions covering architecture, performance tuning, security, and cost optimiza... (Intermediate)
Master the most frequently asked SQL interview questions with detailed answers. Covers joins, window functions, CTEs, query optimization, and real-world data... (Intermediate)
Comprehensive guide to data engineering interview questions covering ETL pipelines, data modeling, orchestration, cloud platforms, and system design. (Advanced)
Expert-level interview questions on Snowflake Streams (CDC), Tasks, task graphs, and change data capture patterns. Covers offset tracking, CHANGES clause, an... (Advanced)
Expert-level interview questions on Snowflake Dynamic Tables, declarative pipelines, target lag, refresh modes, and how they compare to streams/tasks and dbt. (Advanced)
Expert interview questions on Snowpark DataFrame API, Python/Java/Scala UDFs, UDTFs, stored procedures, and when to use Snowpark vs pure SQL. (Advanced)
Expert interview questions on Snowflake Secure Data Sharing, reader accounts, Marketplace listings, cross-cloud sharing, and data clean rooms. (Advanced)
Expert interview questions on Apache Iceberg tables in Snowflake, external volumes, catalog integration, open table formats, and interoperability with Spark/... (Advanced)
Expert questions on Cortex LLM functions, ML functions, embeddings, fine-tuning, and AI apps in Snowflake. (Advanced)
Expert questions on Snowpipe Streaming, Kafka connector, classic Snowpipe, channels, and real-time ingestion patterns. (Advanced)
Expert questions on dynamic data masking, row access policies, tag-based governance, ACCESS_HISTORY, and data protection at scale. (Advanced)
Expert questions on credit consumption, warehouse sizing, auto-suspend, resource monitors, serverless costs, and strategies to reduce spend by 30–60%. (Advanced)
Expert questions on micro-partition pruning, clustering keys, search optimization, query profiling, spilling to disk, and the three caching layers. (Advanced)
Expert questions on database replication, replication groups, failover, disaster recovery, and cross-region/cross-cloud patterns. (Advanced)
Expert questions on VARIANT, OBJECT, and ARRAY data types, FLATTEN, LATERAL, JSON/XML/Parquet handling, and schema-on-read patterns. (Advanced)
Expert questions on stored procedures, JavaScript/Python/SQL UDFs, caller vs owner rights, transaction management, and security considerations. (Advanced)
Expert questions on external functions, API integrations, external tables, external stages, and connecting Snowflake to external systems. (Advanced)
Complete reference for dbt CLI commands, Jinja macros, model selection syntax, and project configuration. Covers dbt Core and dbt Cloud. (Intermediate)
Essential Airflow CLI commands, DAG patterns, operators, and best practices for orchestrating data pipelines. (Intermediate)
Essential Python patterns for data engineering: pandas, itertools, generators, connecting to Snowflake, writing production-safe ETL scripts, and testing data... (Intermediate)
PySpark DataFrame API, Spark SQL, window functions, partitioning, caching, and performance tuning for production data pipelines. (Intermediate)
Essential Snowflake SQL commands, functions, and syntax for data engineers. Covers DDL, DML, querying, and Snowflake-specific features. (Intermediate)
Master ROW_NUMBER, RANK, LEAD, LAG, running totals, and moving averages. Works across Snowflake, BigQuery, Postgres, and Redshift. (Intermediate)
Looking for in-depth explanations? Check our Data Engineering Glossary for 30+ key terms.
For hands-on tutorials and guides, browse the full article library.