Programming Cheat Sheets for Data Engineers

Python and PySpark references built for data work, not generic Python. They focus on the data manipulation, distributed processing, and performance patterns that matter in data pipelines.

Python

Python for Data Engineers - pandas, list comprehensions, generators, context managers, typing
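
As a small taste of the topics listed above, here is a minimal sketch that combines a generator, a list comprehension, and type hints to clean records and process them in fixed-size batches. The `batched` helper and the sample `raw` values are hypothetical, chosen only for illustration; they are not taken from the cheat sheet itself.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of up to `size` items without loading all rows at once."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    # islice pulls at most `size` items per pass; an empty list ends the loop.
    while batch := list(islice(it, size)):
        yield batch


# Hypothetical raw input: strings that may carry whitespace or junk values.
raw = [" 10", "x", "25 ", "", "7"]

# List comprehension over a generator expression: strip, filter, convert.
clean = [int(s) for s in (r.strip() for r in raw) if s.isdigit()]

# Materialized here only to inspect the result; a pipeline would iterate lazily.
batches = list(batched(clean, 2))
```

Because `batched` is a generator, it accepts any iterable, including a file handle or a database cursor, and holds only one batch in memory at a time.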

Distributed processing

PySpark & Spark SQL Cheat Sheet - DataFrame API, lazy evaluation, partitioning, shuffles, broadcast joins

Streaming and ingestion

Snowpipe Streaming & Kafka (Interview) - producers, consumers, partitions, exactly-once delivery

Related tools

Use the JSON to SQL converter when going from semi-structured data to warehouse tables, and the SQL formatter to standardize output queries.

Related categories

Explore orchestration cheat sheets, SQL cheat sheets, and the full cheat sheet library.
