✅ Data Quality

Great Expectations

An open-source Python framework for defining, documenting, and validating data quality expectations against datasets in data pipelines.

Great Expectations (GX) is an open-source Python library for data testing, documentation, and profiling. It helps data teams define "expectations" about their data and validate those expectations automatically in pipelines.

Core Concepts

1. Expectations: Assertions about data (e.g., "column A should not be null")
2. Expectation Suites: Collections of expectations for a dataset
3. Data Sources: Connections to your data (Pandas, Spark, SQL)
4. Checkpoints: Reusable validation configurations that run Expectation Suites against your data
5. Data Docs: Auto-generated documentation of expectations and results
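
To make the relationship between these concepts concrete, here is a minimal sketch in plain Python. It does not use the GX API: the `Expectation`, `ExpectationSuite`, and `run_checkpoint` names are hypothetical stand-ins chosen to mirror the concepts above, and rows are modeled as simple dictionaries.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Expectation:
    """Toy stand-in for an expectation: a named check over all rows."""
    name: str
    check: Callable[[list], bool]


@dataclass
class ExpectationSuite:
    """Toy stand-in for a suite: a named collection of expectations."""
    name: str
    expectations: list


def run_checkpoint(suite, rows):
    """Toy stand-in for a checkpoint: run every expectation, collect results."""
    return {e.name: e.check(rows) for e in suite.expectations}


suite = ExpectationSuite(
    name="users",
    expectations=[
        Expectation(
            "user_id is never null",
            lambda rows: all(r.get("user_id") is not None for r in rows),
        ),
        Expectation(
            "status is active or inactive",
            lambda rows: all(r.get("status") in {"active", "inactive"} for r in rows),
        ),
    ],
)

# Illustrative data: the second row breaks both expectations.
rows = [
    {"user_id": 1, "status": "active"},
    {"user_id": None, "status": "paused"},
]

results = run_checkpoint(suite, rows)
for name, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
```

The printed pass/fail summary plays the role that Data Docs play in GX, in a much reduced form: a readable record of which expectations held for a given run.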

Example Expectations

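```python
import great_expectations as gx
import pandas as pd

# Sample data to validate (illustrative values)
df = pd.DataFrame({"user_id": [1, 2], "status": ["active", "inactive"],
                   "order_total": [120.0, 80.0]})

# Get a Data Context, add a pandas Data Source, and read the DataFrame.
# This uses the GX 0.18-style fluent API; GX 1.x changed these calls.
context = gx.get_context()
validator = context.sources.add_pandas("my_data").read_dataframe(df)

# Define Expectations (each call is checked against the data right away)
validator.expect_column_values_to_not_be_null("user_id")
validator.expect_column_values_to_be_in_set("status", ["active", "inactive"])
validator.expect_column_mean_to_be_between("order_total", 50, 200)
```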

Why Teams Use Great Expectations

- Catch Issues Early: Validate data before it reaches downstream systems
- Documentation: Auto-generate data quality docs
- Collaboration: Share expectations across teams
- Integration: Works with Airflow, dbt, Spark, and more
- Open Source: Free to use with commercial support

GX Cloud

GX Cloud, the hosted SaaS version of Great Expectations, adds:
- Hosted expectation management
- Collaboration features
- Alerting and notifications
- Metrics and dashboards

Integration with Data Tools

- Airflow: GX operators for pipeline validation
- dbt: Run GX after dbt models
- Spark: Validate large-scale data
- Prefect/Dagster: Native integrations
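
Whatever the orchestrator, these integrations tend to follow the same pattern: run validation as a pipeline step and fail that step when a check does not pass, so bad data never reaches later tasks. The sketch below shows that pattern in plain Python. The `DataQualityError` and `enforce` names are hypothetical, and the `results` mapping is an assumed, simplified shape rather than the result object GX returns.

```python
class DataQualityError(Exception):
    """Raised when at least one check fails (hypothetical name)."""


def enforce(results):
    """Raise if any check failed.

    `results` maps a check name to True (passed) or False (failed).
    Raising inside a task is how a Python-based orchestrator typically
    learns that the step failed and that downstream steps should not run.
    """
    failed = sorted(name for name, passed in results.items() if not passed)
    if failed:
        raise DataQualityError("failed checks: " + ", ".join(failed))


# Example: one of two checks failed, so the step is stopped.
try:
    enforce({"order_id not null": True, "status in set": False})
except DataQualityError as exc:
    print(exc)
```

Raising an exception, rather than returning a flag, keeps the gate hard to ignore: a caller that forgets to inspect the result still stops at the failed check.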

Frequently Asked Questions

What is Great Expectations used for?

Great Expectations is used for testing and validating data quality in pipelines. It lets you define expectations (assertions) about your data and automatically validate them, catching issues before they impact downstream systems.

Is Great Expectations free?

Yes, Great Expectations (GX Core) is open-source and free. GX Cloud is a commercial product that adds hosted management, collaboration, and alerting features.

How does Great Expectations compare to dbt tests?

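dbt tests are simpler and SQL-based, which makes them a good fit for basic checks inside a warehouse. Great Expectations offers a larger set of built-in expectations, profiling, and auto-generated documentation, and it also works with data outside SQL warehouses, such as Pandas and Spark DataFrames.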

What is an Expectation Suite?

An Expectation Suite is a collection of expectations (tests) for a specific dataset. For example, an "orders" suite might include expectations for non-null order_id, valid status values, and reasonable amounts.
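
As an illustration, the "orders" suite described above could be written down as plain data and evaluated with a few lines of Python. This is a sketch under stated assumptions, not GX's storage format or engine: the dictionary keys, the status values, the amount range, and the `evaluate` helper are all made up for the example. Only the three expectation names are taken from GX.

```python
# Declarative description of the suite (key names are illustrative).
orders_suite = {
    "name": "orders",
    "expectations": [
        {"type": "expect_column_values_to_not_be_null",
         "kwargs": {"column": "order_id"}},
        {"type": "expect_column_values_to_be_in_set",
         "kwargs": {"column": "status",
                    "value_set": ["pending", "shipped", "cancelled"]}},
        {"type": "expect_column_values_to_be_between",
         "kwargs": {"column": "amount", "min_value": 0, "max_value": 10000}},
    ],
}


def evaluate(expectation, rows):
    """Toy evaluator for the three expectation types used above.

    Toy semantics: a missing value counts as a failure for every check.
    """
    kind, kw = expectation["type"], expectation["kwargs"]
    values = [row.get(kw["column"]) for row in rows]
    if kind == "expect_column_values_to_not_be_null":
        return all(v is not None for v in values)
    if kind == "expect_column_values_to_be_in_set":
        return all(v in kw["value_set"] for v in values)
    if kind == "expect_column_values_to_be_between":
        return all(
            v is not None and kw["min_value"] <= v <= kw["max_value"]
            for v in values
        )
    raise ValueError(f"unsupported expectation type: {kind}")


# Illustrative rows: "refunded" is not in the allowed status set.
orders = [
    {"order_id": 1, "status": "shipped", "amount": 25.0},
    {"order_id": 2, "status": "refunded", "amount": 40.0},
]

report = {e["type"]: evaluate(e, orders) for e in orders_suite["expectations"]}
for kind, passed in report.items():
    print(f"{'PASS' if passed else 'FAIL'}: {kind}")
```

Keeping the suite as data, separate from the code that evaluates it, is what lets the same set of expectations be versioned, shared across teams, and rerun against each new batch.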

Last updated: 2026-01-21

Published by Sainath Reddy, Data Engineer at Anblicks (4+ years experience)