dbt (data build tool) is an open-source command-line tool that enables data analysts and engineers to transform data in their data warehouse more effectively. It brings software engineering best practices like version control, testing, and documentation to analytics workflows.
What Makes dbt Different
Traditional ETL tools require you to write transformations in proprietary languages or drag-and-drop interfaces. dbt takes a different approach:
- SQL-first: Write transformations in pure SQL (or Python in dbt Core 1.3+)
- ELT, not ETL: Assumes data is already loaded; focuses only on the T (transform)
- Modular: Build reusable models that reference each other
- Version Controlled: Store all code in Git for collaboration
- Tested: Define tests to validate data quality
- Documented: Auto-generate documentation from your code
Core Concepts
1. Models: SQL files that define transformations. Each model compiles to a SELECT statement and creates a table or view.
2. Sources: Declare raw tables loaded by your ingestion tools (Fivetran, Airbyte, etc.)
3. Tests: Assertions about your data (unique, not null, relationships)
4. Documentation: Describe models and columns; dbt generates a searchable doc site
5. Macros: Reusable SQL snippets (like functions) using Jinja templating
dbt Core vs dbt Cloud
- dbt Core: Free, open-source CLI tool you run locally or in CI/CD
- dbt Cloud: Managed platform with IDE, scheduling, and collaboration features
Common dbt Commands
``bash``
dbt run # Execute all models
dbt test # Run all tests
dbt docs generate # Generate documentation
dbt build # Run + test in dependency order
Why Teams Adopt dbt
- Speed: Analysts can own transformations without waiting for engineers
- Quality: Tests catch data issues before they reach dashboards
- Collaboration: Git-based workflow enables code review and teamwork
- Maintainability: Modular, documented code is easier to understand and update