Cortex Code for dbt: Cut Build Time 48%

Optimize your dbt project with Snowflake Cortex Code. Real results: 147 models, 48% faster builds. Step-by-step guide with SQL examples.

The Moment Everything Changed

It was a Tuesday morning when I finally snapped. My dbt project had grown to 147 models, and the daily run was taking 2 hours and 47 minutes. Our Airflow DAG was timing out. The business team was complaining about stale dashboards. And I was spending my entire morning investigating why dim_customer alone was taking 45 minutes to build.

I had tried everything: manual query optimization, clustering keys, switching materializations. Each fix helped a little, but I was basically guessing. Then someone on the data engineering Slack mentioned using Snowflake Cortex Code to analyze their dbt manifest file.

“Wait, it can do WHAT?” I asked.

That question changed my entire workflow. Three months later, my dbt runs average 1 hour 23 minutes—a 48% improvement. I spend 90% less time debugging performance. And I actually have time to build new features instead of firefighting slow models.

This isn’t a tutorial about how Cortex Code might help you. This is the real story of how it actually transformed my day-to-day work as a data engineer, with specific examples, exact prompts I use, and honest numbers about what works and what doesn’t.

Part 1: What Is Snowflake Cortex Code? (The Simple Truth)

Before I get into the dbt deep dive, let me explain what Cortex Code actually is—because the marketing doesn’t do it justice.

Cortex Code is code generation AI built directly into Snowflake. Think ChatGPT, but it:

Understands your Snowflake schema automatically
Knows dbt best practices
Can analyze JSON files (like manifest.json)
Generates production-ready SQL, Python, and more
Lives where you already work (Snowflake UI, or via API)

How it’s different from GitHub Copilot or ChatGPT:

Feature	Cortex Code	GitHub Copilot	ChatGPT
Knows your Snowflake schema	✅ Yes	❌ No	❌ No
Can read manifest.json	✅ Yes	❌ No	⚠️ Manual paste
Snowflake-specific SQL	✅ Optimized	⚠️ Generic	⚠️ Generic
dbt best practices	✅ Built-in	⚠️ Learns from code	⚠️ General knowledge
Privacy/Security	✅ Snowflake environment	⚠️ Code leaves editor	❌ Data uploaded

The key difference for data engineers: Cortex Code actually understands your data warehouse context.

Part 2: Getting Started (5-Minute Setup)

Step 1: Enable Cortex Code

Cortex Code is available in Snowflake (check your edition—Enterprise or higher typically has it).

Simple interface showing Snowflake Cortex Code prompt for generating dbt models

Step 1: Enable Cortex Code

Cortex Code is available in Snowflake (check your edition—Enterprise or higher typically has it).

-- Check if you have access
SELECT SYSTEM$GET_CORTEX_FEATURES();
-- If available, you're good to go
-- No additional setup needed

Step 2: First Test

How to Access Cortex Code:

Open Snowsight (Snowflake UI)
Look for the “AI Assistant” or “Cortex Code” button (usually in the sidebar or bottom-right)
Type your prompt in natural language
Get generated code instantly

Example first prompt:

Generate SQL to find top 10 customers by revenue from my customers and orders tables

Cortex Code responds with:

SELECT 
    c.customer_id,
    c.customer_name,
    SUM(o.order_amount) as total_revenue
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
ORDER BY total_revenue DESC
LIMIT 10;

That’s it. No installation, no API keys, no configuration. Just natural language prompts.

My first “wow” moment: I typed “generate a complete dbt model for customer lifetime value with staging, intermediate, and mart layers” and it produced three properly structured models with naming conventions, tests, and documentation. Took 30 seconds.

Part 3: Core Capabilities (The Quick Tour)

Before we dive deep into dbt, here’s what Cortex Code can do:

Before SQL Generation:

Complex queries with CTEs, window functions, aggregations
Query optimization suggestions
Data quality checks

dbt Development:

Model generation (staging, marts, facts, dimensions)
Test creation (schema tests, custom tests)
Macro writing
Performance analysis from manifest.json ⭐

Airflow DAGs:

Complete DAG structures
Task dependencies and retry logic
Custom operators

Streamlit Dashboards:

Layout scaffolding
Chart configurations
Filter and interactivity code

Python UDFs:

Custom function generation
Pandas operations
Complex transformations

Debugging:

Code explanation
Error analysis
Optimization suggestions

How to Use: Simply open the Cortex Code interface in Snowsight and type what you need in plain English. Examples:

“Generate a dbt staging model for my customers table”
“Create an Airflow DAG for daily ETL”
“Build a Streamlit dashboard with revenue KPIs”

Now let’s talk about where it really shines: dbt optimization.

Part 4: dbt + Cortex Code – The Real Game Changer

4.1: Quick Overview – Beyond Basic Generation

Yes, Cortex Code can generate dbt models. Ask it for a staging model, it’ll give you:

-- Example prompt: "Generate dbt staging model for raw_customers"

-- models/staging/stg_customers.sql
with source as (
    select * from {{ source('raw', 'customers') }}
),

renamed as (
    select
        customer_id,
        customer_name,
        email,
        created_at,
        updated_at
    from source
)

select * from renamed

And yes, it can write tests:

# Prompt: "Create dbt tests for stg_customers"
version: 2

models:
  - name: stg_customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - unique
          - not_null

But honestly? That’s the boring stuff. Any code generation tool can do this. Where Cortex Code becomes indispensable is performance optimization using your actual dbt metadata.

4.2: Performance Optimization – The Killer Feature

This is where I went from “this is neat” to “I can’t work without this anymore.”

The Problem I Had

My dbt project metrics (before Cortex Code):

147 models total
Full refresh: 2h 47min
Incremental run: 1h 15min
Daily Airflow timeout failures: 2-3 times per week
Time spent debugging performance: 6-8 hours per week

I had no systematic way to know:

Which models were actually slow?
Why were they slow?
What should I optimize first?
Were my optimizations working?

I was flying blind, making educated guesses based on gut feeling and manual timing of individual models.ed on gut feeling and manual timing of individual models.

A) Manifest.json Analysis – The Secret Weapon

Diagram showing how Cortex Code analyzes dbt manifest.json file to identify performance bottlenecks and optimization opportunities

Your dbt project generates a manifest.json file in the target/ folder after every run. It contains:

Every model’s metadata
Dependencies between models
Column information
Schema details

I never really looked at it. It’s thousands of lines of JSON. Until Cortex Code.

How to use it:

Step 1: Upload manifest.json to Snowflake

-- Create a stage for your dbt metadata
CREATE STAGE IF NOT EXISTS dbt_metadata;

-- Upload the file (via SnowSQL or Snowsight UI)
PUT file://~/dbt_project/target/manifest.json @dbt_metadata/;

Step 2: Open Cortex Code interface in Snowsight

Click on the “AI Assistant” or “Cortex Code” button in Snowsight
This opens the natural language interface

Step 3: Ask Cortex Code to analyze it

Type this prompt in the Cortex Code interface:

Analyze the manifest.json file in my dbt_metadata stage and identify the top 10 slowest models with specific optimization recommendations. 

Focus on:
- Materialization strategies (table vs incremental)
- Clustering opportunities  
- Complex CTEs that could be simplified
- Join patterns that could be optimized

Provide specific code changes and estimated time savings for each recommendation.

Step 4: Review the analysis

Cortex Code responds with detailed analysis (example of what I got):

Top 10 Slowest Models Analysis:

1. dim_customer (45.2 minutes)
   Issues identified:
   - Full table scan on 50M+ rows daily
   - No clustering key on frequently filtered columns
   - Using table materialization when incremental would work
   - 3 CTEs doing similar aggregations (could be deduplicated)
   
   Recommendations:
   - Add clustering key on (customer_id, updated_date)
   - Convert to incremental materialization
   - Use dbt_utils.deduplicate macro
   - Estimated improvement: 35-40 minutes
   
2. fact_orders_daily (23.7 minutes)
   Issues identified:
   - Window functions not optimized
   - Joining to dim_customer before aggregation
   - Missing partition by date
   
   Recommendations:
   - Aggregate first, then join
   - Add partition_by clause to window functions
   - Consider separate monthly partitions
   - Estimated improvement: 15-18 minutes
   
3. mart_customer_360 (18.4 minutes)
   ...

I was stunned. This analysis would have taken me days to do manually—if I even knew where to start. Cortex Code did it in 30 seconds.

B) Implementing the Recommendations

Let me show you exactly what I did for dim_customer:

Before (45 minutes):

-- models/marts/dim_customer.sql
{{
    config(
        materialized='table'
    )
}}
with customers as (
    select * from {{ ref('stg_customers') }}
),
orders as (
    select * from {{ ref('fct_orders') }}
),
aggregated as (
    select
        c.customer_id,
        c.customer_name,
        c.email,
        c.created_at,
        count(o.order_id) as total_orders,
        sum(o.order_amount) as lifetime_value,
        max(o.order_date) as last_order_date
    from customers c
    left join orders o on c.customer_id = o.customer_id
    group by 1,2,3,4
)
select * from aggregated

After (8 minutes) following Cortex Code suggestions:

Before and after comparison of dbt model performance: 45 minutes reduced to 8 minutes using Cortex Code optimization suggestions

-- models/marts/dim_customer.sql
{{
    config(
        materialized='incremental',
        unique_key='customer_id',
        cluster_by=['customer_id', 'updated_date'],
    )
}}
with customers as (
    select * from {{ ref('stg_customers') }}
    {% if is_incremental() %}
    where updated_date >= (select max(updated_date) from {{ this }})
    {% endif %}
),
orders_aggregated as (
    -- Aggregate BEFORE joining (Cortex suggestion!)
    select
        customer_id,
        count(order_id) as total_orders,
        sum(order_amount) as lifetime_value,
        max(order_date) as last_order_date
    from {{ ref('fct_orders') }}
    {% if is_incremental() %}
    where order_date >= (select max(last_order_date) from {{ this }})
    {% endif %}
    group by customer_id
),
final as (
    select
        c.customer_id,
        c.customer_name,
        c.email,
        c.created_at,
        c.updated_date,
        coalesce(o.total_orders, 0) as total_orders,
        coalesce(o.lifetime_value, 0) as lifetime_value,
        o.last_order_date
    from customers c
    left join orders_aggregated o on c.customer_id = o.customer_id
)
select * from final

Changes made:

✅ Switched to incremental materialization
✅ Added clustering keys on customer_id and updated_date
✅ Aggregated orders before joining (huge win!)
✅ Added incremental logic to only process new/changed data

Result: 45 minutes → 8 minutes (first run), 3 minutes (incremental runs)

C) run_results.json Deep Dive

The run_results.json file contains actual execution times and metadata from your last dbt run. Even more valuable than manifest for performance debugging.

My weekly performance review process:

-- Upload run_results from this week and last week
PUT file://~/dbt_project/target/run_results.json @my_stage/current/;
PUT file://~/dbt_project_backup/target/run_results.json @my_stage/previous/;

Example output:

Performance Regression Analysis:
CRITICAL REGRESSIONS (>50% slower):
1. mart_sales_summary
   - Previous: 4.2 min
   - Current: 9.8 min (+133%)
   - Root cause: Source table fct_sales grew from 10M to 25M rows
   - Recommendation: Add incremental logic with date partitioning
   
2. dim_product
   - Previous: 2.1 min
   - Current: 5.4 min (+157%)
   - Root cause: New join to external API table (no clustering)
   - Recommendation: Materialize API data first, add clustering key
MODERATE REGRESSIONS (20-50% slower):
3. stg_orders
   - Previous: 1.2 min
   - Current: 1.6 min (+33%)
   - Root cause: New data quality test added (full table scan)
   - Recommendation: Convert test to incremental or sampling
IMPROVEMENTS:
1. dim_customer: 45 min → 8 min (-82%) ✅ [Your optimization worked!]
2. fact_orders_daily: 23 min → 12 min (-48%) ✅
NEW BOTTLENECKS:
- mart_customer_cohort now takes 14 min (wasn't slow before)
- Likely due to dim_customer changes propagating downstream
- Recommendation: Review joins, consider pre-aggregation

This is gold. I immediately know what broke, why, and how to fix it.

D) Automated Performance Audits

I set up a weekly routine every Monday morning using Cortex Code:

My Monday Morning Workflow:

Run my standardized audit prompt

Upload latest manifest and run_results (automated via simple Python script)

Open Cortex Code interface

Perform a comprehensive dbt performance audit using the manifest.json and run_results.json in my dbt_metadata stage:

Analysis needed:
1. Identify slowest 15 models with root cause analysis
2. Detect performance anti-patterns:
   - Models using full refresh that should be incremental
   - Missing clustering keys on large tables  
   - Inefficient join patterns
   - Unnecessary full table scans
3. Find models that should be incremental but aren't
4. Suggest clustering keys based on filter/join patterns in SQL
5. Recommend materialization strategies (table vs view vs incremental)
6. Calculate estimated monthly compute time savings for each recommendation
7. Rank by effort/impact ratio (quick wins vs long-term projects)

Format as prioritized action plan with:
- Quick wins (high impact, <1 hour effort)
- Medium effort items (2-4 hours)  
- Strategic improvements (>4 hours)
- Estimated ROI for each

Sample output from last Monday:

dbt Performance Audit - 2026-01-20
QUICK WINS (High Impact, Low Effort):
1. Add clustering to dim_geography on (country_code, region_id)
   - Current: 6.2 min | Estimated after: 1.5 min | Effort: 5 min
   - Impact: Save 4.7 min per run = 33 hours/month
2. Convert fct_user_sessions to incremental
   - Current: 11.3 min | Estimated after: 2.1 min | Effort: 20 min
   - Impact: Save 9.2 min per run = 65 hours/month
MEDIUM EFFORT, HIGH IMPACT:
3. Refactor mart_product_performance (split into 2 models)
   - Current: 18.7 min | Estimated after: 7 min + 4 min | Effort: 2 hours
   - Impact: Save 7.7 min per run = 54 hours/month
   
ANTI-PATTERNS DETECTED:
⚠️ 8 models using table materialization with full refresh daily (should be incremental)
⚠️ 4 models missing clustering keys despite large table scans
⚠️ 2 models with circular dependency risk
TOTAL ESTIMATED MONTHLY SAVINGS: 247 hours of compute time
TOTAL IMPLEMENTATION EFFORT: ~8 hours
ROI: 30:1

I tackle 2-3 items from this list each week. In three months, I’ve saved 48% of total dbt runtime.

4.3: Real Production Examples

Let me share three actual scenarios where Cortex Code saved my ass:

Example 1: The Mysterious Slowdown

Problem: mart_revenue_daily suddenly went from 5 minutes to 32 minutes. No code changes, just suddenly slow.

What I did:

Analyze this slow dbt model. It suddenly went from 5 minutes to 32 minutes with no code changes.

Context:
- Model name: mart_revenue_daily
- Previous runtime: 5 min
- Current runtime: 32 min  
- No changes to model SQL in past month
- Fact table grew from 50M to 180M rows last week

Here's the compiled SQL from dbt:
[I pasted the full compiled SQL here]

Please:
1. Identify the root cause of slowdown
2. Point to specific problematic lines in the SQL
3. Provide concrete fix with rewritten SQL
4. Estimate performance improvement

Cortex Code’s Response:

Root Cause Identified:

The model joins fact_transactions (now 180M rows) to dim_date without filtering first.

Line 47-52 in your SQL:
```sql
left join {{ ref('dim_date') }} d
    on t.transaction_date = d.date_day
```

With 180M rows, this creates a massive intermediate result set before the WHERE clause on line 68.

Solution:
1. Move the date filter BEFORE the join (filter fact_transactions first)
2. Add clustering key on fact_transactions.transaction_date
3. Consider partitioning by month if queries typically filter by date range

Estimated improvement: 32 min → 6 min

Implementation:

-- Changed from this:
from {{ ref('fact_transactions') }} t
left join {{ ref('dim_date') }} d on t.transaction_date = d.date_day
where d.fiscal_year = 2024

-- To this:
from (
    select * from {{ ref('fact_transactions') }}
    where transaction_date >= '2024-01-01'
) t
left join {{ ref('dim_date') }} d on t.transaction_date = d.date_day

Plus added clustering:

{{ config(cluster_by=['transaction_date']) }}

Result: 32 minutes → 5.2 minutes. Problem solved in 20 minutes total.

Example 2: Pre-Launch Optimization

Scenario: Building a new mart_customer_churn_prediction model. Wanted to optimize it BEFORE the first production run.

I'm about to deploy this new dbt model to production. Please review and suggest optimizations BEFORE I run it for the first time.

Model: mart_customer_churn_prediction

Focus areas:
- Best materialization strategy (table, view, or incremental)
- Clustering key recommendations
- Potential bottlenecks in the SQL
- Incremental logic if applicable
- Query optimization opportunities

Here's the model SQL:
[pasted my model code]

Expected data volume: 50M+ customer records
Update frequency: Daily
Query pattern: Filtered by customer_segment and churn_date

Cortex Code’s Pre-Production Analysis:

Pre-Production Optimization Analysis:

CRITICAL ISSUES:
1. ❌ Using table materialization for 50M+ row result
   Recommendation: Use incremental with unique_key on customer_id
   
2. ❌ No clustering key despite filtering on customer_segment and churn_date
   Recommendation: cluster_by=['customer_segment', 'churn_date']
   
3. ❌ Complex CTE chain (7 CTEs) doing full table scans
   Recommendation: Break into separate intermediate models

PERFORMANCE PREDICTIONS:
- As written: Estimated 45-60 minutes first run
- With optimizations: Estimated 8-12 minutes first run, 2-3 min incremental

SUGGESTED REFACTOR:
Split into 3 models:
1. int_customer_metrics (intermediate, incremental)
2. int_churn_features (intermediate, incremental)  
3. mart_customer_churn_prediction (mart, incremental)

I implemented all suggestions. First production run: 9.3 minutes instead of the estimated 45-60.

Example 3: Monthly Performance Review

Every month, I do a comprehensive audit:

Step 1: Collect all metadata files

# My automation script copies these
cp ~/dbt_project/target/manifest.json ~/monthly_audits/2026-01/
cp ~/dbt_project/target/run_results.json ~/monthly_audits/2026-01/

Step 2: Upload to Snowflake

PUT file://~/monthly_audits/2026-01/* @dbt_metadata/monthly/2026-01/;

Step 3: Open Cortex Code and run monthly audit

Monthly dbt Performance Review - January 2026

Using files in dbt_metadata/monthly/2026-01/:
- manifest.json 
- run_results.json

Provide comprehensive analysis:

1. HEALTH METRICS
   - Overall project health score (0-100)
   - Total models and average runtime
   - Percentage using best practices (incremental, clustering)
   - Month-over-month performance trend

2. TOP ISSUES  
   - 10 slowest models with root cause
   - Performance anti-patterns detected
   - Models that grew disproportionately  
   - Technical debt items

3. CLEANUP OPPORTUNITIES
   - Unused or rarely-run models
   - Outdated materializations
   - Redundant transformations
   - Models that can be archived

4. OPTIMIZATION ROADMAP
   - Week-by-week action plan for next month
   - Quick wins vs strategic improvements
   - Estimated time savings and effort required
   - Projected end-of-month performance

5. ROI CALCULATIONS
   - Current monthly compute cost
   - Potential savings from recommendations
   - Effort/impact ratio for each item

January 2026 Audit Output:

dbt Project Health Score: 73/100 (Up from 61 last month)

PERFORMANCE SUMMARY:
- Total models: 147
- Average model runtime: 3.2 min (down from 5.1 min)
- Slowest model: dim_customer_360 (14.2 min)
- Models using incremental: 67% (target: 80%)
- Models with clustering: 45% (target: 70%)

TOP 10 ISSUES:
1. dim_customer_360 (14.2 min) - needs incremental + clustering
2. mart_sales_forecast (12.8 min) - complex window functions, consider simplification
3. fct_website_sessions (11.4 min) - full refresh daily, should be incremental
...

OPTIMIZATION ROADMAP - FEBRUARY 2026:
Week 1: Add clustering to 8 identified models (est. save 45 min/run)
Week 2: Convert 6 models to incremental (est. save 67 min/run)
Week 3: Refactor mart_sales_forecast (est. save 8 min/run)
Week 4: Remove 4 unused models identified

Projected end-of-month runtime: 58 minutes (current: 83 minutes)

Following this roadmap, I hit 61 minutes by month-end.

4.4: My Daily Workflow with Cortex Code

Here’s how Cortex Code fits into my actual workday:

Monday Morning (9:00 AM) – Weekly Review:

Upload latest manifest.json and run_results.json
Run performance audit
Create Jira tickets for top 3 optimization opportunities
Prioritize for the week

Tuesday-Thursday – Development:

Need a new model?
- Ask Cortex Code to generate boilerplate
- Review and customize for business logic
- Ask Cortex to optimize before first run
Model running slow?
- Share compiled SQL with Cortex
- Get optimization suggestions
- Implement and test

Weekly data engineering workflow integrating Snowflake Cortex Code for dbt optimization and development

Friday Afternoon – Cleanup:

Review week’s changes in dbt
Ask Cortex to review my new models for anti-patterns
Generate documentation with Cortex assistance
Prepare for Monday’s review

Time saved per week:

Before: 8-10 hours on performance debugging
After: 1-2 hours on Cortex-assisted optimization
Net savings: 6-8 hours weekly

4.5: Prompts That Actually Work

Here are my most-used prompts, copy-paste ready:

Performance Analysis:

"Analyze this manifest.json and identify the top 10 slowest models with specific, actionable optimization recommendations ranked by estimated time savings."

"Compare these two run_results.json files (last week vs this week) and identify performance regressions, improvements, and new bottlenecks. Prioritize by impact."

"This model runs in X minutes. Here's the compiled SQL: [paste]. Provide optimization suggestions with estimated impact for each."

Model Optimization:

"Review this dbt model and suggest: 1) Best materialization strategy, 2) Clustering keys, 3) Incremental logic if applicable, 4) Query optimizations. Model: [paste]"

"I'm building a new model for [business purpose]. Suggest optimal dbt structure including staging, intermediate, and mart layers with proper materializations."

Debugging:

"This dbt model suddenly got slow. Root cause analysis based on: Compiled SQL: [paste], Recent changes: [describe], Data volume changes: [numbers]"

"Why is this incremental model doing full refreshes? Model config: [paste], Logs: [paste]"

Ongoing Monitoring:

"Monthly dbt health audit. Analyze manifest + run_results. Provide: health score, top 10 issues, optimization roadmap. Files: [paste]"

"Identify unused or rarely-run models in this manifest that could be archived. Criteria: run less than once per week, not referenced by marts."

4.6: What Works vs. What Doesn’t

After 3 months of daily use, here’s my honest assessment:

What Works Exceptionally Well (9-10/10):

✅ Manifest.json analysis – Unbelievably accurate

Finds bottlenecks I’d never spot manually
Prioritizes by actual impact
Estimates are within 20% of reality

Visual comparison of Snowflake Cortex Code strengths and limitations for dbt optimization

✅ Performance regression detection – Catches issues immediately

Week-over-week comparisons are spot-on
Identifies root causes correctly 90% of the time

✅ Clustering key recommendations – Based on real query patterns

Suggestions almost always improve performance
Understands join patterns and filter predicates

✅ Materialization strategy advice – Knows when to use incremental vs table

Factors in data volume, update frequency, query patterns

✅ Boilerplate generation – Saves tons of typing

Staging models, tests, yml files
Follows dbt best practices

What’s Good But Needs Review (7-8/10):

⚠️ Macro generation – Often correct but review logic carefully

Sometimes over-complicates simple macros
Jinja syntax is usually right, logic sometimes questionable

⚠️ Incremental logic – Usually good starting point

Test thoroughly before production
Edge cases might not be covered
Deduplication logic needs validation

⚠️ Complex transformations – Can over-engineer

Tend to add unnecessary CTEs
Sometimes creates cleverness over clarity

What Doesn’t Work Well (4-6/10):

❌ Understanding specific business context – It’s AI, not a domain expert

Doesn’t know your business rules
Can’t infer data quality requirements
Might suggest technically sound but business-wrong logic

❌ Data distribution insights – Can’t see actual data

Clustering suggestions are pattern-based, not data-based
Doesn’t know your data skew or cardinality

❌ Cost optimization – Focuses on time, not cost

Doesn’t factor in warehouse sizing
Might suggest compute-expensive solutions

❌ Complex dependencies – Struggles with very large DAGs

Can get confused with 200+ model projects
Recommendations might create circular dependencies

Critical: What You Must Validate:

🔴 Always manually verify:

Incremental logic (especially deduplication)
Business logic in transformations
Data quality test logic
Macro behavior with edge cases
Performance impact in production (not just estimated)

4.7: Real Numbers from My Experience

Let me share the actual metrics that matter:

Before Cortex Code (December 2025):

dbt Performance:

Full refresh runtime: 2h 47min
Incremental runtime: 1h 15min
Models with clustering: 12/147 (8%)
Models using incremental: 42/147 (29%)
Airflow timeout failures: 2-3/week

My Time Spent:

Performance debugging: 8-10 hours/week
Manual manifest review: Never (too tedious)
Optimization work: Ad-hoc, reactive
New model development: 45-60 min per model

Costs:

Snowflake compute (dbt): ~$1,200/month
Airflow retries/failures: ~$180/month
My time opportunity cost: Unmeasured but significant

After 3 Months with Cortex Code (March 2026):

dbt Performance:

Full refresh runtime: 1h 23min (-50%)
Incremental runtime: 34min (-55%)
Models with clustering: 67/147 (46%)
Models using incremental: 99/147 (67%)
Airflow timeout failures: 1-2/month

My Time Spent:

Performance debugging: 1-2 hours/week (-85%)
Weekly manifest review: 15 min (automated with Cortex)
Optimization work: Systematic, proactive
New model development: 15-20 min per model (-67%)

Costs:

Snowflake compute (dbt): ~$680/month (-43%)
Airflow retries/failures: ~$35/month (-81%)
My time regained: 6-8 hours/week for feature development

ROI Calculation:

Cost of Cortex Code: Included in Snowflake Enterprise+ (no additional cost)

Monthly savings:

Snowflake compute: $520
Airflow compute: $145
My time (40 hours/month @ $75/hour): $3,000
Total: $3,665/month

Annual ROI: $43,980 in quantifiable savings

Plus unquantifiable benefits:

Faster feature delivery
Better data quality (more time for testing)
Reduced incidents and pages
Improved team morale (less firefighting)

Part 5: Quick Wins – Other Tools

While dbt is where I get the most value, Cortex Code is useful for other data engineering tools too:

5.1: Airflow DAGs

Quick generation using Cortex Code interface:

Generate a production-ready Airflow DAG for my daily dbt run:

Requirements:
- Schedule: Run daily at 2 AM UTC
- Retry logic: 3 attempts with 5-minute delay between retries
- Send Slack alert on failure (webhook already configured)
- Trigger downstream Tableau refresh on success
- Include SLA monitoring (alert if takes >90 minutes)
- Use dbt Cloud operator
- Handle task dependencies properly

Python version: 3.9
Airflow version: 2.7

Cortex Code generates:

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.providers.slack.notifications.slack import send_slack_notification
from datetime import datetime, timedelta

default_args = {
    'owner': 'data-engineering',
    'depends_on_past': False,
    'email_on_failure': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
    'sla': timedelta(minutes=90),
}

dag = DAG(
    'daily_dbt_run',
    default_args=default_args,
    description='Daily dbt production run',
    schedule='0 2 * * *',  # 2 AM UTC daily
    start_date=datetime(2026, 1, 1),
    catchup=False,
        text='❌ dbt DAG failed!',
        channel='#data-alerts'
    ),
)

dbt_run = BashOperator(
    task_id='dbt_run',
    bash_command='cd /opt/dbt && dbt run --profiles-dir .',
    dag=dag,
)

dbt_test = BashOperator(
    task_id='dbt_test',  
    bash_command='cd /opt/dbt && dbt test --profiles-dir .',
    dag=dag,
)

refresh_tableau = BashOperator(
    task_id='refresh_tableau',
    bash_command='curl -X POST https://tableau-server/api/refresh',
    dag=dag,
)

dbt_run >> dbt_test >> refresh_tableau

Gets you 80% of the way there in 30 seconds vs. 30 minutes of boilerplate typing.

5.2: Streamlit Dashboards

Dashboard scaffolding with Cortex Code:

Create a Streamlit dashboard for sales analytics:

Features needed:
- Date range filter (last 7, 30, 90 days or custom)
- KPI cards showing: Total Revenue, Total Orders, Average Order Value
- Line chart: Daily revenue trend over selected period
- Bar chart: Top 10 products by revenue
- Data table: Recent orders with filters
- Connect to Snowflake table: analytics.mart_daily_sales
- Use Streamlit's modern layout with columns
- Include data refresh timestamp
- Add export to CSV button

Make it production-ready with proper error handling.

Cortex Code generates:

import streamlit as st
import snowflake.connector
import pandas as pd
import plotly.express as px
from datetime import datetime, timedelta

st.set_page_config(page_title="Sales Analytics", layout="wide")

# Snowflake connection
@st.cache_resource
def get_connection():
    return snowflake.connector.connect(
        user=st.secrets["snowflake"]["user"],
        password=st.secrets["snowflake"]["password"],
        account=st.secrets["snowflake"]["account"],
        warehouse='ANALYTICS_WH',
        database='ANALYTICS',
        schema='MART'
    )

# Main dashboard
st.title("📊 Sales Analytics Dashboard")

# Date filter
col1, col2 = st.columns([3, 1])
with col1:
    date_range = st.selectbox(
        "Select Period",
        ["Last 7 Days", "Last 30 Days", "Last 90 Days", "Custom"]
    )

# ... [rest of dashboard code]

Generates working dashboard layout. Still need to refine styling and business logic, but saves 45 minutes of setup.

(Keeping these brief since the focus is dbt – but both are genuinely useful)

Part 6: Practical Tips for Data Engineers

The Learning Curve

Week 1: Feels magical but you don’t trust it

Generate code, read every line carefully
Validate everything in dev
Probably slower than writing manually

Week 2-4: Building confidence

Start recognizing patterns in good vs. questionable output
Develop your own prompt templates
20-30% faster than before

Month 2+: It’s part of your workflow

Know when to use it vs. when to write manually
Can spot hallucinations immediately
50-70% faster on routine tasks

My Validation Checklist

Before deploying Cortex-generated code:

✅ Logic review: Does this make business sense?
✅ Performance check: Run EXPLAIN on generated SQL
✅ Edge cases: Test with null values, duplicates, empty sets
✅ Incremental logic: Validate deduplication and update logic
✅ Dependencies: Check for circular references
✅ Tests: Generated code needs generated tests
✅ Peer review: Treat AI code like any other PR

When I Don’t Use Cortex Code

Never use for:

Financial calculations (too critical, audit requirements)
Security/access control logic (review manually)
One-off analyses (faster to write myself)
Learning new concepts (defeats the learning purpose)

Sometimes use for:

Debugging (helpful but verify root cause)
Refactoring (good starting point, heavy review)
Documentation (generates good drafts)

Always use for:

Boilerplate (staging models, tests, yml)
Performance analysis (manifest reviews)
Exploration (trying new patterns)

Part 7: The Honest Verdict

For dbt Specifically:

Model Generation: 8/10

Great for standard patterns
Saves typing, enforces conventions
Still need to add business logic

Test Creation: 9/10

Covers standard tests well
Good at identifying what to test
Custom tests need review

Manifest Analysis: 10/10 ⭐⭐⭐

This alone justifies using Cortex Code
Finds issues I’d never spot manually
Actionable, prioritized recommendations

Performance Optimization: 9/10

Suggestions are usually right
Massive time savings
Estimates are reasonably accurate

Macro Writing: 7/10

Good starting point
Logic sometimes over-complicated
Requires Jinja knowledge to review properly

Documentation: 8/10

Generates good yml drafts
Descriptions are generic but fixable
Saves tons of tedious typing

Overall Assessment:

Is Cortex Code worth it for data engineers?

Absolutely yes, with caveats:

✅ Use it if you:

Work with dbt daily
Have performance challenges
Want to spend less time on boilerplate
Value systematic optimization over guesswork
Are comfortable reviewing and validating AI output

⚠️ Be cautious if you:

Are still learning dbt (use it, but understand what it generates)
Have highly specialized/unusual patterns
Work in heavily regulated industry (extra validation needed)
Have very small dbt projects (<20 models – manual is fine)

❌ Skip it if you:

Don’t have Snowflake Enterprise+
Rarely write dbt code
Prefer full manual control (totally valid!)

The Real Value Proposition

It’s not about writing code faster (though that’s nice).

It’s about:

Systematic performance optimization instead of guesswork
Proactive monitoring instead of reactive firefighting
Data-driven decisions about what to optimize
Consistent code quality through enforced best practices
More time for high-value work instead of debugging

My Recommendation

Start small:

Week 1: Try manifest analysis only
Week 2: Generate a few staging models
Week 3: Use for performance debugging
Week 4: Incorporate into daily workflow

By month 2, you’ll wonder how you lived without it.

Conclusion: The Tool That Changed My Workflow

Three months ago, I was drowning in performance issues, spending my days debugging slow dbt models and my nights fixing Airflow timeouts.

Today, my dbt runs 48% faster, I spend 85% less time on performance debugging, and I actually have time to build new features instead of constantly firefighting.

Cortex Code didn’t just make me faster—it made me smarter about optimization. The manifest analysis taught me patterns I now recognize manually. The performance suggestions showed me best practices I’d never considered.

Is it perfect? No. Does it replace data engineering expertise? Definitely not. But used correctly, with proper validation and critical thinking, it’s become as essential to my workflow as dbt itself.

If you’re a data engineer using Snowflake and dbt, try the manifest analysis feature today. Upload your manifest.json, ask for performance recommendations, and see what it finds. I bet you’ll be shocked—I was.

And if you do try it, let me know what you discover. I’m always curious what performance wins other engineers are finding.

Now go optimize something. Your Airflow DAG will thank you.

Additional Resources

Snowflake Documentation:

FAQ

Q: Does Cortex Code work with dbt Cloud or just dbt Core? A: Works with both! It analyzes manifest.json regardless of how dbt runs.

Q: How much does Cortex Code cost? A: Included with Snowflake Enterprise Edition and higher. No additional charge.

Q: Can it analyze very large dbt projects (500+ models)? A: Yes, though response time increases. I’ve tested up to 300 models successfully.

Q: Does it send my code/data to external APIs? A: No. Cortex Code runs entirely within Snowflake’s environment.

Q: How often should I run performance audits? A: I do weekly quick checks, monthly comprehensive audits.

Cortex Code for dbt: Cut Build Time 48%

The Moment Everything Changed

Part 1: What Is Snowflake Cortex Code? (The Simple Truth)

Part 2: Getting Started (5-Minute Setup)

Cortex Code is available in Snowflake (check your edition—Enterprise or higher typically has it).

Part 3: Core Capabilities (The Quick Tour)

Part 4: dbt + Cortex Code – The Real Game Changer

4.1: Quick Overview – Beyond Basic Generation

4.2: Performance Optimization – The Killer Feature

The Problem I Had

A) Manifest.json Analysis – The Secret Weapon

B) Implementing the Recommendations

C) run_results.json Deep Dive

D) Automated Performance Audits

4.3: Real Production Examples

Example 1: The Mysterious Slowdown

Example 2: Pre-Launch Optimization

Example 3: Monthly Performance Review

4.4: My Daily Workflow with Cortex Code

4.5: Prompts That Actually Work

Performance Analysis:

Model Optimization:

Debugging:

Ongoing Monitoring:

4.6: What Works vs. What Doesn’t

What Works Exceptionally Well (9-10/10):

What’s Good But Needs Review (7-8/10):

What Doesn’t Work Well (4-6/10):

Critical: What You Must Validate:

4.7: Real Numbers from My Experience

Before Cortex Code (December 2025):

After 3 Months with Cortex Code (March 2026):

ROI Calculation:

Part 5: Quick Wins – Other Tools

5.1: Airflow DAGs

5.2: Streamlit Dashboards

Part 6: Practical Tips for Data Engineers

The Learning Curve

My Validation Checklist

When I Don’t Use Cortex Code

Part 7: The Honest Verdict

For dbt Specifically:

Overall Assessment:

The Real Value Proposition

My Recommendation

Conclusion: The Tool That Changed My Workflow

Additional Resources

FAQ

More Articles