The Moment Everything Changed

It was a Tuesday morning when I finally snapped. My dbt project had grown to 147 models, and the daily run was taking 2 hours and 47 minutes. Our Airflow DAG was timing out. The business team was complaining about stale dashboards. And I was spending my entire morning investigating why dim_customer alone was taking 45 minutes to build.

I had tried everything: manual query optimization, clustering keys, switching materializations. Each fix helped a little, but I was basically guessing. Then someone on the data engineering Slack mentioned using Snowflake Cortex Code to analyze their dbt manifest file.

“Wait, it can do WHAT?” I asked.

That question changed my entire workflow. Three months later, my dbt runs average 1 hour 23 minutes—a 48% improvement. I spend 90% less time debugging performance. And I actually have time to build new features instead of firefighting slow models.

This isn’t a tutorial about how Cortex Code might help you. This is the real story of how it actually transformed my day-to-day work as a data engineer, with specific examples, exact prompts I use, and honest numbers about what works and what doesn’t.


Part 1: What Is Snowflake Cortex Code? (The Simple Truth)

Before I get into the dbt deep dive, let me explain what Cortex Code actually is—because the marketing doesn’t do it justice.

Cortex Code is code generation AI built directly into Snowflake. Think ChatGPT, but it:

  • Understands your Snowflake schema automatically
  • Knows dbt best practices
  • Can analyze JSON files (like manifest.json)
  • Generates production-ready SQL, Python, and more
  • Lives where you already work (Snowflake UI, or via API)

How it’s different from GitHub Copilot or ChatGPT:

FeatureCortex CodeGitHub CopilotChatGPT
Knows your Snowflake schema✅ Yes❌ No❌ No
Can read manifest.json✅ Yes❌ No⚠️ Manual paste
Snowflake-specific SQL✅ Optimized⚠️ Generic⚠️ Generic
dbt best practices✅ Built-in⚠️ Learns from code⚠️ General knowledge
Privacy/Security✅ Snowflake environment⚠️ Code leaves editor❌ Data uploaded

The key difference for data engineers: Cortex Code actually understands your data warehouse context.


Part 2: Getting Started (5-Minute Setup)

Step 1: Enable Cortex Code

Cortex Code is available in Snowflake (check your edition—Enterprise or higher typically has it).

Simple interface showing Snowflake Cortex Code prompt for generating dbt models

Step 1: Enable Cortex Code

Cortex Code is available in Snowflake (check your edition—Enterprise or higher typically has it).

-- Check if you have access
SELECT SYSTEM$GET_CORTEX_FEATURES();
-- If available, you're good to go
-- No additional setup needed

Step 2: First Test

How to Access Cortex Code:

  1. Open Snowsight (Snowflake UI)
  2. Look for the “AI Assistant” or “Cortex Code” button (usually in the sidebar or bottom-right)
  3. Type your prompt in natural language
  4. Get generated code instantly

Example first prompt:

Generate SQL to find top 10 customers by revenue from my customers and orders tables

Cortex Code responds with:

SELECT 
    c.customer_id,
    c.customer_name,
    SUM(o.order_amount) as total_revenue
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
ORDER BY total_revenue DESC
LIMIT 10;

That’s it. No installation, no API keys, no configuration. Just natural language prompts.

My first “wow” moment: I typed “generate a complete dbt model for customer lifetime value with staging, intermediate, and mart layers” and it produced three properly structured models with naming conventions, tests, and documentation. Took 30 seconds.


Part 3: Core Capabilities (The Quick Tour)

Before we dive deep into dbt, here’s what Cortex Code can do:

Before SQL Generation:

  • Complex queries with CTEs, window functions, aggregations
  • Query optimization suggestions
  • Data quality checks

dbt Development:

  • Model generation (staging, marts, facts, dimensions)
  • Test creation (schema tests, custom tests)
  • Macro writing
  • Performance analysis from manifest.json

Airflow DAGs:

  • Complete DAG structures
  • Task dependencies and retry logic
  • Custom operators

Streamlit Dashboards:

  • Layout scaffolding
  • Chart configurations
  • Filter and interactivity code

Python UDFs:

  • Custom function generation
  • Pandas operations
  • Complex transformations

Debugging:

  • Code explanation
  • Error analysis
  • Optimization suggestions

How to Use: Simply open the Cortex Code interface in Snowsight and type what you need in plain English. Examples:

  • “Generate a dbt staging model for my customers table”
  • “Create an Airflow DAG for daily ETL
  • “Build a Streamlit dashboard with revenue KPIs”

Now let’s talk about where it really shines: dbt optimization.


Part 4: dbt + Cortex Code – The Real Game Changer

4.1: Quick Overview – Beyond Basic Generation

Yes, Cortex Code can generate dbt models. Ask it for a staging model, it’ll give you:

-- Example prompt: "Generate dbt staging model for raw_customers"

-- models/staging/stg_customers.sql
with source as (
    select * from {{ source('raw', 'customers') }}
),

renamed as (
    select
        customer_id,
        customer_name,
        email,
        created_at,
        updated_at
    from source
)

select * from renamed

And yes, it can write tests:

# Prompt: "Create dbt tests for stg_customers"
version: 2

models:
  - name: stg_customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - unique
          - not_null

But honestly? That’s the boring stuff. Any code generation tool can do this. Where Cortex Code becomes indispensable is performance optimization using your actual dbt metadata.


4.2: Performance Optimization – The Killer Feature

This is where I went from “this is neat” to “I can’t work without this anymore.”

The Problem I Had

My dbt project metrics (before Cortex Code):

  • 147 models total
  • Full refresh: 2h 47min
  • Incremental run: 1h 15min
  • Daily Airflow timeout failures: 2-3 times per week
  • Time spent debugging performance: 6-8 hours per week

I had no systematic way to know:

  • Which models were actually slow?
  • Why were they slow?
  • What should I optimize first?
  • Were my optimizations working?

I was flying blind, making educated guesses based on gut feeling and manual timing of individual models.ed on gut feeling and manual timing of individual models.


A) Manifest.json Analysis – The Secret Weapon

Diagram showing how Cortex Code analyzes dbt manifest.json file to identify performance bottlenecks and optimization opportunities

Your dbt project generates a manifest.json file in the target/ folder after every run. It contains:

  • Every model’s metadata
  • Dependencies between models
  • Column information
  • Schema details

I never really looked at it. It’s thousands of lines of JSON. Until Cortex Code.

How to use it:

Step 1: Upload manifest.json to Snowflake

-- Create a stage for your dbt metadata
CREATE STAGE IF NOT EXISTS dbt_metadata;

-- Upload the file (via SnowSQL or Snowsight UI)
PUT file://~/dbt_project/target/manifest.json @dbt_metadata/;

Step 2: Open Cortex Code interface in Snowsight

  • Click on the “AI Assistant” or “Cortex Code” button in Snowsight
  • This opens the natural language interface

Step 3: Ask Cortex Code to analyze it

Type this prompt in the Cortex Code interface:

Analyze the manifest.json file in my dbt_metadata stage and identify the top 10 slowest models with specific optimization recommendations. 

Focus on:
- Materialization strategies (table vs incremental)
- Clustering opportunities  
- Complex CTEs that could be simplified
- Join patterns that could be optimized

Provide specific code changes and estimated time savings for each recommendation.

Step 4: Review the analysis

Cortex Code responds with detailed analysis (example of what I got):

Top 10 Slowest Models Analysis:

1. dim_customer (45.2 minutes)
   Issues identified:
   - Full table scan on 50M+ rows daily
   - No clustering key on frequently filtered columns
   - Using table materialization when incremental would work
   - 3 CTEs doing similar aggregations (could be deduplicated)
   
   Recommendations:
   - Add clustering key on (customer_id, updated_date)
   - Convert to incremental materialization
   - Use dbt_utils.deduplicate macro
   - Estimated improvement: 35-40 minutes
   
2. fact_orders_daily (23.7 minutes)
   Issues identified:
   - Window functions not optimized
   - Joining to dim_customer before aggregation
   - Missing partition by date
   
   Recommendations:
   - Aggregate first, then join
   - Add partition_by clause to window functions
   - Consider separate monthly partitions
   - Estimated improvement: 15-18 minutes
   
3. mart_customer_360 (18.4 minutes)
   ...

I was stunned. This analysis would have taken me days to do manually—if I even knew where to start. Cortex Code did it in 30 seconds.


B) Implementing the Recommendations

Let me show you exactly what I did for dim_customer:

Before (45 minutes):

-- models/marts/dim_customer.sql
{{
    config(
        materialized='table'
    )
}}
with customers as (
    select * from {{ ref('stg_customers') }}
),
orders as (
    select * from {{ ref('fct_orders') }}
),
aggregated as (
    select
        c.customer_id,
        c.customer_name,
        c.email,
        c.created_at,
        count(o.order_id) as total_orders,
        sum(o.order_amount) as lifetime_value,
        max(o.order_date) as last_order_date
    from customers c
    left join orders o on c.customer_id = o.customer_id
    group by 1,2,3,4
)
select * from aggregated

After (8 minutes) following Cortex Code suggestions:

Before and after comparison of dbt model performance: 45 minutes reduced to 8 minutes using Cortex Code optimization suggestions
-- models/marts/dim_customer.sql
{{
    config(
        materialized='incremental',
        unique_key='customer_id',
        cluster_by=['customer_id', 'updated_date'],
    )
}}
with customers as (
    select * from {{ ref('stg_customers') }}
    {% if is_incremental() %}
    where updated_date >= (select max(updated_date) from {{ this }})
    {% endif %}
),
orders_aggregated as (
    -- Aggregate BEFORE joining (Cortex suggestion!)
    select
        customer_id,
        count(order_id) as total_orders,
        sum(order_amount) as lifetime_value,
        max(order_date) as last_order_date
    from {{ ref('fct_orders') }}
    {% if is_incremental() %}
    where order_date >= (select max(last_order_date) from {{ this }})
    {% endif %}
    group by customer_id
),
final as (
    select
        c.customer_id,
        c.customer_name,
        c.email,
        c.created_at,
        c.updated_date,
        coalesce(o.total_orders, 0) as total_orders,
        coalesce(o.lifetime_value, 0) as lifetime_value,
        o.last_order_date
    from customers c
    left join orders_aggregated o on c.customer_id = o.customer_id
)
select * from final

Changes made:

  1. ✅ Switched to incremental materialization
  2. ✅ Added clustering keys on customer_id and updated_date
  3. ✅ Aggregated orders before joining (huge win!)
  4. ✅ Added incremental logic to only process new/changed data

Result: 45 minutes → 8 minutes (first run), 3 minutes (incremental runs)


C) run_results.json Deep Dive

The run_results.json file contains actual execution times and metadata from your last dbt run. Even more valuable than manifest for performance debugging.

My weekly performance review process:

-- Upload run_results from this week and last week
PUT file://~/dbt_project/target/run_results.json @my_stage/current/;
PUT file://~/dbt_project_backup/target/run_results.json @my_stage/previous/;

Example output:

Performance Regression Analysis:
CRITICAL REGRESSIONS (>50% slower):
1. mart_sales_summary
   - Previous: 4.2 min
   - Current: 9.8 min (+133%)
   - Root cause: Source table fct_sales grew from 10M to 25M rows
   - Recommendation: Add incremental logic with date partitioning
   
2. dim_product
   - Previous: 2.1 min
   - Current: 5.4 min (+157%)
   - Root cause: New join to external API table (no clustering)
   - Recommendation: Materialize API data first, add clustering key
MODERATE REGRESSIONS (20-50% slower):
3. stg_orders
   - Previous: 1.2 min
   - Current: 1.6 min (+33%)
   - Root cause: New data quality test added (full table scan)
   - Recommendation: Convert test to incremental or sampling
IMPROVEMENTS:
1. dim_customer: 45 min → 8 min (-82%) ✅ [Your optimization worked!]
2. fact_orders_daily: 23 min → 12 min (-48%) ✅
NEW BOTTLENECKS:
- mart_customer_cohort now takes 14 min (wasn't slow before)
- Likely due to dim_customer changes propagating downstream
- Recommendation: Review joins, consider pre-aggregation

This is gold. I immediately know what broke, why, and how to fix it.


D) Automated Performance Audits

I set up a weekly routine every Monday morning using Cortex Code:

My Monday Morning Workflow:

Run my standardized audit prompt

Upload latest manifest and run_results (automated via simple Python script)

Open Cortex Code interface

Perform a comprehensive dbt performance audit using the manifest.json and run_results.json in my dbt_metadata stage:

Analysis needed:
1. Identify slowest 15 models with root cause analysis
2. Detect performance anti-patterns:
   - Models using full refresh that should be incremental
   - Missing clustering keys on large tables  
   - Inefficient join patterns
   - Unnecessary full table scans
3. Find models that should be incremental but aren't
4. Suggest clustering keys based on filter/join patterns in SQL
5. Recommend materialization strategies (table vs view vs incremental)
6. Calculate estimated monthly compute time savings for each recommendation
7. Rank by effort/impact ratio (quick wins vs long-term projects)

Format as prioritized action plan with:
- Quick wins (high impact, <1 hour effort)
- Medium effort items (2-4 hours)  
- Strategic improvements (>4 hours)
- Estimated ROI for each

Sample output from last Monday:

dbt Performance Audit - 2026-01-20
QUICK WINS (High Impact, Low Effort):
1. Add clustering to dim_geography on (country_code, region_id)
   - Current: 6.2 min | Estimated after: 1.5 min | Effort: 5 min
   - Impact: Save 4.7 min per run = 33 hours/month
2. Convert fct_user_sessions to incremental
   - Current: 11.3 min | Estimated after: 2.1 min | Effort: 20 min
   - Impact: Save 9.2 min per run = 65 hours/month
MEDIUM EFFORT, HIGH IMPACT:
3. Refactor mart_product_performance (split into 2 models)
   - Current: 18.7 min | Estimated after: 7 min + 4 min | Effort: 2 hours
   - Impact: Save 7.7 min per run = 54 hours/month
   
ANTI-PATTERNS DETECTED:
⚠️ 8 models using table materialization with full refresh daily (should be incremental)
⚠️ 4 models missing clustering keys despite large table scans
⚠️ 2 models with circular dependency risk
TOTAL ESTIMATED MONTHLY SAVINGS: 247 hours of compute time
TOTAL IMPLEMENTATION EFFORT: ~8 hours
ROI: 30:1

I tackle 2-3 items from this list each week. In three months, I’ve saved 48% of total dbt runtime.


4.3: Real Production Examples

Let me share three actual scenarios where Cortex Code saved my ass:

Example 1: The Mysterious Slowdown

Problem: mart_revenue_daily suddenly went from 5 minutes to 32 minutes. No code changes, just suddenly slow.

What I did:

Analyze this slow dbt model. It suddenly went from 5 minutes to 32 minutes with no code changes.

Context:
- Model name: mart_revenue_daily
- Previous runtime: 5 min
- Current runtime: 32 min  
- No changes to model SQL in past month
- Fact table grew from 50M to 180M rows last week

Here's the compiled SQL from dbt:
[I pasted the full compiled SQL here]

Please:
1. Identify the root cause of slowdown
2. Point to specific problematic lines in the SQL
3. Provide concrete fix with rewritten SQL
4. Estimate performance improvement

Cortex Code’s Response:

Root Cause Identified:

The model joins fact_transactions (now 180M rows) to dim_date without filtering first.

Line 47-52 in your SQL:
```sql
left join {{ ref('dim_date') }} d
    on t.transaction_date = d.date_day
```

With 180M rows, this creates a massive intermediate result set before the WHERE clause on line 68.

Solution:
1. Move the date filter BEFORE the join (filter fact_transactions first)
2. Add clustering key on fact_transactions.transaction_date
3. Consider partitioning by month if queries typically filter by date range

Estimated improvement: 32 min → 6 min

Implementation:

-- Changed from this:
from {{ ref('fact_transactions') }} t
left join {{ ref('dim_date') }} d on t.transaction_date = d.date_day
where d.fiscal_year = 2024

-- To this:
from (
    select * from {{ ref('fact_transactions') }}
    where transaction_date >= '2024-01-01'
) t
left join {{ ref('dim_date') }} d on t.transaction_date = d.date_day

Plus added clustering:

{{ config(cluster_by=['transaction_date']) }}

Result: 32 minutes → 5.2 minutes. Problem solved in 20 minutes total.


Example 2: Pre-Launch Optimization

Scenario: Building a new mart_customer_churn_prediction model. Wanted to optimize it BEFORE the first production run.

I'm about to deploy this new dbt model to production. Please review and suggest optimizations BEFORE I run it for the first time.

Model: mart_customer_churn_prediction

Focus areas:
- Best materialization strategy (table, view, or incremental)
- Clustering key recommendations
- Potential bottlenecks in the SQL
- Incremental logic if applicable
- Query optimization opportunities

Here's the model SQL:
[pasted my model code]

Expected data volume: 50M+ customer records
Update frequency: Daily
Query pattern: Filtered by customer_segment and churn_date

Cortex Code’s Pre-Production Analysis:

Pre-Production Optimization Analysis:

CRITICAL ISSUES:
1. ❌ Using table materialization for 50M+ row result
   Recommendation: Use incremental with unique_key on customer_id
   
2. ❌ No clustering key despite filtering on customer_segment and churn_date
   Recommendation: cluster_by=['customer_segment', 'churn_date']
   
3. ❌ Complex CTE chain (7 CTEs) doing full table scans
   Recommendation: Break into separate intermediate models

PERFORMANCE PREDICTIONS:
- As written: Estimated 45-60 minutes first run
- With optimizations: Estimated 8-12 minutes first run, 2-3 min incremental

SUGGESTED REFACTOR:
Split into 3 models:
1. int_customer_metrics (intermediate, incremental)
2. int_churn_features (intermediate, incremental)  
3. mart_customer_churn_prediction (mart, incremental)

I implemented all suggestions. First production run: 9.3 minutes instead of the estimated 45-60.


Example 3: Monthly Performance Review

Every month, I do a comprehensive audit:

Step 1: Collect all metadata files

# My automation script copies these
cp ~/dbt_project/target/manifest.json ~/monthly_audits/2026-01/
cp ~/dbt_project/target/run_results.json ~/monthly_audits/2026-01/

Step 2: Upload to Snowflake

PUT file://~/monthly_audits/2026-01/* @dbt_metadata/monthly/2026-01/;

Step 3: Open Cortex Code and run monthly audit

Monthly dbt Performance Review - January 2026

Using files in dbt_metadata/monthly/2026-01/:
- manifest.json 
- run_results.json

Provide comprehensive analysis:

1. HEALTH METRICS
   - Overall project health score (0-100)
   - Total models and average runtime
   - Percentage using best practices (incremental, clustering)
   - Month-over-month performance trend

2. TOP ISSUES  
   - 10 slowest models with root cause
   - Performance anti-patterns detected
   - Models that grew disproportionately  
   - Technical debt items

3. CLEANUP OPPORTUNITIES
   - Unused or rarely-run models
   - Outdated materializations
   - Redundant transformations
   - Models that can be archived

4. OPTIMIZATION ROADMAP
   - Week-by-week action plan for next month
   - Quick wins vs strategic improvements
   - Estimated time savings and effort required
   - Projected end-of-month performance

5. ROI CALCULATIONS
   - Current monthly compute cost
   - Potential savings from recommendations
   - Effort/impact ratio for each item

January 2026 Audit Output:

dbt Project Health Score: 73/100 (Up from 61 last month)

PERFORMANCE SUMMARY:
- Total models: 147
- Average model runtime: 3.2 min (down from 5.1 min)
- Slowest model: dim_customer_360 (14.2 min)
- Models using incremental: 67% (target: 80%)
- Models with clustering: 45% (target: 70%)

TOP 10 ISSUES:
1. dim_customer_360 (14.2 min) - needs incremental + clustering
2. mart_sales_forecast (12.8 min) - complex window functions, consider simplification
3. fct_website_sessions (11.4 min) - full refresh daily, should be incremental
...

OPTIMIZATION ROADMAP - FEBRUARY 2026:
Week 1: Add clustering to 8 identified models (est. save 45 min/run)
Week 2: Convert 6 models to incremental (est. save 67 min/run)
Week 3: Refactor mart_sales_forecast (est. save 8 min/run)
Week 4: Remove 4 unused models identified

Projected end-of-month runtime: 58 minutes (current: 83 minutes)

Following this roadmap, I hit 61 minutes by month-end.


4.4: My Daily Workflow with Cortex Code

Here’s how Cortex Code fits into my actual workday:

Monday Morning (9:00 AM) – Weekly Review:

  1. Upload latest manifest.json and run_results.json
  2. Run performance audit
  3. Create Jira tickets for top 3 optimization opportunities
  4. Prioritize for the week

Tuesday-Thursday – Development:

  1. Need a new model?
    • Ask Cortex Code to generate boilerplate
    • Review and customize for business logic
    • Ask Cortex to optimize before first run
  2. Model running slow?
    • Share compiled SQL with Cortex
    • Get optimization suggestions
    • Implement and test
Weekly data engineering workflow integrating Snowflake Cortex Code for dbt optimization and development

Friday Afternoon – Cleanup:

  1. Review week’s changes in dbt
  2. Ask Cortex to review my new models for anti-patterns
  3. Generate documentation with Cortex assistance
  4. Prepare for Monday’s review

Time saved per week:

  • Before: 8-10 hours on performance debugging
  • After: 1-2 hours on Cortex-assisted optimization
  • Net savings: 6-8 hours weekly

4.5: Prompts That Actually Work

Here are my most-used prompts, copy-paste ready:

Performance Analysis:

"Analyze this manifest.json and identify the top 10 slowest models with specific, actionable optimization recommendations ranked by estimated time savings."
"Compare these two run_results.json files (last week vs this week) and identify performance regressions, improvements, and new bottlenecks. Prioritize by impact."
"This model runs in X minutes. Here's the compiled SQL: [paste]. Provide optimization suggestions with estimated impact for each."

Model Optimization:

"Review this dbt model and suggest: 1) Best materialization strategy, 2) Clustering keys, 3) Incremental logic if applicable, 4) Query optimizations. Model: [paste]"
"I'm building a new model for [business purpose]. Suggest optimal dbt structure including staging, intermediate, and mart layers with proper materializations."

Debugging:

"This dbt model suddenly got slow. Root cause analysis based on: Compiled SQL: [paste], Recent changes: [describe], Data volume changes: [numbers]"
"Why is this incremental model doing full refreshes? Model config: [paste], Logs: [paste]"

Ongoing Monitoring:

"Monthly dbt health audit. Analyze manifest + run_results. Provide: health score, top 10 issues, optimization roadmap. Files: [paste]"
"Identify unused or rarely-run models in this manifest that could be archived. Criteria: run less than once per week, not referenced by marts."

4.6: What Works vs. What Doesn’t

After 3 months of daily use, here’s my honest assessment:

What Works Exceptionally Well (9-10/10):

Manifest.json analysis – Unbelievably accurate

  • Finds bottlenecks I’d never spot manually
  • Prioritizes by actual impact
  • Estimates are within 20% of reality
Visual comparison of Snowflake Cortex Code strengths and limitations for dbt optimization

Performance regression detection – Catches issues immediately

  • Week-over-week comparisons are spot-on
  • Identifies root causes correctly 90% of the time

Clustering key recommendations – Based on real query patterns

  • Suggestions almost always improve performance
  • Understands join patterns and filter predicates

Materialization strategy advice – Knows when to use incremental vs table

  • Factors in data volume, update frequency, query patterns

Boilerplate generation – Saves tons of typing

  • Staging models, tests, yml files
  • Follows dbt best practices

What’s Good But Needs Review (7-8/10):

⚠️ Macro generation – Often correct but review logic carefully

  • Sometimes over-complicates simple macros
  • Jinja syntax is usually right, logic sometimes questionable

⚠️ Incremental logic – Usually good starting point

  • Test thoroughly before production
  • Edge cases might not be covered
  • Deduplication logic needs validation

⚠️ Complex transformations – Can over-engineer

  • Tend to add unnecessary CTEs
  • Sometimes creates cleverness over clarity

What Doesn’t Work Well (4-6/10):

Understanding specific business context – It’s AI, not a domain expert

  • Doesn’t know your business rules
  • Can’t infer data quality requirements
  • Might suggest technically sound but business-wrong logic

Data distribution insights – Can’t see actual data

  • Clustering suggestions are pattern-based, not data-based
  • Doesn’t know your data skew or cardinality

Cost optimization – Focuses on time, not cost

  • Doesn’t factor in warehouse sizing
  • Might suggest compute-expensive solutions

Complex dependencies – Struggles with very large DAGs

  • Can get confused with 200+ model projects
  • Recommendations might create circular dependencies

Critical: What You Must Validate:

🔴 Always manually verify:

  1. Incremental logic (especially deduplication)
  2. Business logic in transformations
  3. Data quality test logic
  4. Macro behavior with edge cases
  5. Performance impact in production (not just estimated)

4.7: Real Numbers from My Experience

Let me share the actual metrics that matter:

Before Cortex Code (December 2025):

dbt Performance:

  • Full refresh runtime: 2h 47min
  • Incremental runtime: 1h 15min
  • Models with clustering: 12/147 (8%)
  • Models using incremental: 42/147 (29%)
  • Airflow timeout failures: 2-3/week

My Time Spent:

  • Performance debugging: 8-10 hours/week
  • Manual manifest review: Never (too tedious)
  • Optimization work: Ad-hoc, reactive
  • New model development: 45-60 min per model

Costs:

  • Snowflake compute (dbt): ~$1,200/month
  • Airflow retries/failures: ~$180/month
  • My time opportunity cost: Unmeasured but significant

After 3 Months with Cortex Code (March 2026):

dbt Performance:

  • Full refresh runtime: 1h 23min (-50%)
  • Incremental runtime: 34min (-55%)
  • Models with clustering: 67/147 (46%)
  • Models using incremental: 99/147 (67%)
  • Airflow timeout failures: 1-2/month

My Time Spent:

  • Performance debugging: 1-2 hours/week (-85%)
  • Weekly manifest review: 15 min (automated with Cortex)
  • Optimization work: Systematic, proactive
  • New model development: 15-20 min per model (-67%)

Costs:

  • Snowflake compute (dbt): ~$680/month (-43%)
  • Airflow retries/failures: ~$35/month (-81%)
  • My time regained: 6-8 hours/week for feature development

ROI Calculation:

Cost of Cortex Code: Included in Snowflake Enterprise+ (no additional cost)

Monthly savings:

  • Snowflake compute: $520
  • Airflow compute: $145
  • My time (40 hours/month @ $75/hour): $3,000
  • Total: $3,665/month

Annual ROI: $43,980 in quantifiable savings

Plus unquantifiable benefits:

  • Faster feature delivery
  • Better data quality (more time for testing)
  • Reduced incidents and pages
  • Improved team morale (less firefighting)

Part 5: Quick Wins – Other Tools

While dbt is where I get the most value, Cortex Code is useful for other data engineering tools too:

5.1: Airflow DAGs

Quick generation using Cortex Code interface:

Generate a production-ready Airflow DAG for my daily dbt run:

Requirements:
- Schedule: Run daily at 2 AM UTC
- Retry logic: 3 attempts with 5-minute delay between retries
- Send Slack alert on failure (webhook already configured)
- Trigger downstream Tableau refresh on success
- Include SLA monitoring (alert if takes >90 minutes)
- Use dbt Cloud operator
- Handle task dependencies properly

Python version: 3.9
Airflow version: 2.7

Cortex Code generates:

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.providers.slack.notifications.slack import send_slack_notification
from datetime import datetime, timedelta

default_args = {
    'owner': 'data-engineering',
    'depends_on_past': False,
    'email_on_failure': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
    'sla': timedelta(minutes=90),
}

dag = DAG(
    'daily_dbt_run',
    default_args=default_args,
    description='Daily dbt production run',
    schedule='0 2 * * *',  # 2 AM UTC daily
    start_date=datetime(2026, 1, 1),
    catchup=False,
        text='❌ dbt DAG failed!',
        channel='#data-alerts'
    ),
)

dbt_run = BashOperator(
    task_id='dbt_run',
    bash_command='cd /opt/dbt && dbt run --profiles-dir .',
    dag=dag,
)

dbt_test = BashOperator(
    task_id='dbt_test',  
    bash_command='cd /opt/dbt && dbt test --profiles-dir .',
    dag=dag,
)

refresh_tableau = BashOperator(
    task_id='refresh_tableau',
    bash_command='curl -X POST https://tableau-server/api/refresh',
    dag=dag,
)

dbt_run >> dbt_test >> refresh_tableau

Gets you 80% of the way there in 30 seconds vs. 30 minutes of boilerplate typing.

5.2: Streamlit Dashboards

Dashboard scaffolding with Cortex Code:

Create a Streamlit dashboard for sales analytics:

Features needed:
- Date range filter (last 7, 30, 90 days or custom)
- KPI cards showing: Total Revenue, Total Orders, Average Order Value
- Line chart: Daily revenue trend over selected period
- Bar chart: Top 10 products by revenue
- Data table: Recent orders with filters
- Connect to Snowflake table: analytics.mart_daily_sales
- Use Streamlit's modern layout with columns
- Include data refresh timestamp
- Add export to CSV button

Make it production-ready with proper error handling.

Cortex Code generates:

import streamlit as st
import snowflake.connector
import pandas as pd
import plotly.express as px
from datetime import datetime, timedelta

st.set_page_config(page_title="Sales Analytics", layout="wide")

# Snowflake connection
@st.cache_resource
def get_connection():
    return snowflake.connector.connect(
        user=st.secrets["snowflake"]["user"],
        password=st.secrets["snowflake"]["password"],
        account=st.secrets["snowflake"]["account"],
        warehouse='ANALYTICS_WH',
        database='ANALYTICS',
        schema='MART'
    )

# Main dashboard
st.title("📊 Sales Analytics Dashboard")

# Date filter
col1, col2 = st.columns([3, 1])
with col1:
    date_range = st.selectbox(
        "Select Period",
        ["Last 7 Days", "Last 30 Days", "Last 90 Days", "Custom"]
    )

# ... [rest of dashboard code]

Generates working dashboard layout. Still need to refine styling and business logic, but saves 45 minutes of setup.

(Keeping these brief since the focus is dbt – but both are genuinely useful)


Part 6: Practical Tips for Data Engineers

The Learning Curve

Week 1: Feels magical but you don’t trust it

  • Generate code, read every line carefully
  • Validate everything in dev
  • Probably slower than writing manually

Week 2-4: Building confidence

  • Start recognizing patterns in good vs. questionable output
  • Develop your own prompt templates
  • 20-30% faster than before

Month 2+: It’s part of your workflow

  • Know when to use it vs. when to write manually
  • Can spot hallucinations immediately
  • 50-70% faster on routine tasks

My Validation Checklist

Before deploying Cortex-generated code:

✅ Logic review: Does this make business sense?
✅ Performance check: Run EXPLAIN on generated SQL
✅ Edge cases: Test with null values, duplicates, empty sets
✅ Incremental logic: Validate deduplication and update logic
✅ Dependencies: Check for circular references
✅ Tests: Generated code needs generated tests
✅ Peer review: Treat AI code like any other PR

When I Don’t Use Cortex Code

Never use for:

  • Financial calculations (too critical, audit requirements)
  • Security/access control logic (review manually)
  • One-off analyses (faster to write myself)
  • Learning new concepts (defeats the learning purpose)

Sometimes use for:

  • Debugging (helpful but verify root cause)
  • Refactoring (good starting point, heavy review)
  • Documentation (generates good drafts)

Always use for:

  • Boilerplate (staging models, tests, yml)
  • Performance analysis (manifest reviews)
  • Exploration (trying new patterns)

Part 7: The Honest Verdict

For dbt Specifically:

Model Generation: 8/10

  • Great for standard patterns
  • Saves typing, enforces conventions
  • Still need to add business logic

Test Creation: 9/10

  • Covers standard tests well
  • Good at identifying what to test
  • Custom tests need review

Manifest Analysis: 10/10 ⭐⭐⭐

  • This alone justifies using Cortex Code
  • Finds issues I’d never spot manually
  • Actionable, prioritized recommendations

Performance Optimization: 9/10

  • Suggestions are usually right
  • Massive time savings
  • Estimates are reasonably accurate

Macro Writing: 7/10

  • Good starting point
  • Logic sometimes over-complicated
  • Requires Jinja knowledge to review properly

Documentation: 8/10

  • Generates good yml drafts
  • Descriptions are generic but fixable
  • Saves tons of tedious typing

Overall Assessment:

Is Cortex Code worth it for data engineers?

Absolutely yes, with caveats:

Use it if you:

  • Work with dbt daily
  • Have performance challenges
  • Want to spend less time on boilerplate
  • Value systematic optimization over guesswork
  • Are comfortable reviewing and validating AI output

⚠️ Be cautious if you:

  • Are still learning dbt (use it, but understand what it generates)
  • Have highly specialized/unusual patterns
  • Work in heavily regulated industry (extra validation needed)
  • Have very small dbt projects (<20 models – manual is fine)

Skip it if you:

  • Don’t have Snowflake Enterprise+
  • Rarely write dbt code
  • Prefer full manual control (totally valid!)

The Real Value Proposition

It’s not about writing code faster (though that’s nice).

It’s about:

  1. Systematic performance optimization instead of guesswork
  2. Proactive monitoring instead of reactive firefighting
  3. Data-driven decisions about what to optimize
  4. Consistent code quality through enforced best practices
  5. More time for high-value work instead of debugging

My Recommendation

Start small:

  1. Week 1: Try manifest analysis only
  2. Week 2: Generate a few staging models
  3. Week 3: Use for performance debugging
  4. Week 4: Incorporate into daily workflow

By month 2, you’ll wonder how you lived without it.


Conclusion: The Tool That Changed My Workflow

Three months ago, I was drowning in performance issues, spending my days debugging slow dbt models and my nights fixing Airflow timeouts.

Today, my dbt runs 48% faster, I spend 85% less time on performance debugging, and I actually have time to build new features instead of constantly firefighting.

Cortex Code didn’t just make me faster—it made me smarter about optimization. The manifest analysis taught me patterns I now recognize manually. The performance suggestions showed me best practices I’d never considered.

Is it perfect? No. Does it replace data engineering expertise? Definitely not. But used correctly, with proper validation and critical thinking, it’s become as essential to my workflow as dbt itself.

If you’re a data engineer using Snowflake and dbt, try the manifest analysis feature today. Upload your manifest.json, ask for performance recommendations, and see what it finds. I bet you’ll be shocked—I was.

And if you do try it, let me know what you discover. I’m always curious what performance wins other engineers are finding.

Now go optimize something. Your Airflow DAG will thank you.


Additional Resources

Snowflake Documentation:


FAQ

Q: Does Cortex Code work with dbt Cloud or just dbt Core? A: Works with both! It analyzes manifest.json regardless of how dbt runs.

Q: How much does Cortex Code cost? A: Included with Snowflake Enterprise Edition and higher. No additional charge.

Q: Can it analyze very large dbt projects (500+ models)? A: Yes, though response time increases. I’ve tested up to 300 models successfully.

Q: Does it send my code/data to external APIs? A: No. Cortex Code runs entirely within Snowflake’s environment.

Q: How often should I run performance audits? A: I do weekly quick checks, monthly comprehensive audits.