The Problem We All Face (And Nobody Talks About)
You know that feeling when someone asks "What did we decide about the API redesign?" and you're frantically scrolling through three weeks of meeting notes trying to find that one conversation?
Or when your manager asks "What action items were assigned to the engineering team last month?" and you realize those decisions are buried across 47 different meeting transcripts scattered in Google Docs, Notion, and email threads?
Yeah, we've all been there.
Here's the thing: companies have meetings. Lots of them. And every single meeting contains valuable information: decisions made, problems discussed, action items assigned, ideas shared. But all that knowledge just… disappears into the void of meeting notes that nobody reads again.
Until now.
What if we could build a system where you just ask "What did we decide about the database migration?" and get an instant answer with exact sources? Not some generic corporate search that returns 500 irrelevant documents, but an actual intelligent assistant that understands context.
That's exactly what we can build with Snowflake Cortex. And honestly? It's simpler than you think.
What We're Building (In Plain English)
Before diving into code, let's understand what a Meeting Notes RAG actually does:
RAG = Retrieval Augmented Generation
Think of it like this:
- Store all your meeting notes in Snowflake
- Convert them into a format that AI can search semantically (not just keyword matching)
- When someone asks a question, find the most relevant meeting notes
- Use an AI model to generate an intelligent answer based on those notes
- Show sources so people can verify information
Real example:
- Question: "Why did we choose PostgreSQL over MySQL?"
- System finds: Engineering meeting from March 15th discussing database options
- Answer: "According to the March 15th engineering meeting, the team chose PostgreSQL primarily because of better JSON support and more robust handling of concurrent writes. Sarah from backend also mentioned PostgreSQL's superior full-text search capabilities."
See? Not just "here are 50 documents mentioning PostgreSQL" but an actual synthesized answer with context.
Why Snowflake for This?
You might be thinking: "Can't I just use ChatGPT with my notes?"
Sure, but here's why Snowflake is better for this:
- Security: Meeting notes contain sensitive information. With Snowflake, data never leaves your secure environment
- Scale: Got 10,000 meetings? No problem. Snowflake handles it easily
- Integration: Your meeting data might already be in Snowflake, or easily pipeable
- Cortex Search: Built-in vector search, so there's no need for an external vector database
- SQL interface: Everyone on your team can query it without learning new tools
- Cost: Pay only for what you use, no separate infrastructure
Plus, there's something elegant about keeping everything in one place.
The Architecture (Keep It Simple)
Here's how we can structure this:
Meeting Notes (Zoom, Teams, Google Meet transcripts)
↓
Load into Snowflake table
↓
Split into chunks
↓
Generate vector embeddings
↓
Create Cortex Search Service
↓
User asks question → Find relevant chunks → Generate answer with sources
No microservices, no Kubernetes, no headaches. Just Snowflake doing what it does best.
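Stripped of the Snowflake specifics, that pipeline is just a few small functions composed together. Here's a toy Python sketch of the flow — the chunker, retriever, and generator below are crude stand-ins for Snowflake tables, Cortex Search, and CORTEX.COMPLETE, included only to make the shape of the system concrete:

```python
# Illustrative sketch of the RAG flow; nothing here is the real
# Snowflake API, just stand-ins showing how the pieces compose.

def chunk(transcript: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; Step 3 uses smarter boundary-aware splitting
    return [transcript[i:i + size] for i in range(0, len(transcript), size)]

def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    # Stand-in for semantic search: rank chunks by word overlap with the question
    q_words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    return ranked[:k]

def generate(question: str, context: list[str]) -> str:
    # Stand-in for the LLM call: just stitches retrieved context into an "answer"
    return f"Based on {len(context)} meeting excerpt(s): " + " | ".join(context)

transcript = "The team chose PostgreSQL for better JSON support. Launch is Feb 28."
answer = generate("Why PostgreSQL?", retrieve("Why PostgreSQL?", chunk(transcript), k=1))
```

The whole trick of RAG is the middle step: retrieval narrows thousands of chunks down to a handful before the expensive generation step ever runs.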
Step 1: Setting Up the Foundation
First, let's create a proper database structure. We're organizing this so it's maintainable and scalable.
-- Create dedicated database for meeting intelligence
CREATE DATABASE IF NOT EXISTS meeting_intelligence;
USE DATABASE meeting_intelligence;
-- Create schema for organization
CREATE SCHEMA IF NOT EXISTS meetings;
USE SCHEMA meetings;
-- Set up compute
CREATE WAREHOUSE IF NOT EXISTS meeting_rag_wh
WITH
WAREHOUSE_SIZE = 'SMALL'
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE;
USE WAREHOUSE meeting_rag_wh;
-- Ensure we have Cortex access
-- (Account admin needs to grant this)
-- GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE your_role_name;
Nothing fancy here, just clean organization. A SMALL warehouse is perfectly fine for this; we can always scale up if needed.
Step 2: Designing the Meeting Notes Table
This is where thoughtful schema design matters. We want to capture not just the content, but useful metadata.
-- Main table for meeting transcripts
CREATE OR REPLACE TABLE meeting_transcripts (
meeting_id STRING PRIMARY KEY,
meeting_title STRING NOT NULL,
meeting_date TIMESTAMP_LTZ NOT NULL,
meeting_type STRING, -- 'standup', 'planning', 'retrospective', 'one-on-one'
attendees ARRAY,
duration_minutes INTEGER,
transcript_text STRING NOT NULL,
action_items ARRAY,
decisions_made ARRAY,
topics_discussed ARRAY,
recording_url STRING,
created_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP(),
updated_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP()
);
-- Add some realistic sample data
INSERT INTO meeting_transcripts VALUES
(
'MTG-2024-001',
'Q1 Product Planning',
'2024-01-15 10:00:00',
'planning',
ARRAY_CONSTRUCT('Sarah Chen', 'Mike Rodriguez', 'Emily Watson', 'David Park'),
60,
'Sarah opened the meeting discussing Q1 priorities. The main focus is launching the new dashboard feature by end of February. Mike raised concerns about the API rate limiting affecting customer experience. After discussion, the team decided to implement a caching layer using Redis before launch. Emily suggested we should also add analytics to track dashboard load times. David mentioned the infrastructure team can provision the Redis cluster within a week. Action items: Mike to create Redis implementation plan, Emily to design analytics dashboard, David to provision infrastructure by Jan 22.',
ARRAY_CONSTRUCT(
'Mike: Create Redis implementation plan by Jan 18',
'Emily: Design analytics dashboard mockup by Jan 20',
'David: Provision Redis cluster by Jan 22'
),
ARRAY_CONSTRUCT(
'Implement Redis caching layer before dashboard launch',
'Add analytics tracking for dashboard performance',
'Q1 launch date confirmed for Feb 28'
),
ARRAY_CONSTRUCT('Dashboard launch', 'API performance', 'Redis caching', 'Analytics'),
'https://zoom.us/rec/share/mock-url-001',
CURRENT_TIMESTAMP(),
CURRENT_TIMESTAMP()
),
(
'MTG-2024-002',
'Database Migration Discussion',
'2024-01-22 14:00:00',
'technical',
ARRAY_CONSTRUCT('Tom Liu', 'Sarah Chen', 'Alex Kumar'),
45,
'Tom presented three options for the database migration: PostgreSQL, MySQL, and staying with current setup. Sarah advocated strongly for PostgreSQL citing better JSON support and more robust concurrent write handling. Alex agreed, mentioning PostgreSQL full-text search capabilities would be beneficial for the search feature we are planning. The team also discussed migration timeline - estimated 3 weeks for full migration including testing. Concern raised about downtime, but Tom confirmed we can do blue-green deployment with minimal disruption. Decision: Move forward with PostgreSQL, start migration planning next week.',
ARRAY_CONSTRUCT(
'Tom: Create detailed migration plan by Jan 29',
'Sarah: Review PostgreSQL best practices documentation',
'Alex: Set up staging PostgreSQL environment by Feb 1'
),
ARRAY_CONSTRUCT(
'Selected PostgreSQL as new database',
'Migration timeline: 3 weeks',
'Use blue-green deployment strategy'
),
ARRAY_CONSTRUCT('Database migration', 'PostgreSQL', 'Blue-green deployment'),
'https://zoom.us/rec/share/mock-url-002',
CURRENT_TIMESTAMP(),
CURRENT_TIMESTAMP()
),
(
'MTG-2024-003',
'API Security Review',
'2024-02-05 11:00:00',
'security',
ARRAY_CONSTRUCT('Jennifer Lee', 'Tom Liu', 'Mike Rodriguez'),
90,
'Jennifer led security audit of our API endpoints. Critical finding: several endpoints lack proper rate limiting, making them vulnerable to DDoS attacks. Mike explained current authentication uses JWT but token expiry is set too long at 7 days. Team agreed to reduce to 24 hours and implement refresh token mechanism. Jennifer recommended adding API key rotation policy every 90 days. Also discussed implementing request signing for sensitive endpoints. Tom mentioned we should add monitoring alerts for unusual API patterns. Decision: Implement all recommendations before March 1st launch. This is now a blocker for product launch.',
ARRAY_CONSTRUCT(
'Mike: Implement rate limiting on all API endpoints by Feb 12',
'Tom: Set up monitoring and alerts by Feb 15',
'Jennifer: Document API key rotation policy by Feb 10',
'Mike: Reduce JWT expiry to 24h and add refresh tokens by Feb 16'
),
ARRAY_CONSTRUCT(
'All API security improvements are launch blockers',
'JWT token expiry reduced from 7 days to 24 hours',
'Implement 90-day API key rotation policy',
'Add request signing for sensitive endpoints'
),
ARRAY_CONSTRUCT('API security', 'Rate limiting', 'JWT tokens', 'Authentication'),
'https://zoom.us/rec/share/mock-url-003',
CURRENT_TIMESTAMP(),
CURRENT_TIMESTAMP()
),
(
'MTG-2024-004',
'Weekly Engineering Standup',
'2024-02-12 09:00:00',
'standup',
ARRAY_CONSTRUCT('Sarah Chen', 'Mike Rodriguez', 'Emily Watson', 'David Park', 'Tom Liu'),
30,
'Quick updates from everyone. Sarah completed the dashboard analytics implementation, looking good in staging. Mike finished API rate limiting yesterday, currently doing load testing. Emily working on mobile responsive design, encountered some CSS issues with the new dashboard on tablets. David reports Redis cluster is stable, handling 10k requests per second easily. Tom mentioned database migration testing is going well, planning cutover for next weekend. No blockers reported. Brief discussion about whether we need additional caching for mobile API endpoints - agreed to monitor after launch and optimize if needed.',
ARRAY_CONSTRUCT(
'Emily: Fix tablet responsive issues by Feb 14',
'Mike: Complete load testing report by Feb 13',
'Tom: Finalize migration cutover plan by Feb 14'
),
ARRAY_CONSTRUCT(),
ARRAY_CONSTRUCT('Sprint progress', 'Dashboard status', 'Mobile responsive', 'Database migration'),
'https://zoom.us/rec/share/mock-url-004',
CURRENT_TIMESTAMP(),
CURRENT_TIMESTAMP()
),
(
'MTG-2024-005',
'Customer Feedback Review',
'2024-02-20 15:00:00',
'product',
ARRAY_CONSTRUCT('Emily Watson', 'Sarah Chen', 'Product Manager Jane Smith'),
60,
'Jane presented findings from customer interviews conducted last week. Top request: ability to export dashboard data to Excel. 15 out of 20 customers mentioned this. Also strong demand for custom date range filters, currently we only support preset ranges. Sarah noted export feature is technically straightforward, could be done in one sprint. Emily suggested we should also add PDF export option. Jane agreed, mentioned several enterprise customers specifically asked for PDF reports for presentations. Team consensus: prioritize Excel export for Q2, PDF can be Q3. Also discussed adding keyboard shortcuts for power users - Emily volunteered to research what shortcuts would be most useful.',
ARRAY_CONSTRUCT(
'Sarah: Create technical spec for Excel export by Feb 27',
'Emily: Research and propose keyboard shortcuts by Feb 29',
'Jane: Follow up with customers about custom date ranges by Mar 1'
),
ARRAY_CONSTRUCT(
'Excel export prioritized for Q2 Sprint 1',
'PDF export moved to Q3',
'Custom date range filters to be designed',
'Keyboard shortcuts under consideration'
),
ARRAY_CONSTRUCT('Customer feedback', 'Feature requests', 'Excel export', 'Dashboard improvements'),
'https://zoom.us/rec/share/mock-url-005',
CURRENT_TIMESTAMP(),
CURRENT_TIMESTAMP()
);
-- Verify data loaded
SELECT
meeting_id,
meeting_title,
meeting_date,
ARRAY_SIZE(attendees) as attendee_count,
ARRAY_SIZE(action_items) as action_count,
LEFT(transcript_text, 100) || '...' as preview
FROM meeting_transcripts
ORDER BY meeting_date;
Notice how we're capturing structured data (action items, decisions, topics) alongside unstructured text. This dual approach gives us flexibility in how we search and analyze later.
Step 3: Chunking Strategy for Meetings
Unlike documentation, meetings have natural structure: they flow chronologically. We can chunk intelligently based on topics or time segments.
-- Table for meeting chunks (optimized for retrieval)
CREATE OR REPLACE TABLE meeting_chunks (
chunk_id STRING PRIMARY KEY,
meeting_id STRING,
chunk_index INTEGER,
chunk_text STRING,
chunk_size INTEGER,
chunk_metadata VARIANT,
chunk_embedding VECTOR(FLOAT, 1024)
);
-- Use SPLIT_TEXT_RECURSIVE_CHARACTER for intelligent chunking
-- This function preserves sentence boundaries and context
INSERT INTO meeting_chunks (chunk_id, meeting_id, chunk_index, chunk_text, chunk_size, chunk_metadata)
SELECT
meeting_id || '_chunk_' || chunk_index as chunk_id,
meeting_id,
chunk_index,
chunk_text,
LENGTH(chunk_text) as chunk_size,
OBJECT_CONSTRUCT(
'meeting_title', meeting_title,
'meeting_date', meeting_date,
'meeting_type', meeting_type,
'attendees', attendees,
'duration_minutes', duration_minutes,
'has_action_items', ARRAY_SIZE(action_items) > 0,
'has_decisions', ARRAY_SIZE(decisions_made) > 0,
'topics', topics_discussed,
'total_chunks', total_chunks_in_meeting
) as chunk_metadata
FROM (
SELECT
mt.meeting_id,
mt.meeting_title,
mt.meeting_date,
mt.meeting_type,
mt.attendees,
mt.duration_minutes,
mt.action_items,
mt.decisions_made,
mt.topics_discussed,
chunk.value::STRING as chunk_text,
chunk.index as chunk_index,
ARRAY_SIZE(chunks_array) as total_chunks_in_meeting
FROM meeting_transcripts mt,
LATERAL (
-- SPLIT_TEXT_RECURSIVE_CHARACTER parameters:
-- text: the content to split
-- format: 'none' for plain text (or 'markdown')
-- chunk_size: target characters per chunk (500-1000 works well)
-- overlap: characters shared between adjacent chunks
SELECT SNOWFLAKE.CORTEX.SPLIT_TEXT_RECURSIVE_CHARACTER(
mt.transcript_text,
'none',
500, -- Target 500 characters per chunk
50 -- Overlap preserves context across chunk boundaries
) as chunks_array
) split_result,
LATERAL FLATTEN(input => chunks_array) chunk
);
-- Verify what we created
SELECT
meeting_id,
chunk_metadata:meeting_title::STRING as meeting_title,
COUNT(*) as num_chunks,
AVG(chunk_size) as avg_chunk_size,
MIN(chunk_size) as min_chunk_size,
MAX(chunk_size) as max_chunk_size
FROM meeting_chunks
GROUP BY meeting_id, chunk_metadata:meeting_title::STRING
ORDER BY chunk_metadata:meeting_date::TIMESTAMP_LTZ;
-- View actual chunks for one meeting
SELECT
chunk_id,
chunk_index,
chunk_size,
LEFT(chunk_text, 150) || '...' as chunk_preview
FROM meeting_chunks
WHERE meeting_id = 'MTG-2024-002' -- Database Migration Discussion
ORDER BY chunk_index;
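If you're curious what recursive character splitting does conceptually, here's a rough Python approximation (not Snowflake's actual implementation): try the largest separator first, and only fall back to smaller ones when a piece is still too big.

```python
def split_recursive(text: str, max_chars: int,
                    separators=("\n\n", "\n", ". ", " ")) -> list[str]:
    # Rough sketch of recursive character splitting (not the real
    # Cortex implementation): prefer big separators, fall back to
    # smaller ones, and hard-split only as a last resort.
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    for sep in separators:
        if sep in text:
            chunks, current = [], ""
            for piece in text.split(sep):
                candidate = (current + sep + piece) if current else piece
                if len(candidate) <= max_chars:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    current = piece
            if current:
                chunks.append(current)
            # Recurse into any piece that is still too large
            result = []
            for c in chunks:
                result.extend(split_recursive(c, max_chars, separators))
            return result
    # No separator found: hard split
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

chunks = split_recursive("First sentence. Second sentence. " * 20, 100)
```

The takeaway: chunks break at natural boundaries (paragraphs, then sentences, then words) rather than mid-word, which is why retrieved chunks stay readable.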
-- ========================================
-- ADAPTIVE CHUNKING (Optional Enhancement)
-- ========================================
-- Adjust chunk size based on meeting duration
TRUNCATE TABLE meeting_chunks;
INSERT INTO meeting_chunks (chunk_id, meeting_id, chunk_index, chunk_text, chunk_size, chunk_metadata)
SELECT
meeting_id || '_chunk_' || chunk_index as chunk_id,
meeting_id,
chunk_index,
chunk_text,
LENGTH(chunk_text) as chunk_size,
chunk_metadata
FROM (
SELECT
mt.meeting_id,
chunk.value::STRING as chunk_text,
chunk.index as chunk_index,
OBJECT_CONSTRUCT(
'meeting_title', mt.meeting_title,
'meeting_date', mt.meeting_date,
'meeting_type', mt.meeting_type,
'attendees', mt.attendees,
'duration_minutes', mt.duration_minutes,
'has_action_items', ARRAY_SIZE(mt.action_items) > 0,
'has_decisions', ARRAY_SIZE(mt.decisions_made) > 0,
'topics', mt.topics_discussed
) as chunk_metadata
FROM meeting_transcripts mt,
LATERAL (
SELECT SNOWFLAKE.CORTEX.SPLIT_TEXT_RECURSIVE_CHARACTER(
mt.transcript_text,
'none',
CASE
-- Short meetings: larger chunks
WHEN mt.duration_minutes <= 30 THEN 800
-- Medium meetings: standard chunks
WHEN mt.duration_minutes <= 60 THEN 500
-- Long meetings: smaller chunks for precision
ELSE 400
END,
50 -- overlap between adjacent chunks
) as chunks_array
) split_result,
LATERAL FLATTEN(input => chunks_array) chunk
);
-- ========================================
-- ENHANCED CHUNKING PROCEDURE (Production)
-- ========================================
-- This creates separate chunks for actions and decisions
CREATE OR REPLACE PROCEDURE chunk_meeting_intelligently(p_meeting_id STRING)
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
-- First, chunk the main transcript
INSERT INTO meeting_chunks (chunk_id, meeting_id, chunk_index, chunk_text, chunk_size, chunk_metadata)
SELECT
p_meeting_id || '_main_' || chunk.index as chunk_id,
p_meeting_id as meeting_id,
chunk.index as chunk_index,
chunk.value::STRING as chunk_text,
LENGTH(chunk.value::STRING) as chunk_size,
OBJECT_CONSTRUCT(
'meeting_title', mt.meeting_title,
'meeting_date', mt.meeting_date,
'meeting_type', mt.meeting_type,
'chunk_type', 'transcript',
'attendees', mt.attendees
) as chunk_metadata
FROM meeting_transcripts mt,
LATERAL (
SELECT SNOWFLAKE.CORTEX.SPLIT_TEXT_RECURSIVE_CHARACTER(
mt.transcript_text,
'none',
500,
50
) as chunks_array
) split_result,
LATERAL FLATTEN(input => chunks_array) chunk
WHERE mt.meeting_id = p_meeting_id;
-- Add action items as dedicated chunks
INSERT INTO meeting_chunks (chunk_id, meeting_id, chunk_index, chunk_text, chunk_size, chunk_metadata)
SELECT
p_meeting_id || '_action_' || action.index as chunk_id,
p_meeting_id as meeting_id,
1000 + action.index as chunk_index,
'ACTION ITEM: ' || action.value::STRING as chunk_text,
LENGTH(action.value::STRING) as chunk_size,
OBJECT_CONSTRUCT(
'meeting_title', mt.meeting_title,
'meeting_date', mt.meeting_date,
'chunk_type', 'action_item',
'attendees', mt.attendees
) as chunk_metadata
FROM meeting_transcripts mt,
LATERAL FLATTEN(input => mt.action_items) action
WHERE mt.meeting_id = p_meeting_id
AND ARRAY_SIZE(mt.action_items) > 0;
-- Add decisions as dedicated chunks
INSERT INTO meeting_chunks (chunk_id, meeting_id, chunk_index, chunk_text, chunk_size, chunk_metadata)
SELECT
p_meeting_id || '_decision_' || decision.index as chunk_id,
p_meeting_id as meeting_id,
2000 + decision.index as chunk_index,
'DECISION MADE: ' || decision.value::STRING as chunk_text,
LENGTH(decision.value::STRING) as chunk_size,
OBJECT_CONSTRUCT(
'meeting_title', mt.meeting_title,
'meeting_date', mt.meeting_date,
'chunk_type', 'decision',
'attendees', mt.attendees
) as chunk_metadata
FROM meeting_transcripts mt,
LATERAL FLATTEN(input => mt.decisions_made) decision
WHERE mt.meeting_id = p_meeting_id
AND ARRAY_SIZE(mt.decisions_made) > 0;
RETURN 'Successfully chunked meeting ' || p_meeting_id;
END;
$$;
-- Test the enhanced chunking
CALL chunk_meeting_intelligently('MTG-2024-001');
-- View results by chunk type
SELECT
chunk_id,
chunk_metadata:chunk_type::STRING as type,
chunk_size,
LEFT(chunk_text, 100) || '...' as preview
FROM meeting_chunks
WHERE meeting_id = 'MTG-2024-001'
ORDER BY chunk_index;
-- ========================================
-- QUALITY VALIDATION
-- ========================================
-- Check chunk quality distribution
WITH chunk_stats AS (
SELECT
meeting_id,
chunk_metadata:meeting_title::STRING as meeting_title,
COUNT(*) as total_chunks,
AVG(chunk_size) as avg_size,
STDDEV(chunk_size) as size_variation,
COUNT(CASE WHEN chunk_size < 200 THEN 1 END) as too_small_count,
COUNT(CASE WHEN chunk_size > 1000 THEN 1 END) as too_large_count
FROM meeting_chunks
GROUP BY meeting_id, meeting_title
)
SELECT
meeting_title,
total_chunks,
ROUND(avg_size, 0) as avg_chunk_size,
ROUND(size_variation, 0) as size_std_dev,
too_small_count,
too_large_count,
CASE
WHEN too_small_count > total_chunks * 0.2 THEN 'WARNING: Too many small chunks'
WHEN too_large_count > total_chunks * 0.2 THEN 'WARNING: Too many large chunks'
ELSE 'OK: Good distribution'
END as quality_status
FROM chunk_stats
ORDER BY meeting_id;
For production systems with hour-long meetings, we might want even finer granularity: 3-5 minute segments or topic-based splits. For this example, the 500-character chunks created above are plenty, since our sample transcripts are short.
Step 4: Generate Vector Embeddings
This is where the magic starts. We're converting text into mathematical vectors that capture semantic meaning.
-- Generate embeddings using Snowflake Cortex
UPDATE meeting_chunks
SET chunk_embedding = SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
'snowflake-arctic-embed-l',
chunk_text
);
-- Verify embeddings were created
SELECT
chunk_id,
chunk_metadata:meeting_title::STRING as meeting,
chunk_embedding IS NOT NULL as has_embedding,
-- Distance from a vector to itself should be exactly 0
VECTOR_L2_DISTANCE(chunk_embedding, chunk_embedding) as self_distance
FROM meeting_chunks
LIMIT 5;
The snowflake-arctic-embed-l model creates 1024-dimensional vectors. These vectors position similar concepts close together in mathematical space, so "API security" and "authentication" end up near each other, even without sharing exact words.
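The intuition behind vector distance is worth internalizing. Here's a toy Python example with made-up 3-dimensional vectors — real embeddings have 1024 dimensions, and the numbers below are invented purely for illustration:

```python
import math

def l2_distance(a: list[float], b: list[float]) -> float:
    # Same metric as Snowflake's VECTOR_L2_DISTANCE: straight-line
    # distance between two points in embedding space
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Invented toy vectors: imagine an embedding model placed these phrases here
api_security   = [0.9, 0.1, 0.2]
authentication = [0.8, 0.2, 0.3]   # semantically close to "API security"
lunch_menu     = [0.1, 0.9, 0.8]   # unrelated topic

close = l2_distance(api_security, authentication)
far   = l2_distance(api_security, lunch_menu)
# Related concepts sit closer together than unrelated ones
```

Retrieval is then just "find the stored vectors with the smallest distance to the question's vector."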
Step 5: Create Cortex Search Service
Now we set up the search infrastructure. This is what makes semantic search possible.
-- Create Cortex Search Service for meeting notes
CREATE OR REPLACE CORTEX SEARCH SERVICE meeting_search_service
ON chunk_text
WAREHOUSE = meeting_rag_wh
TARGET_LAG = '1 minute'
AS (
SELECT
chunk_id,
chunk_text,
chunk_metadata,
chunk_embedding
FROM meeting_chunks
);
-- Check service status (takes a minute to initialize)
SHOW CORTEX SEARCH SERVICES;
-- Test basic search from SQL (SEARCH_PREVIEW is the ad-hoc SQL entry
-- point; applications typically query the service via the REST or Python APIs)
SELECT PARSE_JSON(
SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
'meeting_search_service',
'{"query": "database migration", "limit": 3}'
)
)['results'] as results;
The TARGET_LAG of 1 minute means new meetings get indexed within a minute. For most use cases, this is plenty fast.
Step 6: Building the Query Interface
Now for the exciting part: creating a function that actually answers questions intelligently.
-- Create the main RAG function
CREATE OR REPLACE FUNCTION ask_meeting_assistant(question STRING)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
WITH relevant_meetings AS (
-- Step 1: Find most relevant meeting chunks
SELECT
chunk_text,
chunk_metadata:meeting_title::STRING as meeting_title,
chunk_metadata:meeting_date::TIMESTAMP_LTZ as meeting_date,
chunk_metadata:attendees as attendees
FROM TABLE(
meeting_search_service!SEARCH(
QUERY => question,
LIMIT => 3
)
)
),
context_builder AS (
-- Step 2: Build context from retrieved meetings
SELECT
LISTAGG(
'Meeting: ' || meeting_title ||
'\nDate: ' || TO_VARCHAR(meeting_date, 'YYYY-MM-DD') ||
'\nAttendees: ' || ARRAY_TO_STRING(attendees, ', ') ||
'\nContent: ' || chunk_text,
'\n\n---\n\n'
) as combined_context,
ARRAY_AGG(
OBJECT_CONSTRUCT(
'title', meeting_title,
'date', meeting_date,
'attendees', attendees
)
) as sources
FROM relevant_meetings
)
-- Step 3: Generate intelligent answer
SELECT OBJECT_CONSTRUCT(
'answer', SNOWFLAKE.CORTEX.COMPLETE(
'llama3.1-70b',
'You are a helpful meeting assistant. Answer the user\'s question based ONLY on the provided meeting notes.
If the information is not in the meeting notes, say so clearly. Always mention which meeting(s) you are referencing and include relevant dates.
If there are action items or decisions related to the question, highlight them.
Meeting Notes:
' || combined_context || '
User Question: ' || question || '
Provide a clear, accurate answer with specific references to meetings and dates:'
),
'sources', sources,
'timestamp', CURRENT_TIMESTAMP()
) as result
FROM context_builder
$$;
-- Test it out!
SELECT ask_meeting_assistant('Why did we choose PostgreSQL?');
-- Parse the result nicely
SELECT
result:answer::STRING as answer,
result:sources as source_meetings,
result:timestamp::TIMESTAMP_LTZ as answered_at
FROM (
SELECT ask_meeting_assistant('Why did we choose PostgreSQL?') as result
);
Let's break down what this function does:
- Searches for the 3 most relevant meeting chunks semantically
- Builds context by combining those meetings with metadata
- Generates answer using an LLM that has been given specific instructions
- Returns structured output with answer + sources
The key is that the LLM only uses information from our meetings; it doesn't make things up or fall back on its general knowledge.
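The prompt-building step is the part most worth getting right. Here's a Python sketch of the same context assembly the SQL function performs — the field names mirror our chunk metadata, but this is illustrative, not the Cortex API:

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    # Mirrors the context_builder CTE: one block per retrieved chunk,
    # separated by '---', followed by the grounding instructions.
    blocks = [
        f"Meeting: {c['meeting_title']}\n"
        f"Date: {c['meeting_date']}\n"
        f"Attendees: {', '.join(c['attendees'])}\n"
        f"Content: {c['chunk_text']}"
        for c in chunks
    ]
    context = "\n\n---\n\n".join(blocks)
    return (
        "You are a helpful meeting assistant. Answer the user's question "
        "based ONLY on the provided meeting notes.\n\n"
        f"Meeting Notes:\n{context}\n\n"
        f"User Question: {question}\n\n"
        "Provide a clear, accurate answer with specific references:"
    )

prompt = build_prompt(
    "Why PostgreSQL?",
    [{"meeting_title": "Database Migration Discussion",
      "meeting_date": "2024-01-22",
      "attendees": ["Tom Liu", "Sarah Chen"],
      "chunk_text": "Sarah advocated for PostgreSQL citing better JSON support."}],
)
```

Keeping the grounding instruction ("based ONLY on the provided meeting notes") adjacent to the context is what stops the model from answering out of its general knowledge.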
Step 7: Real-World Testing
Let's ask questions that we'd actually ask in real life:
-- Question 1: Specific decision
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant('Why did we choose PostgreSQL over MySQL?') as result
);
-- Question 2: Action items
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant('What are Mike\'s pending action items?') as result
);
-- Question 3: Timeline question
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant('When is the dashboard launch scheduled?') as result
);
-- Question 4: Security discussion
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant('What security concerns were raised about our API?') as result
);
-- Question 5: Testing boundaries
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant('What is the weather forecast for tomorrow?') as result
);
-- This should correctly say "not in meeting notes"
What's impressive is how the system can connect information across multiple meetings. If you ask "What's blocking the product launch?", it can pull from the security meeting AND the planning meeting to give a complete answer.
Step 8: Adding Specialized Query Functions
Different people need different things from meeting notes. Let's create targeted functions:
-- Function 1: Find action items for a person
CREATE OR REPLACE FUNCTION get_action_items_for(person_name STRING)
RETURNS TABLE (
meeting_title STRING,
meeting_date TIMESTAMP_LTZ,
action_item STRING
)
AS
$$
SELECT
meeting_title,
meeting_date,
action.value::STRING as action_item
FROM meeting_transcripts,
LATERAL FLATTEN(input => action_items) action
WHERE action.value::STRING ILIKE '%' || person_name || '%'
ORDER BY meeting_date DESC
$$;
-- Usage
SELECT * FROM TABLE(get_action_items_for('Mike'));
-- Function 2: Find all decisions in a date range
CREATE OR REPLACE FUNCTION get_decisions_between(
start_date TIMESTAMP_LTZ,
end_date TIMESTAMP_LTZ
)
RETURNS TABLE (
meeting_title STRING,
meeting_date TIMESTAMP_LTZ,
decision STRING
)
AS
$$
SELECT
meeting_title,
meeting_date,
decision.value::STRING as decision
FROM meeting_transcripts,
LATERAL FLATTEN(input => decisions_made) decision
WHERE meeting_date BETWEEN start_date AND end_date
ORDER BY meeting_date DESC
$$;
-- Usage
SELECT * FROM TABLE(get_decisions_between(
'2024-01-01'::TIMESTAMP_LTZ,
'2024-02-28'::TIMESTAMP_LTZ
));
-- Function 3: Search meetings by topic
CREATE OR REPLACE FUNCTION find_meetings_about(topic STRING)
RETURNS TABLE (
meeting_id STRING,
meeting_title STRING,
meeting_date TIMESTAMP_LTZ,
relevance_score FLOAT
)
AS
$$
SELECT
chunk_id as meeting_id,
chunk_metadata:meeting_title::STRING as meeting_title,
chunk_metadata:meeting_date::TIMESTAMP_LTZ as meeting_date,
1.0 as relevance_score -- Cortex Search returns results sorted by relevance
FROM TABLE(
meeting_search_service!SEARCH(
QUERY => topic,
LIMIT => 10
)
)
$$;
-- Usage
SELECT * FROM TABLE(find_meetings_about('API security'));
These specialized functions give your team multiple ways to interact with meeting data: some people want natural language, others want structured queries.
Step 9: Building a Conversation History Table
To make this truly useful, we should track what people ask and whether answers were helpful:
-- Track queries and feedback
CREATE OR REPLACE TABLE meeting_assistant_logs (
log_id INTEGER AUTOINCREMENT,
user_name STRING,
question STRING,
answer VARIANT,
sources_used ARRAY,
helpful_vote INTEGER, -- 1 = helpful, -1 = not helpful, NULL = no vote yet
feedback_comment STRING,
queried_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP()
);
-- Enhanced function that logs queries
CREATE OR REPLACE PROCEDURE ask_and_log(
user_name STRING,
question STRING
)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
DECLARE
result VARIANT;
BEGIN
-- Get answer
result := ask_meeting_assistant(:question);
-- Log the interaction
INSERT INTO meeting_assistant_logs (user_name, question, answer, sources_used)
SELECT
:user_name,
:question,
:result,
:result:sources
;
RETURN result;
END;
$$;
-- Usage
CALL ask_and_log('[email protected]', 'What did we decide about caching?');
-- View query history
SELECT
log_id,
user_name,
question,
answer:answer::STRING as answer_preview,
helpful_vote,
queried_at
FROM meeting_assistant_logs
ORDER BY queried_at DESC
LIMIT 10;
-- Add feedback
UPDATE meeting_assistant_logs
SET helpful_vote = 1,
feedback_comment = 'Perfect! Found exactly what I needed.'
WHERE log_id = 1;
This logging is crucial for two reasons:
- Accountability: Know who's using the system and what they're asking
- Improvement: See which queries return unhelpful answers and refine
Step 10: Analytics Dashboard Queries
Let's create queries that help us understand usage patterns:
-- Most common topics people ask about
SELECT
-- Extract key nouns/topics from questions
LOWER(REGEXP_SUBSTR(question, '\\b(database|API|security|migration|dashboard|launch|PostgreSQL|Redis|cache|decision|action)\\b', 1, 1, 'i')) as topic,
COUNT(*) as question_count
FROM meeting_assistant_logs
WHERE topic IS NOT NULL
GROUP BY topic
ORDER BY question_count DESC
LIMIT 10;
-- User engagement metrics
SELECT
user_name,
COUNT(*) as total_queries,
AVG(CASE WHEN helpful_vote = 1 THEN 1.0 ELSE 0.0 END) as satisfaction_rate,
COUNT(CASE WHEN helpful_vote = -1 THEN 1 END) as unhelpful_answers
FROM meeting_assistant_logs
GROUP BY user_name
ORDER BY total_queries DESC;
-- Meetings that get referenced most
SELECT
source.value:title::STRING as meeting_title,
source.value:date::TIMESTAMP_LTZ as meeting_date,
COUNT(*) as times_referenced
FROM meeting_assistant_logs,
LATERAL FLATTEN(input => sources_used) source
GROUP BY meeting_title, meeting_date
ORDER BY times_referenced DESC
LIMIT 10;
-- Questions that received negative feedback
SELECT
question,
answer:answer::STRING as answer,
feedback_comment,
queried_at
FROM meeting_assistant_logs
WHERE helpful_vote = -1
ORDER BY queried_at DESC;
These analytics tell us what's working and what needs improvement. If certain types of questions consistently get negative feedback, we know where to focus our refinement efforts.
Step 11: Automating Meeting Ingestion
In production, we'd want new meetings to automatically flow into the system. Here's how that might look:
-- Create stage for incoming meeting transcripts
CREATE OR REPLACE STAGE meeting_uploads
FILE_FORMAT = (TYPE = 'JSON');
-- Create stream to detect new meetings
CREATE OR REPLACE STREAM new_meetings_stream
ON TABLE meeting_transcripts;
-- Task to process new meetings
CREATE OR REPLACE TASK process_new_meetings
WAREHOUSE = meeting_rag_wh
SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('new_meetings_stream')
AS
BEGIN
-- Insert chunks for new meetings
-- (simplified to one chunk per meeting here; in production, reuse the
-- SPLIT_TEXT logic from Step 3 or call chunk_meeting_intelligently)
INSERT INTO meeting_chunks (chunk_id, meeting_id, chunk_index, chunk_text, chunk_metadata)
SELECT
meeting_id || '_' || chunk_index as chunk_id,
meeting_id,
chunk_index,
chunk_text,
chunk_metadata
FROM (
SELECT
meeting_id,
transcript_text as chunk_text,
1 as chunk_index,
OBJECT_CONSTRUCT(
'meeting_title', meeting_title,
'meeting_date', meeting_date,
'meeting_type', meeting_type,
'attendees', attendees,
'has_action_items', ARRAY_SIZE(action_items) > 0,
'has_decisions', ARRAY_SIZE(decisions_made) > 0
) as chunk_metadata
FROM new_meetings_stream
WHERE METADATA$ACTION = 'INSERT'
);
-- Generate embeddings for new chunks
UPDATE meeting_chunks
SET chunk_embedding = SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
'snowflake-arctic-embed-l',
chunk_text
)
WHERE chunk_embedding IS NULL;
END;
-- Resume task
ALTER TASK process_new_meetings RESUME;
Now whenever a new meeting gets added to the meeting_transcripts table, it's automatically chunked, embedded, and searchable within about five minutes. No manual intervention needed.
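The key property of that task is idempotence: embedding only rows WHERE chunk_embedding IS NULL means a failed run can simply be retried without double-processing anything. The same pattern in Python, as a toy in-memory version (the fake `embed` function below is purely illustrative):

```python
def embed(text: str) -> list[float]:
    # Stand-in for EMBED_TEXT_1024: a deterministic fake embedding
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

def process_pending(rows: list[dict]) -> int:
    # Mirrors the task's UPDATE ... WHERE chunk_embedding IS NULL:
    # only touch rows that have no embedding yet, so re-running after
    # a failure never re-embeds already-processed chunks.
    done = 0
    for row in rows:
        if row["embedding"] is None:
            row["embedding"] = embed(row["text"])
            done += 1
    return done

table = [
    {"text": "chunk one", "embedding": None},
    {"text": "chunk two", "embedding": [1.0, 2.0]},  # already processed
]
first_run  = process_pending(table)   # embeds only the pending row
second_run = process_pending(table)   # nothing left to do
```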
Advanced Feature: Topic Extraction
We can use Cortex to automatically extract topics from meetings:
-- Add topics column
ALTER TABLE meeting_transcripts
ADD COLUMN ai_extracted_topics ARRAY;
-- Extract topics using LLM
UPDATE meeting_transcripts
SET ai_extracted_topics = PARSE_JSON(
SNOWFLAKE.CORTEX.COMPLETE(
'llama3.1-70b',
'Extract 3-5 main topics from this meeting transcript. Return ONLY a JSON array of strings, nothing else.
Transcript: ' || transcript_text || '
Format: ["topic1", "topic2", "topic3"]'
)
);
-- View extracted topics
SELECT
meeting_title,
meeting_date,
ai_extracted_topics
FROM meeting_transcripts
ORDER BY meeting_date;
-- Find meetings by AI-extracted topic
SELECT
meeting_title,
meeting_date,
topic.value::STRING as topic
FROM meeting_transcripts,
LATERAL FLATTEN(input => ai_extracted_topics) topic
WHERE topic.value::STRING ILIKE '%security%'
ORDER BY meeting_date DESC;
This auto-tagging makes meetings even more discoverable without manual categorization.
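One caveat with this pattern: PARSE_JSON fails if the model wraps its answer in prose or code fences despite the "Return ONLY a JSON array" instruction. If you post-process LLM output outside Snowflake, a defensive parser along these lines helps (a Python sketch of the idea, not part of the SQL above):

```python
import json
import re

def parse_topic_list(raw):
    """Parse an LLM response that should be a JSON array of strings.
    Models sometimes wrap the array in prose or code fences, so fall
    back to extracting the first [...] span before giving up."""
    try:
        topics = json.loads(raw)
    except json.JSONDecodeError:
        match = re.search(r"\[.*\]", raw, re.DOTALL)
        if not match:
            return []
        try:
            topics = json.loads(match.group(0))
        except json.JSONDecodeError:
            return []
    if not isinstance(topics, list):
        return []
    # Keep only string entries; drop anything malformed
    return [t for t in topics if isinstance(t, str)]

print(parse_topic_list('["security", "api design"]'))
print(parse_topic_list('Here are the topics: ["security", "rate limiting"]'))
```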
Cost Optimization Tips
Running LLMs on every query can get expensive. Here are strategies to keep costs reasonable:
-- Strategy 1: Response caching
CREATE OR REPLACE TABLE answer_cache (
question_hash STRING PRIMARY KEY,
question STRING,
cached_answer VARIANT,
cached_at TIMESTAMP_LTZ,
cache_hits INTEGER DEFAULT 0
);
-- Modified function with caching
-- Note: a SQL UDF cannot write rows, so cache entries (and cache_hits)
-- must be populated by a separate procedure or task; also, COALESCE is
-- not guaranteed to short-circuit, so treat this as a sketch of the
-- caching idea rather than a guaranteed saving
CREATE OR REPLACE FUNCTION ask_with_cache(question STRING)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
SELECT
COALESCE(
-- Try cache first
(SELECT cached_answer
FROM answer_cache
WHERE question_hash = SHA2(LOWER(TRIM(question)))
AND cached_at >= DATEADD(hour, -24, CURRENT_TIMESTAMP())
LIMIT 1),
-- Generate new answer if not cached
ask_meeting_assistant(question)
)
$$;
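The cache key here is SHA2(LOWER(TRIM(question))), which in Snowflake defaults to SHA-256 hex. The same normalization can be reproduced client-side, for example if an application wants to check the cache before calling Snowflake at all (an illustrative Python sketch):

```python
import hashlib

def cache_key(question):
    """Mirror of SHA2(LOWER(TRIM(question))): normalize whitespace and
    case first, so trivially different phrasings of the same question
    hit the same cache row. Snowflake's SHA2 defaults to SHA-256."""
    normalized = question.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Different surface forms, same cache row
print(cache_key("  Why PostgreSQL? ") == cache_key("why postgresql?"))
```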
-- Strategy 2: Smaller model for simple queries
CREATE OR REPLACE FUNCTION ask_simple(question STRING)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
-- Use smaller, cheaper model for straightforward questions
-- Use llama3.1-8b instead of llama3.1-70b for basic lookups
WITH relevant_meetings AS (
SELECT
chunk_text,
chunk_metadata:meeting_title::STRING as meeting_title,
chunk_metadata:meeting_date::TIMESTAMP_LTZ as meeting_date
FROM TABLE(
meeting_search_service!SEARCH(
QUERY => question,
LIMIT => 2 -- Fewer chunks = lower cost
)
)
),
context_builder AS (
SELECT
LISTAGG(
'Meeting: ' || meeting_title || '\n' || chunk_text,
'\n\n---\n\n'
) as combined_context
FROM relevant_meetings
)
SELECT OBJECT_CONSTRUCT(
'answer', SNOWFLAKE.CORTEX.COMPLETE(
'llama3.1-8b', -- Cheaper model
'Answer briefly based on these meeting notes: ' || combined_context ||
'\n\nQuestion: ' || question
)
)
FROM context_builder
$$;
-- Strategy 3: Batch processing for reports
-- Instead of asking individual questions, batch them
CREATE OR REPLACE PROCEDURE generate_weekly_summary()
RETURNS VARIANT
LANGUAGE SQL
AS
$$
DECLARE
result VARIANT;
BEGIN
-- Get all meetings from last week
WITH last_week_meetings AS (
SELECT
LISTAGG(
'Meeting: ' || meeting_title ||
'\nDate: ' || meeting_date ||
'\nKey points: ' || transcript_text,
'\n\n---\n\n'
) as all_meetings
FROM meeting_transcripts
WHERE meeting_date >= DATEADD(week, -1, CURRENT_TIMESTAMP())
)
SELECT OBJECT_CONSTRUCT(
'summary', SNOWFLAKE.CORTEX.SUMMARIZE(all_meetings),
'key_decisions', SNOWFLAKE.CORTEX.COMPLETE(
'llama3.1-70b',
'List all key decisions made in these meetings: ' || all_meetings
),
'action_items', SNOWFLAKE.CORTEX.COMPLETE(
'llama3.1-70b',
'List all action items from these meetings: ' || all_meetings
)
) INTO :result -- Snowflake Scripting requires the colon bind on variables in SQL statements
FROM last_week_meetings;
RETURN result;
END;
$$;
These optimization strategies can cut costs by 40-60% while maintaining quality for most queries.
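To sanity-check a claim like that, it helps to model it. This rough estimator (all numbers are illustrative assumptions, not Snowflake pricing) shows how a 30% cache-hit rate plus routing half the remaining queries to a model that costs 80% less lands in that range:

```python
def monthly_llm_cost(queries, cost_per_call, cache_hit_rate=0.0,
                     small_model_share=0.0, small_model_discount=0.8):
    """Rough cost model: cache hits cost nothing; a share of the
    remaining queries runs on a cheaper model. Purely illustrative."""
    paid = queries * (1 - cache_hit_rate)
    small = paid * small_model_share
    large = paid - small
    return large * cost_per_call + small * cost_per_call * (1 - small_model_discount)

baseline = monthly_llm_cost(10_000, 0.01)
optimized = monthly_llm_cost(10_000, 0.01, cache_hit_rate=0.3, small_model_share=0.5)
print(f"savings: {1 - optimized / baseline:.0%}")
```

With these assumed inputs the model lands at roughly 58% savings, squarely inside the 40-60% range; your actual numbers depend on your cache-hit rate and query mix.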
Quality Assurance: Building a Test Suite
We should validate that our RAG system returns accurate answers:
-- Create test cases table
CREATE OR REPLACE TABLE rag_test_cases (
test_id INTEGER AUTOINCREMENT,
test_question STRING,
expected_answer_contains STRING,
expected_meeting_reference STRING,
test_category STRING,
created_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP()
);
-- Add test cases based on our sample data
INSERT INTO rag_test_cases (test_question, expected_answer_contains, expected_meeting_reference, test_category)
VALUES
('Why did we choose PostgreSQL?', 'JSON support', 'Database Migration Discussion', 'decision_lookup'),
('What are Mike\'s action items?', 'rate limiting', 'API Security Review', 'action_item_lookup'),
('When is the dashboard launch?', 'February 28', 'Q1 Product Planning', 'timeline_lookup'),
('What security issues were found?', 'rate limiting', 'API Security Review', 'problem_identification'),
('Who attended the planning meeting?', 'Sarah Chen', 'Q1 Product Planning', 'attendee_lookup');
-- Run test suite
CREATE OR REPLACE PROCEDURE run_rag_tests()
RETURNS TABLE (
test_id INTEGER,
question STRING,
passed BOOLEAN,
answer STRING,
reason STRING
)
LANGUAGE SQL
AS
$$
DECLARE
result_cursor CURSOR FOR
WITH test_results AS (
SELECT
t.test_id,
t.test_question,
ask_meeting_assistant(t.test_question) as answer_obj,
t.expected_answer_contains,
t.expected_meeting_reference
FROM rag_test_cases t
)
SELECT
test_id,
test_question as question,
(
answer_obj:answer::STRING ILIKE '%' || expected_answer_contains || '%'
AND ARRAY_TO_STRING(answer_obj:sources::ARRAY, ',') ILIKE '%' || expected_meeting_reference || '%'
) as passed,
answer_obj:answer::STRING as answer,
CASE
WHEN answer_obj:answer::STRING ILIKE '%' || expected_answer_contains || '%' THEN 'Answer contains expected content'
ELSE 'Missing expected content: ' || expected_answer_contains
END as reason
FROM test_results;
BEGIN
OPEN result_cursor;
-- RETURN TABLE() needs a RESULTSET, so convert the open cursor first
RETURN TABLE(RESULTSET_FROM_CURSOR(result_cursor));
END;
$$;
-- Run tests
CALL run_rag_tests();
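The pass criterion in the procedure is deliberately simple: case-insensitive substring containment in both the answer and its cited sources. The same check, expressed in Python for clarity (the sample answer and source strings are made up for illustration):

```python
def grade_answer(answer, sources, expected_fragment, expected_meeting):
    """Mirror of the procedure's pass criterion: the expected fragment
    must appear in the answer, and the expected meeting title must
    appear in at least one cited source (both case-insensitive)."""
    return (expected_fragment.lower() in answer.lower()
            and any(expected_meeting.lower() in s.lower() for s in sources))

ok = grade_answer(
    "The team chose PostgreSQL for its JSON support.",
    ["Database Migration Discussion (2024-03-15)"],
    "JSON support",
    "Database Migration Discussion",
)
print(ok)  # True
```

Substring matching is brittle against paraphrases; a natural next step is an LLM-as-judge check, but this cheap version catches outright regressions.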
This automated testing ensures our RAG system maintains quality as we add more meetings and refine prompts.
Monitoring and Alerts
Set up monitoring to catch issues early:
-- Create monitoring table
CREATE OR REPLACE TABLE rag_health_metrics (
metric_date DATE,
total_queries INTEGER,
avg_response_time_seconds FLOAT,
successful_queries INTEGER,
failed_queries INTEGER,
avg_satisfaction_score FLOAT,
unique_users INTEGER
);
-- Daily health check task
CREATE OR REPLACE TASK daily_health_check
WAREHOUSE = meeting_rag_wh
SCHEDULE = 'USING CRON 0 8 * * * America/Los_Angeles' -- 8 AM daily
AS
INSERT INTO rag_health_metrics
SELECT
CURRENT_DATE() as metric_date,
COUNT(*) as total_queries,
-- queried_at alone cannot measure latency (it would only measure the age
-- of the log row at task run time); log a completion timestamp to compute this properly
NULL as avg_response_time_seconds,
COUNT(CASE WHEN answer IS NOT NULL THEN 1 END) as successful_queries,
COUNT(CASE WHEN answer IS NULL THEN 1 END) as failed_queries,
AVG(CASE WHEN helpful_vote = 1 THEN 1.0
WHEN helpful_vote = -1 THEN 0.0
ELSE NULL END) as avg_satisfaction_score,
COUNT(DISTINCT user_name) as unique_users
FROM meeting_assistant_logs
WHERE queried_at >= DATEADD(day, -1, CURRENT_DATE());
-- Resume health check task
ALTER TASK daily_health_check RESUME;
-- Alert query (run this manually or set up notifications)
SELECT
metric_date,
total_queries,
avg_satisfaction_score,
CASE
WHEN avg_satisfaction_score < 0.7 THEN '⚠️ LOW SATISFACTION'
WHEN failed_queries > total_queries * 0.1 THEN '⚠️ HIGH FAILURE RATE'
ELSE '✅ HEALTHY'
END as status
FROM rag_health_metrics
WHERE metric_date >= DATEADD(day, -7, CURRENT_DATE())
ORDER BY metric_date DESC;
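The thresholds in that CASE expression (satisfaction below 0.7, failure rate above 10%) are easy to mirror in whatever service consumes these metrics, for example a notification bot. A small Python equivalent of the same classification logic:

```python
def health_status(avg_satisfaction, failed, total):
    """Same thresholds and precedence as the alert query: low
    satisfaction is checked first, then failure rate."""
    if avg_satisfaction is not None and avg_satisfaction < 0.7:
        return "LOW SATISFACTION"
    if total > 0 and failed > total * 0.1:
        return "HIGH FAILURE RATE"
    return "HEALTHY"

print(health_status(0.9, 2, 100))   # HEALTHY
print(health_status(0.5, 2, 100))   # LOW SATISFACTION
print(health_status(0.9, 15, 100))  # HIGH FAILURE RATE
```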
Advanced: Multi-Turn Conversations
Right now, each question is independent. We can add conversation context:
-- Table to track conversation sessions
CREATE OR REPLACE TABLE conversation_sessions (
session_id STRING PRIMARY KEY,
user_name STRING,
started_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP(),
last_interaction TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP(),
conversation_history ARRAY
);
-- Function with conversation memory
-- Note: a SQL UDF cannot write rows, so appending each new question to
-- conversation_history must be done by a separate procedure
CREATE OR REPLACE FUNCTION ask_with_context(
p_session_id STRING,
p_user_name STRING,
p_current_question STRING
)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
WITH session_data AS (
SELECT
COALESCE(conversation_history, ARRAY_CONSTRUCT()) as history
FROM conversation_sessions
-- Parameters are prefixed with p_ so they are not shadowed by column
-- names: "WHERE session_id = session_id" would compare the column to
-- itself and match every row
WHERE session_id = p_session_id
LIMIT 1
),
enhanced_question AS (
SELECT
CASE
WHEN ARRAY_SIZE(history) > 0 THEN
p_current_question || ' (Context from previous questions: ' ||
ARRAY_TO_STRING(history, ', ') || ')'
ELSE p_current_question
END as full_question
FROM session_data
)
SELECT ask_meeting_assistant(full_question)
FROM enhanced_question
$$;
This allows follow-up questions like:
- User: "What did we decide about the database?"
- System: "We chose PostgreSQL..."
- User: "Why that over MySQL?" → understands "that" refers to PostgreSQL
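What the function does to the prompt is easier to see outside SQL. A Python sketch of the same enrichment step (max_turns is an assumed cap, added here to keep the prompt from growing without bound):

```python
def enrich_question(question, history, max_turns=3):
    """Mirror the UDF: append recent prior questions as context so the
    LLM can resolve pronouns like "that". Only the last few turns are
    kept to limit prompt size."""
    if not history:
        return question
    recent = history[-max_turns:]
    return f"{question} (Context from previous questions: {', '.join(recent)})"

history = ["What did we decide about the database?"]
print(enrich_question("Why that over MySQL?", history))
```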
Example Use Cases in Action
Let's see how different teams would use this:
Engineering Team:
-- Find all technical decisions
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant(
'What technical decisions were made in the last month?'
) as result
);
-- Get status of specific feature
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant(
'What is the current status of the Redis implementation?'
) as result
);
Product Team:
-- Customer feedback summary
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant(
'What customer feedback have we received about the dashboard?'
) as result
);
-- Feature prioritization
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant(
'What features are prioritized for Q2?'
) as result
);
Management:
-- Project blockers
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant(
'What are the current blockers for the product launch?'
) as result
);
-- Team action items
SELECT * FROM TABLE(get_action_items_for('engineering team'));
Extending to Other Data Sources
The beautiful thing about this architecture is that it's extensible. We can add other knowledge sources:
-- Add Jira tickets
CREATE TABLE jira_issues (
issue_key STRING,
summary STRING,
description STRING,
status STRING,
created_date TIMESTAMP_LTZ
);
-- Add Slack discussions
CREATE TABLE slack_threads (
thread_id STRING,
channel_name STRING,
message_text STRING,
posted_at TIMESTAMP_LTZ
);
-- Add Confluence docs
CREATE TABLE wiki_pages (
page_id STRING,
title STRING,
content STRING,
last_updated TIMESTAMP_LTZ
);
-- Unified search across all sources
CREATE OR REPLACE FUNCTION ask_everything(question STRING)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
-- Placeholder: a full implementation would search each source,
-- normalize and merge the ranked results, then generate one answer.
-- A comment-only body would not compile, so return a stub for now.
SELECT TO_VARIANT('Not implemented yet: ' || question)
$$;
Now your RAG system becomes a true organizational knowledge hub.
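The "combine results intelligently" step usually means merging per-source ranked lists into one list before building the prompt. A naive score-based merge looks like this (a Python sketch with made-up scores; a real system would normalize scores per source first, since different search services score on different scales):

```python
def merge_results(*ranked_lists, limit=5):
    """Merge per-source result lists of (score, source, text) tuples by
    score, highest first. Assumes scores are comparable across sources."""
    merged = sorted(
        (item for results in ranked_lists for item in results),
        key=lambda item: item[0],
        reverse=True,
    )
    return merged[:limit]

meetings = [(0.92, "meeting", "Chose PostgreSQL"), (0.40, "meeting", "Standup notes")]
jira = [(0.85, "jira", "DB-101: migrate to PostgreSQL")]
top = merge_results(meetings, jira, limit=2)
print([(s, t) for _, s, t in top])
```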
What We've Built
Let's recap what this system can do:
✅ Natural language search across all meeting notes
✅ Intelligent answers with source citations
✅ Action item tracking by person and date
✅ Decision history with full context
✅ Topic extraction and categorization
✅ Multi-turn conversations with context memory
✅ Automated ingestion of new meetings
✅ Quality monitoring and testing
✅ Cost optimization through caching
✅ Integration ready for Slack, Teams, etc.
And all of this runs entirely within Snowflake: no external services, no data movement, no infrastructure headaches.
The Real Value Proposition
Here's what changes when we have a system like this:
Before:
- Someone asks "What did we decide about X?"
- You spend 20 minutes searching through meeting notes
- You find partial information across 3 different meetings
- You piece together an answer, but you're not 100% sure
- Total time wasted: 20+ minutes
After:
- Someone asks "What did we decide about X?"
- You type the question into the assistant
- Get a complete answer with sources in 2 seconds
- Click through to verify if needed
- Total time: 30 seconds
That's a 40x improvement. Multiply that across your entire team, every day, and the ROI becomes obvious.
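The arithmetic behind that claim, plus a rough team-level extrapolation (the team size and lookup frequency below are made-up illustrative numbers, not figures from any measurement):

```python
# 20 minutes of manual searching vs ~30 seconds with the assistant
manual_seconds = 20 * 60
assistant_seconds = 30
speedup = manual_seconds / assistant_seconds

# Illustrative team-level view: 10 people, 5 lookups per person per week
weekly_hours_saved = 10 * 5 * (manual_seconds - assistant_seconds) / 3600

print(speedup)             # 40.0
print(weekly_hours_saved)  # 16.25
```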
Future Enhancements We Can Build
This is just the foundation. Here are ideas for taking it further:
- Automatic summaries: Email digest every Monday with key decisions from last week
- Proactive alerts: "You have 3 action items due this week"
- Meeting preparation: "Here's what was discussed last time you met with this team"
- Trend analysis: "Database migration has been mentioned 15 times this month"
- Sentiment tracking: Detect when team morale shifts in meetings
- Smart reminders: "You committed to X in the meeting but haven't updated status"
The possibilities are endless once we have meeting data structured and searchable.
Final Thoughts
The technology for this has existed for years, but what's changed is how accessible it's become. Building a RAG system used to require:
- Deep ML expertise
- Complex infrastructure
- Weeks of development time
- Ongoing maintenance burden
Now, with Snowflake Cortex, we can build it in a weekend using SQL. That's the real revolution: not the technology itself, but the democratization of it.
Every organization has the same problem: valuable knowledge trapped in meeting notes that nobody ever looks at again. We've just seen how to solve that problem in a practical, maintainable way.
The question isn't "Can we build this?" anymore. It's "When do we start?"
Complete Setup Script: GitHub Repo
Here's everything in one place to get started:
-- Complete setup script for Meeting Notes RAG
-- Run this entire script to set up the system
-- 1. Database setup
CREATE DATABASE IF NOT EXISTS meeting_intelligence;
USE DATABASE meeting_intelligence;
CREATE SCHEMA IF NOT EXISTS meetings;
USE SCHEMA meetings;
-- 2. Compute
CREATE WAREHOUSE IF NOT EXISTS meeting_rag_wh
WITH WAREHOUSE_SIZE = 'SMALL'
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE;
USE WAREHOUSE meeting_rag_wh;
-- 3. Main tables
CREATE OR REPLACE TABLE meeting_transcripts (
meeting_id STRING PRIMARY KEY,
meeting_title STRING NOT NULL,
meeting_date TIMESTAMP_LTZ NOT NULL,
meeting_type STRING,
attendees ARRAY,
transcript_text STRING NOT NULL,
action_items ARRAY,
decisions_made ARRAY,
topics_discussed ARRAY
);
CREATE OR REPLACE TABLE meeting_chunks (
chunk_id STRING PRIMARY KEY,
meeting_id STRING,
chunk_text STRING,
chunk_metadata VARIANT,
chunk_embedding VECTOR(FLOAT, 1024)
);
-- 4. Load sample data (use the INSERT statements from earlier)
-- 5. Create embeddings and search service
UPDATE meeting_chunks
SET chunk_embedding = SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
'snowflake-arctic-embed-l',
chunk_text
);
CREATE OR REPLACE CORTEX SEARCH SERVICE meeting_search_service
ON chunk_text
WAREHOUSE = meeting_rag_wh
TARGET_LAG = '1 minute'
AS (
SELECT chunk_id, chunk_text, chunk_metadata, chunk_embedding
FROM meeting_chunks
);
-- 6. Create query function (use function from earlier)
-- 7. Test it!
SELECT ask_meeting_assistant('What are the current project blockers?');
Want to Learn More?
This is just scratching the surface of what's possible with RAG systems in Snowflake. The same principles apply to:
- Customer support ticket analysis
- Documentation search
- Code repository search
- Email analysis
- Contract review
- Research paper analysis
The foundation weâve built here can be adapted to any text-based knowledge base.