The Problem We All Face (And Nobody Talks About)
You know that feeling when someone asks "What did we decide about the API redesign?" and you're frantically scrolling through three weeks of meeting notes trying to find that one conversation?
Or when your manager asks "What action items were assigned to the engineering team last month?" and you realize those decisions are buried across 47 different meeting transcripts scattered in Google Docs, Notion, and email threads?
Yeah, we've all been there.
Here's the thing: companies have meetings. Lots of them. And every single meeting contains valuable information: decisions made, problems discussed, action items assigned, ideas shared. But all that knowledge just… disappears into the void of meeting notes that nobody reads again.
Until now.
What if we could build a system where you just ask "What did we decide about the database migration?" and get an instant answer with exact sources? Not some generic corporate search that returns 500 irrelevant documents, but an actual intelligent assistant that understands context.
That's exactly what we can build with Snowflake Cortex. And honestly? It's simpler than you think.
What We're Building (In Plain English)
Before diving into code, let's understand what a Meeting Notes RAG actually does:
RAG = Retrieval Augmented Generation
Think of it like this:
- Store all your meeting notes in Snowflake
- Convert them into a format that AI can search semantically (not just keyword matching)
- When someone asks a question, find the most relevant meeting notes
- Use an AI model to generate an intelligent answer based on those notes
- Show sources so people can verify information
Real example:
- Question: "Why did we choose PostgreSQL over MySQL?"
- System finds: Engineering meeting from March 15th discussing database options
- Answer: "According to the March 15th engineering meeting, the team chose PostgreSQL primarily because of better JSON support and more robust handling of concurrent writes. Sarah from backend also mentioned PostgreSQL's superior full-text search capabilities."
See? Not just "here are 50 documents mentioning PostgreSQL" but an actual synthesized answer with context.
Why Snowflake for This?
You might be thinking: "Can't I just use ChatGPT with my notes?"
Sure, but here's why Snowflake is better for this:
- Security: Meeting notes contain sensitive information. With Snowflake, data never leaves your secure environment
- Scale: Got 10,000 meetings? No problem. Snowflake handles it easily
- Integration: Your meeting data might already be in Snowflake, or easily pipeable
- Cortex Search: Built-in vector search, so there's no need for an external vector database
- SQL interface: Everyone on your team can query it without learning new tools
- Cost: Pay only for what you use, no separate infrastructure
Plus, there's something elegant about keeping everything in one place.
The Architecture (Keep It Simple)
Here's how we can structure this:
Meeting Notes (Zoom, Teams, Google Meet transcripts)
↓
Load into Snowflake table
↓
Split into chunks
↓
Generate vector embeddings
↓
Create Cortex Search Service
↓
User asks question → Find relevant chunks → Generate answer with sources
No microservices, no Kubernetes, no headaches. Just Snowflake doing what it does best.
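Stripped of the Snowflake specifics, that pipeline is just a few small functions composed together. Here's a toy Python sketch of the flow — the chunker, retriever, and generator below are crude stand-ins for Snowflake tables, Cortex Search, and CORTEX.COMPLETE, included only to make the shape of the system concrete:

```python
# Illustrative sketch of the RAG flow; nothing here is the real
# Snowflake API, just stand-ins showing how the pieces compose.

def chunk(transcript: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; Step 3 uses smarter boundary-aware splitting
    return [transcript[i:i + size] for i in range(0, len(transcript), size)]

def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    # Stand-in for semantic search: rank chunks by word overlap with the question
    q_words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    return ranked[:k]

def generate(question: str, context: list[str]) -> str:
    # Stand-in for the LLM call: just stitches retrieved context into an "answer"
    return f"Based on {len(context)} meeting excerpt(s): " + " | ".join(context)

transcript = "The team chose PostgreSQL for better JSON support. Launch is Feb 28."
answer = generate("Why PostgreSQL?", retrieve("Why PostgreSQL?", chunk(transcript), k=1))
```

The whole trick of RAG is the middle step: retrieval narrows thousands of chunks down to a handful before the expensive generation step ever runs.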
Step 1: Setting Up the Foundation
First, let's create a proper database structure. We're organizing this so it's maintainable and scalable.
-- Create dedicated database for meeting intelligence
CREATE DATABASE IF NOT EXISTS meeting_intelligence;
USE DATABASE meeting_intelligence;
-- Create schema for organization
CREATE SCHEMA IF NOT EXISTS meetings;
USE SCHEMA meetings;
-- Set up compute
CREATE WAREHOUSE IF NOT EXISTS meeting_rag_wh
WITH
WAREHOUSE_SIZE = 'SMALL'
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE;
USE WAREHOUSE meeting_rag_wh;
-- Ensure we have Cortex access
-- (Account admin needs to grant this)
-- GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE your_role_name;
Nothing fancy here, just clean organization. A SMALL warehouse is perfectly fine for this; we can always scale up if needed.
Step 2: Designing the Meeting Notes Table
This is where thoughtful schema design matters. We want to capture not just the content, but useful metadata.
-- Main table for meeting transcripts
CREATE OR REPLACE TABLE meeting_transcripts (
meeting_id STRING PRIMARY KEY,
meeting_title STRING NOT NULL,
meeting_date TIMESTAMP_LTZ NOT NULL,
meeting_type STRING, -- 'standup', 'planning', 'retrospective', 'one-on-one'
attendees ARRAY,
duration_minutes INTEGER,
transcript_text STRING NOT NULL,
action_items ARRAY,
decisions_made ARRAY,
topics_discussed ARRAY,
recording_url STRING,
created_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP(),
updated_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP()
);
-- Add some realistic sample data
INSERT INTO meeting_transcripts VALUES
(
'MTG-2024-001',
'Q1 Product Planning',
'2024-01-15 10:00:00',
'planning',
ARRAY_CONSTRUCT('Sarah Chen', 'Mike Rodriguez', 'Emily Watson', 'David Park'),
60,
'Sarah opened the meeting discussing Q1 priorities. The main focus is launching the new dashboard feature by end of February. Mike raised concerns about the API rate limiting affecting customer experience. After discussion, the team decided to implement a caching layer using Redis before launch. Emily suggested we should also add analytics to track dashboard load times. David mentioned the infrastructure team can provision the Redis cluster within a week. Action items: Mike to create Redis implementation plan, Emily to design analytics dashboard, David to provision infrastructure by Jan 22.',
ARRAY_CONSTRUCT(
'Mike: Create Redis implementation plan by Jan 18',
'Emily: Design analytics dashboard mockup by Jan 20',
'David: Provision Redis cluster by Jan 22'
),
ARRAY_CONSTRUCT(
'Implement Redis caching layer before dashboard launch',
'Add analytics tracking for dashboard performance',
'Q1 launch date confirmed for Feb 28'
),
ARRAY_CONSTRUCT('Dashboard launch', 'API performance', 'Redis caching', 'Analytics'),
'https://zoom.us/rec/share/mock-url-001',
CURRENT_TIMESTAMP(),
CURRENT_TIMESTAMP()
),
(
'MTG-2024-002',
'Database Migration Discussion',
'2024-01-22 14:00:00',
'technical',
ARRAY_CONSTRUCT('Tom Liu', 'Sarah Chen', 'Alex Kumar'),
45,
'Tom presented three options for the database migration: PostgreSQL, MySQL, and staying with current setup. Sarah advocated strongly for PostgreSQL citing better JSON support and more robust concurrent write handling. Alex agreed, mentioning PostgreSQL full-text search capabilities would be beneficial for the search feature we are planning. The team also discussed migration timeline - estimated 3 weeks for full migration including testing. Concern raised about downtime, but Tom confirmed we can do blue-green deployment with minimal disruption. Decision: Move forward with PostgreSQL, start migration planning next week.',
ARRAY_CONSTRUCT(
'Tom: Create detailed migration plan by Jan 29',
'Sarah: Review PostgreSQL best practices documentation',
'Alex: Set up staging PostgreSQL environment by Feb 1'
),
ARRAY_CONSTRUCT(
'Selected PostgreSQL as new database',
'Migration timeline: 3 weeks',
'Use blue-green deployment strategy'
),
ARRAY_CONSTRUCT('Database migration', 'PostgreSQL', 'Blue-green deployment'),
'https://zoom.us/rec/share/mock-url-002',
CURRENT_TIMESTAMP(),
CURRENT_TIMESTAMP()
),
(
'MTG-2024-003',
'API Security Review',
'2024-02-05 11:00:00',
'security',
ARRAY_CONSTRUCT('Jennifer Lee', 'Tom Liu', 'Mike Rodriguez'),
90,
'Jennifer led security audit of our API endpoints. Critical finding: several endpoints lack proper rate limiting, making them vulnerable to DDoS attacks. Mike explained current authentication uses JWT but token expiry is set too long at 7 days. Team agreed to reduce to 24 hours and implement refresh token mechanism. Jennifer recommended adding API key rotation policy every 90 days. Also discussed implementing request signing for sensitive endpoints. Tom mentioned we should add monitoring alerts for unusual API patterns. Decision: Implement all recommendations before March 1st launch. This is now a blocker for product launch.',
ARRAY_CONSTRUCT(
'Mike: Implement rate limiting on all API endpoints by Feb 12',
'Tom: Set up monitoring and alerts by Feb 15',
'Jennifer: Document API key rotation policy by Feb 10',
'Mike: Reduce JWT expiry to 24h and add refresh tokens by Feb 16'
),
ARRAY_CONSTRUCT(
'All API security improvements are launch blockers',
'JWT token expiry reduced from 7 days to 24 hours',
'Implement 90-day API key rotation policy',
'Add request signing for sensitive endpoints'
),
ARRAY_CONSTRUCT('API security', 'Rate limiting', 'JWT tokens', 'Authentication'),
'https://zoom.us/rec/share/mock-url-003',
CURRENT_TIMESTAMP(),
CURRENT_TIMESTAMP()
),
(
'MTG-2024-004',
'Weekly Engineering Standup',
'2024-02-12 09:00:00',
'standup',
ARRAY_CONSTRUCT('Sarah Chen', 'Mike Rodriguez', 'Emily Watson', 'David Park', 'Tom Liu'),
30,
'Quick updates from everyone. Sarah completed the dashboard analytics implementation, looking good in staging. Mike finished API rate limiting yesterday, currently doing load testing. Emily working on mobile responsive design, encountered some CSS issues with the new dashboard on tablets. David reports Redis cluster is stable, handling 10k requests per second easily. Tom mentioned database migration testing is going well, planning cutover for next weekend. No blockers reported. Brief discussion about whether we need additional caching for mobile API endpoints - agreed to monitor after launch and optimize if needed.',
ARRAY_CONSTRUCT(
'Emily: Fix tablet responsive issues by Feb 14',
'Mike: Complete load testing report by Feb 13',
'Tom: Finalize migration cutover plan by Feb 14'
),
ARRAY_CONSTRUCT(),
ARRAY_CONSTRUCT('Sprint progress', 'Dashboard status', 'Mobile responsive', 'Database migration'),
'https://zoom.us/rec/share/mock-url-004',
CURRENT_TIMESTAMP(),
CURRENT_TIMESTAMP()
),
(
'MTG-2024-005',
'Customer Feedback Review',
'2024-02-20 15:00:00',
'product',
ARRAY_CONSTRUCT('Emily Watson', 'Sarah Chen', 'Product Manager Jane Smith'),
60,
'Jane presented findings from customer interviews conducted last week. Top request: ability to export dashboard data to Excel. 15 out of 20 customers mentioned this. Also strong demand for custom date range filters, currently we only support preset ranges. Sarah noted export feature is technically straightforward, could be done in one sprint. Emily suggested we should also add PDF export option. Jane agreed, mentioned several enterprise customers specifically asked for PDF reports for presentations. Team consensus: prioritize Excel export for Q2, PDF can be Q3. Also discussed adding keyboard shortcuts for power users - Emily volunteered to research what shortcuts would be most useful.',
ARRAY_CONSTRUCT(
'Sarah: Create technical spec for Excel export by Feb 27',
'Emily: Research and propose keyboard shortcuts by Feb 29',
'Jane: Follow up with customers about custom date ranges by Mar 1'
),
ARRAY_CONSTRUCT(
'Excel export prioritized for Q2 Sprint 1',
'PDF export moved to Q3',
'Custom date range filters to be designed',
'Keyboard shortcuts under consideration'
),
ARRAY_CONSTRUCT('Customer feedback', 'Feature requests', 'Excel export', 'Dashboard improvements'),
'https://zoom.us/rec/share/mock-url-005',
CURRENT_TIMESTAMP(),
CURRENT_TIMESTAMP()
);
-- Verify data loaded
SELECT
meeting_id,
meeting_title,
meeting_date,
ARRAY_SIZE(attendees) as attendee_count,
ARRAY_SIZE(action_items) as action_count,
LEFT(transcript_text, 100) || '...' as preview
FROM meeting_transcripts
ORDER BY meeting_date;
Notice how we're capturing structured data (action items, decisions, topics) alongside unstructured text. This dual approach gives us flexibility in how we search and analyze later.
Step 3: Chunking Strategy for Meetings
Unlike documentation, meetings have natural structure: they flow chronologically. We can chunk intelligently based on topics or time segments.
-- Table for meeting chunks (optimized for retrieval)
CREATE OR REPLACE TABLE meeting_chunks (
chunk_id STRING PRIMARY KEY,
meeting_id STRING,
chunk_index INTEGER,
chunk_text STRING,
chunk_size INTEGER,
chunk_metadata VARIANT,
chunk_embedding VECTOR(FLOAT, 1024)
);
-- Use SPLIT_TEXT_RECURSIVE_CHARACTER for intelligent chunking
-- This function preserves sentence boundaries and context
INSERT INTO meeting_chunks (chunk_id, meeting_id, chunk_index, chunk_text, chunk_size, chunk_metadata)
SELECT
meeting_id || '_chunk_' || chunk_index as chunk_id,
meeting_id,
chunk_index,
chunk_text,
LENGTH(chunk_text) as chunk_size,
OBJECT_CONSTRUCT(
'meeting_title', meeting_title,
'meeting_date', meeting_date,
'meeting_type', meeting_type,
'attendees', attendees,
'duration_minutes', duration_minutes,
'has_action_items', ARRAY_SIZE(action_items) > 0,
'has_decisions', ARRAY_SIZE(decisions_made) > 0,
'topics', topics_discussed,
'total_chunks', total_chunks_in_meeting
) as chunk_metadata
FROM (
SELECT
mt.meeting_id,
mt.meeting_title,
mt.meeting_date,
mt.meeting_type,
mt.attendees,
mt.duration_minutes,
mt.action_items,
mt.decisions_made,
mt.topics_discussed,
chunk.value::STRING as chunk_text,
chunk.index as chunk_index,
ARRAY_SIZE(chunks_array) as total_chunks_in_meeting
FROM meeting_transcripts mt,
LATERAL (
-- SPLIT_TEXT_RECURSIVE_CHARACTER parameters:
-- text: the content to split
-- format: 'none' for plain text (or 'markdown')
-- chunk_size: target characters per chunk (500-1000 works well)
-- overlap: characters shared between adjacent chunks
SELECT SNOWFLAKE.CORTEX.SPLIT_TEXT_RECURSIVE_CHARACTER(
mt.transcript_text,
'none',
500, -- Target 500 characters per chunk
50 -- Overlap preserves context across chunk boundaries
) as chunks_array
) split_result,
LATERAL FLATTEN(input => chunks_array) chunk
);
-- Verify what we created
SELECT
meeting_id,
chunk_metadata:meeting_title::STRING as meeting_title,
COUNT(*) as num_chunks,
AVG(chunk_size) as avg_chunk_size,
MIN(chunk_size) as min_chunk_size,
MAX(chunk_size) as max_chunk_size
FROM meeting_chunks
GROUP BY meeting_id, chunk_metadata:meeting_title::STRING
ORDER BY chunk_metadata:meeting_date::TIMESTAMP_LTZ;
-- View actual chunks for one meeting
SELECT
chunk_id,
chunk_index,
chunk_size,
LEFT(chunk_text, 150) || '...' as chunk_preview
FROM meeting_chunks
WHERE meeting_id = 'MTG-2024-002' -- Database Migration Discussion
ORDER BY chunk_index;
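If you're curious what recursive character splitting does conceptually, here's a rough Python approximation (not Snowflake's actual implementation): try the largest separator first, and only fall back to smaller ones when a piece is still too big.

```python
def split_recursive(text: str, max_chars: int,
                    separators=("\n\n", "\n", ". ", " ")) -> list[str]:
    # Rough sketch of recursive character splitting (not the real
    # Cortex implementation): prefer big separators, fall back to
    # smaller ones, and hard-split only as a last resort.
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    for sep in separators:
        if sep in text:
            chunks, current = [], ""
            for piece in text.split(sep):
                candidate = (current + sep + piece) if current else piece
                if len(candidate) <= max_chars:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    current = piece
            if current:
                chunks.append(current)
            # Recurse into any piece that is still too large
            result = []
            for c in chunks:
                result.extend(split_recursive(c, max_chars, separators))
            return result
    # No separator found: hard split
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

chunks = split_recursive("First sentence. Second sentence. " * 20, 100)
```

The takeaway: chunks break at natural boundaries (paragraphs, then sentences, then words) rather than mid-word, which is why retrieved chunks stay readable.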
-- ========================================
-- ADAPTIVE CHUNKING (Optional Enhancement)
-- ========================================
-- Adjust chunk size based on meeting duration
TRUNCATE TABLE meeting_chunks;
INSERT INTO meeting_chunks (chunk_id, meeting_id, chunk_index, chunk_text, chunk_size, chunk_metadata)
SELECT
meeting_id || '_chunk_' || chunk_index as chunk_id,
meeting_id,
chunk_index,
chunk_text,
LENGTH(chunk_text) as chunk_size,
chunk_metadata
FROM (
SELECT
mt.meeting_id,
chunk.value::STRING as chunk_text,
chunk.index as chunk_index,
OBJECT_CONSTRUCT(
'meeting_title', mt.meeting_title,
'meeting_date', mt.meeting_date,
'meeting_type', mt.meeting_type,
'attendees', mt.attendees,
'duration_minutes', mt.duration_minutes,
'has_action_items', ARRAY_SIZE(mt.action_items) > 0,
'has_decisions', ARRAY_SIZE(mt.decisions_made) > 0,
'topics', mt.topics_discussed
) as chunk_metadata
FROM meeting_transcripts mt,
LATERAL (
SELECT SNOWFLAKE.CORTEX.SPLIT_TEXT_RECURSIVE_CHARACTER(
mt.transcript_text,
'none',
CASE
-- Short meetings: larger chunks
WHEN mt.duration_minutes <= 30 THEN 800
-- Medium meetings: standard chunks
WHEN mt.duration_minutes <= 60 THEN 500
-- Long meetings: smaller chunks for precision
ELSE 400
END,
50 -- overlap between adjacent chunks
) as chunks_array
) split_result,
LATERAL FLATTEN(input => chunks_array) chunk
);
-- ========================================
-- ENHANCED CHUNKING PROCEDURE (Production)
-- ========================================
-- This creates separate chunks for actions and decisions
CREATE OR REPLACE PROCEDURE chunk_meeting_intelligently(p_meeting_id STRING)
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
-- First, chunk the main transcript
INSERT INTO meeting_chunks (chunk_id, meeting_id, chunk_index, chunk_text, chunk_size, chunk_metadata)
SELECT
p_meeting_id || '_main_' || chunk.index as chunk_id,
p_meeting_id as meeting_id,
chunk.index as chunk_index,
chunk.value::STRING as chunk_text,
LENGTH(chunk.value::STRING) as chunk_size,
OBJECT_CONSTRUCT(
'meeting_title', mt.meeting_title,
'meeting_date', mt.meeting_date,
'meeting_type', mt.meeting_type,
'chunk_type', 'transcript',
'attendees', mt.attendees
) as chunk_metadata
FROM meeting_transcripts mt,
LATERAL (
SELECT SNOWFLAKE.CORTEX.SPLIT_TEXT_RECURSIVE_CHARACTER(
mt.transcript_text,
'none',
500,
50
) as chunks_array
) split_result,
LATERAL FLATTEN(input => chunks_array) chunk
WHERE mt.meeting_id = p_meeting_id;
-- Add action items as dedicated chunks
INSERT INTO meeting_chunks (chunk_id, meeting_id, chunk_index, chunk_text, chunk_size, chunk_metadata)
SELECT
p_meeting_id || '_action_' || action.index as chunk_id,
p_meeting_id as meeting_id,
1000 + action.index as chunk_index,
'ACTION ITEM: ' || action.value::STRING as chunk_text,
LENGTH(action.value::STRING) as chunk_size,
OBJECT_CONSTRUCT(
'meeting_title', mt.meeting_title,
'meeting_date', mt.meeting_date,
'chunk_type', 'action_item',
'attendees', mt.attendees
) as chunk_metadata
FROM meeting_transcripts mt,
LATERAL FLATTEN(input => mt.action_items) action
WHERE mt.meeting_id = p_meeting_id
AND ARRAY_SIZE(mt.action_items) > 0;
-- Add decisions as dedicated chunks
INSERT INTO meeting_chunks (chunk_id, meeting_id, chunk_index, chunk_text, chunk_size, chunk_metadata)
SELECT
p_meeting_id || '_decision_' || decision.index as chunk_id,
p_meeting_id as meeting_id,
2000 + decision.index as chunk_index,
'DECISION MADE: ' || decision.value::STRING as chunk_text,
LENGTH(decision.value::STRING) as chunk_size,
OBJECT_CONSTRUCT(
'meeting_title', mt.meeting_title,
'meeting_date', mt.meeting_date,
'chunk_type', 'decision',
'attendees', mt.attendees
) as chunk_metadata
FROM meeting_transcripts mt,
LATERAL FLATTEN(input => mt.decisions_made) decision
WHERE mt.meeting_id = p_meeting_id
AND ARRAY_SIZE(mt.decisions_made) > 0;
RETURN 'Successfully chunked meeting ' || p_meeting_id;
END;
$$;
-- Test the enhanced chunking
CALL chunk_meeting_intelligently('MTG-2024-001');
-- View results by chunk type
SELECT
chunk_id,
chunk_metadata:chunk_type::STRING as type,
chunk_size,
LEFT(chunk_text, 100) || '...' as preview
FROM meeting_chunks
WHERE meeting_id = 'MTG-2024-001'
ORDER BY chunk_index;
-- ========================================
-- QUALITY VALIDATION
-- ========================================
-- Check chunk quality distribution
WITH chunk_stats AS (
SELECT
meeting_id,
chunk_metadata:meeting_title::STRING as meeting_title,
COUNT(*) as total_chunks,
AVG(chunk_size) as avg_size,
STDDEV(chunk_size) as size_variation,
COUNT(CASE WHEN chunk_size < 200 THEN 1 END) as too_small_count,
COUNT(CASE WHEN chunk_size > 1000 THEN 1 END) as too_large_count
FROM meeting_chunks
GROUP BY meeting_id, meeting_title
)
SELECT
meeting_title,
total_chunks,
ROUND(avg_size, 0) as avg_chunk_size,
ROUND(size_variation, 0) as size_std_dev,
too_small_count,
too_large_count,
CASE
WHEN too_small_count > total_chunks * 0.2 THEN 'WARNING: Too many small chunks'
WHEN too_large_count > total_chunks * 0.2 THEN 'WARNING: Too many large chunks'
ELSE 'OK: Good distribution'
END as quality_status
FROM chunk_stats
ORDER BY meeting_id;
For production systems with hour-long meetings, we might want even finer granularity: 3-5 minute segments or topic-based splits. For this example, the 500-character chunks created above are plenty, since our sample transcripts are short.
Step 4: Generate Vector Embeddings
This is where the magic starts. We're converting text into mathematical vectors that capture semantic meaning.
-- Generate embeddings using Snowflake Cortex
UPDATE meeting_chunks
SET chunk_embedding = SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
'snowflake-arctic-embed-l',
chunk_text
);
-- Verify embeddings were created
SELECT
chunk_id,
chunk_metadata:meeting_title::STRING as meeting,
chunk_embedding IS NOT NULL as has_embedding,
-- Distance from a vector to itself should be exactly 0
VECTOR_L2_DISTANCE(chunk_embedding, chunk_embedding) as self_distance
FROM meeting_chunks
LIMIT 5;
The snowflake-arctic-embed-l model creates 1024-dimensional vectors. These vectors position similar concepts close together in mathematical space, so "API security" and "authentication" end up near each other, even without sharing exact words.
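The intuition behind vector distance is worth internalizing. Here's a toy Python example with made-up 3-dimensional vectors — real embeddings have 1024 dimensions, and the numbers below are invented purely for illustration:

```python
import math

def l2_distance(a: list[float], b: list[float]) -> float:
    # Same metric as Snowflake's VECTOR_L2_DISTANCE: straight-line
    # distance between two points in embedding space
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Invented toy vectors: imagine an embedding model placed these phrases here
api_security   = [0.9, 0.1, 0.2]
authentication = [0.8, 0.2, 0.3]   # semantically close to "API security"
lunch_menu     = [0.1, 0.9, 0.8]   # unrelated topic

close = l2_distance(api_security, authentication)
far   = l2_distance(api_security, lunch_menu)
# Related concepts sit closer together than unrelated ones
```

Retrieval is then just "find the stored vectors with the smallest distance to the question's vector."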
Step 5: Create Cortex Search Service
Now we set up the search infrastructure. This is what makes semantic search possible.
-- Create Cortex Search Service for meeting notes
CREATE OR REPLACE CORTEX SEARCH SERVICE meeting_search_service
ON chunk_text
WAREHOUSE = meeting_rag_wh
TARGET_LAG = '1 minute'
AS (
SELECT
chunk_id,
chunk_text,
chunk_metadata,
chunk_embedding
FROM meeting_chunks
);
-- Check service status (takes a minute to initialize)
SHOW CORTEX SEARCH SERVICES;
-- Test basic search from SQL (SEARCH_PREVIEW is the ad-hoc SQL entry
-- point; applications typically query the service via the REST or Python APIs)
SELECT PARSE_JSON(
SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
'meeting_search_service',
'{"query": "database migration", "limit": 3}'
)
)['results'] as results;
The TARGET_LAG of 1 minute means new meetings get indexed within a minute. For most use cases, this is plenty fast.
Step 6: Building the Query Interface
Now for the exciting part: creating a function that actually answers questions intelligently.
-- Create the main RAG function
CREATE OR REPLACE FUNCTION ask_meeting_assistant(question STRING)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
WITH relevant_meetings AS (
-- Step 1: Find most relevant meeting chunks
SELECT
chunk_text,
chunk_metadata:meeting_title::STRING as meeting_title,
chunk_metadata:meeting_date::TIMESTAMP_LTZ as meeting_date,
chunk_metadata:attendees as attendees
FROM TABLE(
meeting_search_service!SEARCH(
QUERY => question,
LIMIT => 3
)
)
),
context_builder AS (
-- Step 2: Build context from retrieved meetings
SELECT
LISTAGG(
'Meeting: ' || meeting_title ||
'\nDate: ' || TO_VARCHAR(meeting_date, 'YYYY-MM-DD') ||
'\nAttendees: ' || ARRAY_TO_STRING(attendees, ', ') ||
'\nContent: ' || chunk_text,
'\n\n---\n\n'
) as combined_context,
ARRAY_AGG(
OBJECT_CONSTRUCT(
'title', meeting_title,
'date', meeting_date,
'attendees', attendees
)
) as sources
FROM relevant_meetings
)
-- Step 3: Generate intelligent answer
SELECT OBJECT_CONSTRUCT(
'answer', SNOWFLAKE.CORTEX.COMPLETE(
'llama3.1-70b',
'You are a helpful meeting assistant. Answer the user\'s question based ONLY on the provided meeting notes.
If the information is not in the meeting notes, say so clearly. Always mention which meeting(s) you are referencing and include relevant dates.
If there are action items or decisions related to the question, highlight them.
Meeting Notes:
' || combined_context || '
User Question: ' || question || '
Provide a clear, accurate answer with specific references to meetings and dates:'
),
'sources', sources,
'timestamp', CURRENT_TIMESTAMP()
) as result
FROM context_builder
$$;
-- Test it out!
SELECT ask_meeting_assistant('Why did we choose PostgreSQL?');
-- Parse the result nicely
SELECT
result:answer::STRING as answer,
result:sources as source_meetings,
result:timestamp::TIMESTAMP_LTZ as answered_at
FROM (
SELECT ask_meeting_assistant('Why did we choose PostgreSQL?') as result
);
Let's break down what this function does:
- Searches for the 3 most relevant meeting chunks semantically
- Builds context by combining those meetings with metadata
- Generates answer using an LLM that has been given specific instructions
- Returns structured output with answer + sources
The key is that the LLM only uses information from our meetings; it doesn't make things up or fall back on its general knowledge.
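The prompt-building step is the part most worth getting right. Here's a Python sketch of the same context assembly the SQL function performs — the field names mirror our chunk metadata, but this is illustrative, not the Cortex API:

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    # Mirrors the context_builder CTE: one block per retrieved chunk,
    # separated by '---', followed by the grounding instructions.
    blocks = [
        f"Meeting: {c['meeting_title']}\n"
        f"Date: {c['meeting_date']}\n"
        f"Attendees: {', '.join(c['attendees'])}\n"
        f"Content: {c['chunk_text']}"
        for c in chunks
    ]
    context = "\n\n---\n\n".join(blocks)
    return (
        "You are a helpful meeting assistant. Answer the user's question "
        "based ONLY on the provided meeting notes.\n\n"
        f"Meeting Notes:\n{context}\n\n"
        f"User Question: {question}\n\n"
        "Provide a clear, accurate answer with specific references:"
    )

prompt = build_prompt(
    "Why PostgreSQL?",
    [{"meeting_title": "Database Migration Discussion",
      "meeting_date": "2024-01-22",
      "attendees": ["Tom Liu", "Sarah Chen"],
      "chunk_text": "Sarah advocated for PostgreSQL citing better JSON support."}],
)
```

Keeping the grounding instruction ("based ONLY on the provided meeting notes") adjacent to the context is what stops the model from answering out of its general knowledge.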
Step 7: Real-World Testing
Let's ask questions that we'd actually ask in real life:
-- Question 1: Specific decision
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant('Why did we choose PostgreSQL over MySQL?') as result
);
-- Question 2: Action items
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant('What are Mike\'s pending action items?') as result
);
-- Question 3: Timeline question
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant('When is the dashboard launch scheduled?') as result
);
-- Question 4: Security discussion
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant('What security concerns were raised about our API?') as result
);
-- Question 5: Testing boundaries
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant('What is the weather forecast for tomorrow?') as result
);
-- This should correctly say "not in meeting notes"
What's impressive is how the system can connect information across multiple meetings. If you ask "What's blocking the product launch?", it can pull from the security meeting AND the planning meeting to give a complete answer.
Step 8: Adding Specialized Query Functions
Different people need different things from meeting notes. Let's create targeted functions:
-- Function 1: Find action items for a person
CREATE OR REPLACE FUNCTION get_action_items_for(person_name STRING)
RETURNS TABLE (
meeting_title STRING,
meeting_date TIMESTAMP_LTZ,
action_item STRING
)
AS
$$
SELECT
meeting_title,
meeting_date,
action.value::STRING as action_item
FROM meeting_transcripts,
LATERAL FLATTEN(input => action_items) action
WHERE action.value::STRING ILIKE '%' || person_name || '%'
ORDER BY meeting_date DESC
$$;
-- Usage
SELECT * FROM TABLE(get_action_items_for('Mike'));
-- Function 2: Find all decisions in a date range
CREATE OR REPLACE FUNCTION get_decisions_between(
start_date TIMESTAMP_LTZ,
end_date TIMESTAMP_LTZ
)
RETURNS TABLE (
meeting_title STRING,
meeting_date TIMESTAMP_LTZ,
decision STRING
)
AS
$$
SELECT
meeting_title,
meeting_date,
decision.value::STRING as decision
FROM meeting_transcripts,
LATERAL FLATTEN(input => decisions_made) decision
WHERE meeting_date BETWEEN start_date AND end_date
ORDER BY meeting_date DESC
$$;
-- Usage
SELECT * FROM TABLE(get_decisions_between(
'2024-01-01'::TIMESTAMP_LTZ,
'2024-02-28'::TIMESTAMP_LTZ
));
-- Function 3: Search meetings by topic
CREATE OR REPLACE FUNCTION find_meetings_about(topic STRING)
RETURNS TABLE (
meeting_id STRING,
meeting_title STRING,
meeting_date TIMESTAMP_LTZ,
relevance_score FLOAT
)
AS
$$
SELECT
chunk_id as meeting_id,
chunk_metadata:meeting_title::STRING as meeting_title,
chunk_metadata:meeting_date::TIMESTAMP_LTZ as meeting_date,
1.0 as relevance_score -- Cortex Search returns results sorted by relevance
FROM TABLE(
meeting_search_service!SEARCH(
QUERY => topic,
LIMIT => 10
)
)
$$;
-- Usage
SELECT * FROM TABLE(find_meetings_about('API security'));
These specialized functions give your team multiple ways to interact with meeting data: some people want natural language, others want structured queries.
Step 9: Building a Conversation History Table
To make this truly useful, we should track what people ask and whether answers were helpful:
-- Track queries and feedback
CREATE OR REPLACE TABLE meeting_assistant_logs (
log_id INTEGER AUTOINCREMENT,
user_name STRING,
question STRING,
answer VARIANT,
sources_used ARRAY,
helpful_vote INTEGER, -- 1 = helpful, -1 = not helpful, NULL = no vote yet
feedback_comment STRING,
queried_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP()
);
-- Enhanced function that logs queries
CREATE OR REPLACE PROCEDURE ask_and_log(
user_name STRING,
question STRING
)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
DECLARE
result VARIANT;
BEGIN
-- Get answer
result := ask_meeting_assistant(:question);
-- Log the interaction
INSERT INTO meeting_assistant_logs (user_name, question, answer, sources_used)
SELECT
:user_name,
:question,
:result,
:result:sources
;
RETURN result;
END;
$$;
-- Usage
CALL ask_and_log('[email protected]', 'What did we decide about caching?');
-- View query history
SELECT
log_id,
user_name,
question,
answer:answer::STRING as answer_preview,
helpful_vote,
queried_at
FROM meeting_assistant_logs
ORDER BY queried_at DESC
LIMIT 10;
-- Add feedback
UPDATE meeting_assistant_logs
SET helpful_vote = 1,
feedback_comment = 'Perfect! Found exactly what I needed.'
WHERE log_id = 1;
This logging is crucial for two reasons:
- Accountability: Know who's using the system and what they're asking
- Improvement: See which queries return unhelpful answers and refine
Step 10: Analytics Dashboard Queries
Let's create queries that help us understand usage patterns:
-- Most common topics people ask about
SELECT
-- Extract key nouns/topics from questions
LOWER(REGEXP_SUBSTR(question, '\\b(database|API|security|migration|dashboard|launch|PostgreSQL|Redis|cache|decision|action)\\b', 1, 1, 'i')) as topic,
COUNT(*) as question_count
FROM meeting_assistant_logs
WHERE topic IS NOT NULL
GROUP BY topic
ORDER BY question_count DESC
LIMIT 10;
-- User engagement metrics
SELECT
user_name,
COUNT(*) as total_queries,
AVG(CASE WHEN helpful_vote = 1 THEN 1.0 ELSE 0.0 END) as satisfaction_rate,
COUNT(CASE WHEN helpful_vote = -1 THEN 1 END) as unhelpful_answers
FROM meeting_assistant_logs
GROUP BY user_name
ORDER BY total_queries DESC;
-- Meetings that get referenced most
SELECT
source.value:title::STRING as meeting_title,
source.value:date::TIMESTAMP_LTZ as meeting_date,
COUNT(*) as times_referenced
FROM meeting_assistant_logs,
LATERAL FLATTEN(input => sources_used) source
GROUP BY meeting_title, meeting_date
ORDER BY times_referenced DESC
LIMIT 10;
-- Questions that received negative feedback
SELECT
question,
answer:answer::STRING as answer,
feedback_comment,
queried_at
FROM meeting_assistant_logs
WHERE helpful_vote = -1
ORDER BY queried_at DESC;
These analytics tell us what's working and what needs improvement. If certain types of questions consistently get negative feedback, we know where to focus our refinement efforts.
Step 11: Automating Meeting Ingestion
In production, we'd want new meetings to automatically flow into the system. Here's how that might look:
-- Create stage for incoming meeting transcripts
CREATE OR REPLACE STAGE meeting_uploads
FILE_FORMAT = (TYPE = 'JSON');
-- Create stream to detect new meetings
CREATE OR REPLACE STREAM new_meetings_stream
ON TABLE meeting_transcripts;
-- Task to process new meetings
CREATE OR REPLACE TASK process_new_meetings
WAREHOUSE = meeting_rag_wh
SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('new_meetings_stream')
AS
BEGIN
-- Insert chunks for new meetings
-- (simplified to one chunk per meeting here; in production, reuse the
-- SPLIT_TEXT logic from Step 3 or call chunk_meeting_intelligently)
INSERT INTO meeting_chunks (chunk_id, meeting_id, chunk_index, chunk_text, chunk_metadata)
SELECT
meeting_id || '_' || chunk_index as chunk_id,
meeting_id,
chunk_index,
chunk_text,
chunk_metadata
FROM (
SELECT
meeting_id,
transcript_text as chunk_text,
1 as chunk_index,
OBJECT_CONSTRUCT(
'meeting_title', meeting_title,
'meeting_date', meeting_date,
'meeting_type', meeting_type,
'attendees', attendees,
'has_action_items', ARRAY_SIZE(action_items) > 0,
'has_decisions', ARRAY_SIZE(decisions_made) > 0
) as chunk_metadata
FROM new_meetings_stream
WHERE METADATA$ACTION = 'INSERT'
);
-- Generate embeddings for new chunks
UPDATE meeting_chunks
SET chunk_embedding = SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
'snowflake-arctic-embed-l',
chunk_text
)
WHERE chunk_embedding IS NULL;
END;
-- Resume task
ALTER TASK process_new_meetings RESUME;
Now whenever a new meeting gets added to the meeting_transcripts table, it's automatically chunked, embedded, and searchable within about five minutes. No manual intervention needed.
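The key property of that task is idempotence: embedding only rows WHERE chunk_embedding IS NULL means a failed run can simply be retried without double-processing anything. The same pattern in Python, as a toy in-memory version (the fake `embed` function below is purely illustrative):

```python
def embed(text: str) -> list[float]:
    # Stand-in for EMBED_TEXT_1024: a deterministic fake embedding
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

def process_pending(rows: list[dict]) -> int:
    # Mirrors the task's UPDATE ... WHERE chunk_embedding IS NULL:
    # only touch rows that have no embedding yet, so re-running after
    # a failure never re-embeds already-processed chunks.
    done = 0
    for row in rows:
        if row["embedding"] is None:
            row["embedding"] = embed(row["text"])
            done += 1
    return done

table = [
    {"text": "chunk one", "embedding": None},
    {"text": "chunk two", "embedding": [1.0, 2.0]},  # already processed
]
first_run  = process_pending(table)   # embeds only the pending row
second_run = process_pending(table)   # nothing left to do
```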
Advanced Feature: Topic Extraction
We can use Cortex to automatically extract topics from meetings:
-- Add topics column
ALTER TABLE meeting_transcripts
ADD COLUMN ai_extracted_topics ARRAY;
-- Extract topics using LLM
UPDATE meeting_transcripts
SET ai_extracted_topics = PARSE_JSON(
SNOWFLAKE.CORTEX.COMPLETE(
'llama3.1-70b',
'Extract 3-5 main topics from this meeting transcript. Return ONLY a JSON array of strings, nothing else.
Transcript: ' || transcript_text || '
Format: ["topic1", "topic2", "topic3"]'
)
);
-- View extracted topics
SELECT
meeting_title,
meeting_date,
ai_extracted_topics
FROM meeting_transcripts
ORDER BY meeting_date;
-- Find meetings by AI-extracted topic
SELECT
meeting_title,
meeting_date,
topic.value::STRING as topic
FROM meeting_transcripts,
LATERAL FLATTEN(input => ai_extracted_topics) topic
WHERE topic.value::STRING ILIKE '%security%'
ORDER BY meeting_date DESC;
This auto-tagging makes meetings even more discoverable without manual categorization.
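One caveat with this pattern: PARSE_JSON fails if the model wraps its answer in prose or code fences despite the "Return ONLY a JSON array" instruction. If you post-process LLM output outside Snowflake, a defensive parser along these lines helps (a Python sketch of the idea, not part of the SQL above):

```python
import json
import re

def parse_topic_list(raw):
    """Parse an LLM response that should be a JSON array of strings.
    Models sometimes wrap the array in prose or code fences, so fall
    back to extracting the first [...] span before giving up."""
    try:
        topics = json.loads(raw)
    except json.JSONDecodeError:
        match = re.search(r"\[.*\]", raw, re.DOTALL)
        if not match:
            return []
        try:
            topics = json.loads(match.group(0))
        except json.JSONDecodeError:
            return []
    if not isinstance(topics, list):
        return []
    # Keep only string entries; drop anything malformed
    return [t for t in topics if isinstance(t, str)]

print(parse_topic_list('["security", "api design"]'))
print(parse_topic_list('Here are the topics: ["security", "rate limiting"]'))
```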
Cost Optimization Tips
Running LLMs on every query can get expensive. Here are strategies to keep costs reasonable:
-- Strategy 1: Response caching
CREATE OR REPLACE TABLE answer_cache (
question_hash STRING PRIMARY KEY,
question STRING,
cached_answer VARIANT,
cached_at TIMESTAMP_LTZ,
cache_hits INTEGER DEFAULT 0
);
-- Modified function with caching
-- Note: a SQL UDF cannot write rows, so cache entries (and cache_hits)
-- must be populated by a separate procedure or task; also, COALESCE is
-- not guaranteed to short-circuit, so treat this as a sketch of the
-- caching idea rather than a guaranteed saving
CREATE OR REPLACE FUNCTION ask_with_cache(question STRING)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
SELECT
COALESCE(
-- Try cache first
(SELECT cached_answer
FROM answer_cache
WHERE question_hash = SHA2(LOWER(TRIM(question)))
AND cached_at >= DATEADD(hour, -24, CURRENT_TIMESTAMP())
LIMIT 1),
-- Generate new answer if not cached
ask_meeting_assistant(question)
)
$$;
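The cache key here is SHA2(LOWER(TRIM(question))), which in Snowflake defaults to SHA-256 hex. The same normalization can be reproduced client-side, for example if an application wants to check the cache before calling Snowflake at all (an illustrative Python sketch):

```python
import hashlib

def cache_key(question):
    """Mirror of SHA2(LOWER(TRIM(question))): normalize whitespace and
    case first, so trivially different phrasings of the same question
    hit the same cache row. Snowflake's SHA2 defaults to SHA-256."""
    normalized = question.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Different surface forms, same cache row
print(cache_key("  Why PostgreSQL? ") == cache_key("why postgresql?"))
```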
-- Strategy 2: Smaller model for simple queries
CREATE OR REPLACE FUNCTION ask_simple(question STRING)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
-- Use smaller, cheaper model for straightforward questions
-- Use llama3.1-8b instead of llama3.1-70b for basic lookups
WITH relevant_meetings AS (
SELECT
chunk_text,
chunk_metadata:meeting_title::STRING as meeting_title,
chunk_metadata:meeting_date::TIMESTAMP_LTZ as meeting_date
FROM TABLE(
meeting_search_service!SEARCH(
QUERY => question,
LIMIT => 2 -- Fewer chunks = lower cost
)
)
),
context_builder AS (
SELECT
LISTAGG(
'Meeting: ' || meeting_title || '\n' || chunk_text,
'\n\n---\n\n'
) as combined_context
FROM relevant_meetings
)
SELECT OBJECT_CONSTRUCT(
'answer', SNOWFLAKE.CORTEX.COMPLETE(
'llama3.1-8b', -- Cheaper model
'Answer briefly based on these meeting notes: ' || combined_context ||
'\n\nQuestion: ' || question
)
)
FROM context_builder
$$;
-- Strategy 3: Batch processing for reports
-- Instead of asking individual questions, batch them
CREATE OR REPLACE PROCEDURE generate_weekly_summary()
RETURNS VARIANT
LANGUAGE SQL
AS
$$
DECLARE
result VARIANT;
BEGIN
-- Get all meetings from last week
WITH last_week_meetings AS (
SELECT
LISTAGG(
'Meeting: ' || meeting_title ||
'\nDate: ' || meeting_date ||
'\nKey points: ' || transcript_text,
'\n\n---\n\n'
) as all_meetings
FROM meeting_transcripts
WHERE meeting_date >= DATEADD(week, -1, CURRENT_TIMESTAMP())
)
SELECT OBJECT_CONSTRUCT(
'summary', SNOWFLAKE.CORTEX.SUMMARIZE(all_meetings),
'key_decisions', SNOWFLAKE.CORTEX.COMPLETE(
'llama3.1-70b',
'List all key decisions made in these meetings: ' || all_meetings
),
'action_items', SNOWFLAKE.CORTEX.COMPLETE(
'llama3.1-70b',
'List all action items from these meetings: ' || all_meetings
)
) INTO :result -- Snowflake Scripting requires the colon bind on variables in SQL statements
FROM last_week_meetings;
RETURN result;
END;
$$;
These optimization strategies can cut costs by 40-60% while maintaining quality for most queries.
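To sanity-check a claim like that, it helps to model it. This rough estimator (all numbers are illustrative assumptions, not Snowflake pricing) shows how a 30% cache-hit rate plus routing half the remaining queries to a model that costs 80% less lands in that range:

```python
def monthly_llm_cost(queries, cost_per_call, cache_hit_rate=0.0,
                     small_model_share=0.0, small_model_discount=0.8):
    """Rough cost model: cache hits cost nothing; a share of the
    remaining queries runs on a cheaper model. Purely illustrative."""
    paid = queries * (1 - cache_hit_rate)
    small = paid * small_model_share
    large = paid - small
    return large * cost_per_call + small * cost_per_call * (1 - small_model_discount)

baseline = monthly_llm_cost(10_000, 0.01)
optimized = monthly_llm_cost(10_000, 0.01, cache_hit_rate=0.3, small_model_share=0.5)
print(f"savings: {1 - optimized / baseline:.0%}")
```

With these assumed inputs the model lands at roughly 58% savings, squarely inside the 40-60% range; your actual numbers depend on your cache-hit rate and query mix.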
Quality Assurance: Building a Test Suite
We should validate that our RAG system returns accurate answers:
-- Create test cases table
CREATE OR REPLACE TABLE rag_test_cases (
test_id INTEGER AUTOINCREMENT,
test_question STRING,
expected_answer_contains STRING,
expected_meeting_reference STRING,
test_category STRING,
created_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP()
);
-- Add test cases based on our sample data
INSERT INTO rag_test_cases (test_question, expected_answer_contains, expected_meeting_reference, test_category)
VALUES
('Why did we choose PostgreSQL?', 'JSON support', 'Database Migration Discussion', 'decision_lookup'),
('What are Mike\'s action items?', 'rate limiting', 'API Security Review', 'action_item_lookup'),
('When is the dashboard launch?', 'February 28', 'Q1 Product Planning', 'timeline_lookup'),
('What security issues were found?', 'rate limiting', 'API Security Review', 'problem_identification'),
('Who attended the planning meeting?', 'Sarah Chen', 'Q1 Product Planning', 'attendee_lookup');
-- Run test suite
CREATE OR REPLACE PROCEDURE run_rag_tests()
RETURNS TABLE (
test_id INTEGER,
question STRING,
passed BOOLEAN,
answer STRING,
reason STRING
)
LANGUAGE SQL
AS
$$
DECLARE
result_cursor CURSOR FOR
WITH test_results AS (
SELECT
t.test_id,
t.test_question,
ask_meeting_assistant(t.test_question) as answer_obj,
t.expected_answer_contains,
t.expected_meeting_reference
FROM rag_test_cases t
)
SELECT
test_id,
test_question as question,
(
answer_obj:answer::STRING ILIKE '%' || expected_answer_contains || '%'
AND ARRAY_TO_STRING(answer_obj:sources::ARRAY, ',') ILIKE '%' || expected_meeting_reference || '%'
) as passed,
answer_obj:answer::STRING as answer,
CASE
WHEN answer_obj:answer::STRING ILIKE '%' || expected_answer_contains || '%' THEN 'Answer contains expected content'
ELSE 'Missing expected content: ' || expected_answer_contains
END as reason
FROM test_results;
BEGIN
OPEN result_cursor;
-- RETURN TABLE() needs a RESULTSET, so convert the open cursor first
RETURN TABLE(RESULTSET_FROM_CURSOR(result_cursor));
END;
$$;
-- Run tests
CALL run_rag_tests();
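The pass criterion in the procedure is deliberately simple: case-insensitive substring containment in both the answer and its cited sources. The same check, expressed in Python for clarity (the sample answer and source strings are made up for illustration):

```python
def grade_answer(answer, sources, expected_fragment, expected_meeting):
    """Mirror of the procedure's pass criterion: the expected fragment
    must appear in the answer, and the expected meeting title must
    appear in at least one cited source (both case-insensitive)."""
    return (expected_fragment.lower() in answer.lower()
            and any(expected_meeting.lower() in s.lower() for s in sources))

ok = grade_answer(
    "The team chose PostgreSQL for its JSON support.",
    ["Database Migration Discussion (2024-03-15)"],
    "JSON support",
    "Database Migration Discussion",
)
print(ok)  # True
```

Substring matching is brittle against paraphrases; a natural next step is an LLM-as-judge check, but this cheap version catches outright regressions.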
This automated testing ensures our RAG system maintains quality as we add more meetings and refine prompts.
Monitoring and Alerts
Set up monitoring to catch issues early:
-- Create monitoring table
CREATE OR REPLACE TABLE rag_health_metrics (
metric_date DATE,
total_queries INTEGER,
avg_response_time_seconds FLOAT,
successful_queries INTEGER,
failed_queries INTEGER,
avg_satisfaction_score FLOAT,
unique_users INTEGER
);
-- Daily health check task
CREATE OR REPLACE TASK daily_health_check
WAREHOUSE = meeting_rag_wh
SCHEDULE = 'USING CRON 0 8 * * * America/Los_Angeles' -- 8 AM daily
AS
INSERT INTO rag_health_metrics
SELECT
CURRENT_DATE() as metric_date,
COUNT(*) as total_queries,
-- queried_at alone cannot measure latency (it would only measure the age
-- of the log row at task run time); log a completion timestamp to compute this properly
NULL as avg_response_time_seconds,
COUNT(CASE WHEN answer IS NOT NULL THEN 1 END) as successful_queries,
COUNT(CASE WHEN answer IS NULL THEN 1 END) as failed_queries,
AVG(CASE WHEN helpful_vote = 1 THEN 1.0
WHEN helpful_vote = -1 THEN 0.0
ELSE NULL END) as avg_satisfaction_score,
COUNT(DISTINCT user_name) as unique_users
FROM meeting_assistant_logs
WHERE queried_at >= DATEADD(day, -1, CURRENT_DATE());
-- Resume health check task
ALTER TASK daily_health_check RESUME;
-- Alert query (run this manually or set up notifications)
SELECT
metric_date,
total_queries,
avg_satisfaction_score,
CASE
WHEN avg_satisfaction_score < 0.7 THEN '⚠️ LOW SATISFACTION'
WHEN failed_queries > total_queries * 0.1 THEN '⚠️ HIGH FAILURE RATE'
ELSE '✅ HEALTHY'
END as status
FROM rag_health_metrics
WHERE metric_date >= DATEADD(day, -7, CURRENT_DATE())
ORDER BY metric_date DESC;
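The thresholds in that CASE expression (satisfaction below 0.7, failure rate above 10%) are easy to mirror in whatever service consumes these metrics, for example a notification bot. A small Python equivalent of the same classification logic:

```python
def health_status(avg_satisfaction, failed, total):
    """Same thresholds and precedence as the alert query: low
    satisfaction is checked first, then failure rate."""
    if avg_satisfaction is not None and avg_satisfaction < 0.7:
        return "LOW SATISFACTION"
    if total > 0 and failed > total * 0.1:
        return "HIGH FAILURE RATE"
    return "HEALTHY"

print(health_status(0.9, 2, 100))   # HEALTHY
print(health_status(0.5, 2, 100))   # LOW SATISFACTION
print(health_status(0.9, 15, 100))  # HIGH FAILURE RATE
```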
Advanced: Multi-Turn Conversations
Right now, each question is independent. We can add conversation context:
-- Table to track conversation sessions
CREATE OR REPLACE TABLE conversation_sessions (
session_id STRING PRIMARY KEY,
user_name STRING,
started_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP(),
last_interaction TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP(),
conversation_history ARRAY
);
-- Function with conversation memory
-- Note: a SQL UDF cannot write rows, so appending each new question to
-- conversation_history must be done by a separate procedure
CREATE OR REPLACE FUNCTION ask_with_context(
p_session_id STRING,
p_user_name STRING,
p_current_question STRING
)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
WITH session_data AS (
SELECT
COALESCE(conversation_history, ARRAY_CONSTRUCT()) as history
FROM conversation_sessions
-- Parameters are prefixed with p_ so they are not shadowed by column
-- names: "WHERE session_id = session_id" would compare the column to
-- itself and match every row
WHERE session_id = p_session_id
LIMIT 1
),
enhanced_question AS (
SELECT
CASE
WHEN ARRAY_SIZE(history) > 0 THEN
p_current_question || ' (Context from previous questions: ' ||
ARRAY_TO_STRING(history, ', ') || ')'
ELSE p_current_question
END as full_question
FROM session_data
)
SELECT ask_meeting_assistant(full_question)
FROM enhanced_question
$$;
This allows follow-up questions like:
- User: "What did we decide about the database?"
- System: "We chose PostgreSQL..."
- User: "Why that over MySQL?" → understands "that" refers to PostgreSQL
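What the function does to the prompt is easier to see outside SQL. A Python sketch of the same enrichment step (max_turns is an assumed cap, added here to keep the prompt from growing without bound):

```python
def enrich_question(question, history, max_turns=3):
    """Mirror the UDF: append recent prior questions as context so the
    LLM can resolve pronouns like "that". Only the last few turns are
    kept to limit prompt size."""
    if not history:
        return question
    recent = history[-max_turns:]
    return f"{question} (Context from previous questions: {', '.join(recent)})"

history = ["What did we decide about the database?"]
print(enrich_question("Why that over MySQL?", history))
```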
Example Use Cases in Action
Let's see how different teams would use this:
Engineering Team:
-- Find all technical decisions
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant(
'What technical decisions were made in the last month?'
) as result
);
-- Get status of specific feature
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant(
'What is the current status of the Redis implementation?'
) as result
);
Product Team:
-- Customer feedback summary
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant(
'What customer feedback have we received about the dashboard?'
) as result
);
-- Feature prioritization
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant(
'What features are prioritized for Q2?'
) as result
);
Management:
-- Project blockers
SELECT
result:answer::STRING as answer
FROM (
SELECT ask_meeting_assistant(
'What are the current blockers for the product launch?'
) as result
);
-- Team action items
SELECT * FROM TABLE(get_action_items_for('engineering team'));
Extending to Other Data Sources
The beautiful thing about this architecture is that it's extensible. We can add other knowledge sources:
-- Add Jira tickets
CREATE TABLE jira_issues (
issue_key STRING,
summary STRING,
description STRING,
status STRING,
created_date TIMESTAMP_LTZ
);
-- Add Slack discussions
CREATE TABLE slack_threads (
thread_id STRING,
channel_name STRING,
message_text STRING,
posted_at TIMESTAMP_LTZ
);
-- Add Confluence docs
CREATE TABLE wiki_pages (
page_id STRING,
title STRING,
content STRING,
last_updated TIMESTAMP_LTZ
);
-- Unified search across all sources
CREATE OR REPLACE FUNCTION ask_everything(question STRING)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
-- Placeholder: a full implementation would search each source,
-- normalize and merge the ranked results, then generate one answer.
-- A comment-only body would not compile, so return a stub for now.
SELECT TO_VARIANT('Not implemented yet: ' || question)
$$;
Now your RAG system becomes a true organizational knowledge hub.
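The "combine results intelligently" step usually means merging per-source ranked lists into one list before building the prompt. A naive score-based merge looks like this (a Python sketch with made-up scores; a real system would normalize scores per source first, since different search services score on different scales):

```python
def merge_results(*ranked_lists, limit=5):
    """Merge per-source result lists of (score, source, text) tuples by
    score, highest first. Assumes scores are comparable across sources."""
    merged = sorted(
        (item for results in ranked_lists for item in results),
        key=lambda item: item[0],
        reverse=True,
    )
    return merged[:limit]

meetings = [(0.92, "meeting", "Chose PostgreSQL"), (0.40, "meeting", "Standup notes")]
jira = [(0.85, "jira", "DB-101: migrate to PostgreSQL")]
top = merge_results(meetings, jira, limit=2)
print([(s, t) for _, s, t in top])
```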
What We've Built
Let's recap what this system can do:
✅ Natural language search across all meeting notes
✅ Intelligent answers with source citations
✅ Action item tracking by person and date
✅ Decision history with full context
✅ Topic extraction and categorization
✅ Multi-turn conversations with context memory
✅ Automated ingestion of new meetings
✅ Quality monitoring and testing
✅ Cost optimization through caching
✅ Integration ready for Slack, Teams, etc.
And all of this runs entirely within Snowflake: no external services, no data movement, no infrastructure headaches.
The Real Value Proposition
Here's what changes when we have a system like this:
Before:
- Someone asks "What did we decide about X?"
- You spend 20 minutes searching through meeting notes
- You find partial information across 3 different meetings
- You piece together an answer, but you're not 100% sure
- Total time wasted: 20+ minutes
After:
- Someone asks "What did we decide about X?"
- You type the question into the assistant
- Get a complete answer with sources in 2 seconds
- Click through to verify if needed
- Total time: 30 seconds
That's a 40x improvement. Multiply that across your entire team, every day, and the ROI becomes obvious.
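The arithmetic behind that claim, plus a rough team-level extrapolation (the team size and lookup frequency below are made-up illustrative numbers, not figures from any measurement):

```python
# 20 minutes of manual searching vs ~30 seconds with the assistant
manual_seconds = 20 * 60
assistant_seconds = 30
speedup = manual_seconds / assistant_seconds

# Illustrative team-level view: 10 people, 5 lookups per person per week
weekly_hours_saved = 10 * 5 * (manual_seconds - assistant_seconds) / 3600

print(speedup)             # 40.0
print(weekly_hours_saved)  # 16.25
```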
Future Enhancements We Can Build
This is just the foundation. Here are ideas for taking it further:
- Automatic summaries: Email digest every Monday with key decisions from last week
- Proactive alerts: "You have 3 action items due this week"
- Meeting preparation: "Here's what was discussed last time you met with this team"
- Trend analysis: "Database migration has been mentioned 15 times this month"
- Sentiment tracking: Detect when team morale shifts in meetings
- Smart reminders: "You committed to X in the meeting but haven't updated status"
The possibilities are endless once we have meeting data structured and searchable.
Final Thoughts
The technology for this has existed for years, but what's changed is how accessible it's become. Building a RAG system used to require:
- Deep ML expertise
- Complex infrastructure
- Weeks of development time
- Ongoing maintenance burden
Now, with Snowflake Cortex, we can build it in a weekend using SQL. That's the real revolution: not the technology itself, but the democratization of it.
Every organization has the same problem: valuable knowledge trapped in meeting notes that nobody ever looks at again. We've just seen how to solve that problem in a practical, maintainable way.
The question isn't "Can we build this?" anymore. It's "When do we start?"
Complete Setup Script: GitHub Repo
Here's everything in one place to get started:
-- Complete setup script for Meeting Notes RAG
-- Run this entire script to set up the system
-- 1. Database setup
CREATE DATABASE IF NOT EXISTS meeting_intelligence;
USE DATABASE meeting_intelligence;
CREATE SCHEMA IF NOT EXISTS meetings;
USE SCHEMA meetings;
-- 2. Compute
CREATE WAREHOUSE IF NOT EXISTS meeting_rag_wh
WITH WAREHOUSE_SIZE = 'SMALL'
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE;
USE WAREHOUSE meeting_rag_wh;
-- 3. Main tables
CREATE OR REPLACE TABLE meeting_transcripts (
meeting_id STRING PRIMARY KEY,
meeting_title STRING NOT NULL,
meeting_date TIMESTAMP_LTZ NOT NULL,
meeting_type STRING,
attendees ARRAY,
transcript_text STRING NOT NULL,
action_items ARRAY,
decisions_made ARRAY,
topics_discussed ARRAY
);
CREATE OR REPLACE TABLE meeting_chunks (
chunk_id STRING PRIMARY KEY,
meeting_id STRING,
chunk_text STRING,
chunk_metadata VARIANT,
chunk_embedding VECTOR(FLOAT, 1024)
);
-- 4. Load sample data (use the INSERT statements from earlier)
-- 5. Create embeddings and search service
UPDATE meeting_chunks
SET chunk_embedding = SNOWFLAKE.CORTEX.EMBED_TEXT_1024(
'snowflake-arctic-embed-l',
chunk_text
);
CREATE OR REPLACE CORTEX SEARCH SERVICE meeting_search_service
ON chunk_text
WAREHOUSE = meeting_rag_wh
TARGET_LAG = '1 minute'
AS (
SELECT chunk_id, chunk_text, chunk_metadata, chunk_embedding
FROM meeting_chunks
);
-- 6. Create query function (use function from earlier)
-- 7. Test it!
SELECT ask_meeting_assistant('What are the current project blockers?');
Want to Learn More?
This is just scratching the surface of what's possible with RAG systems in Snowflake. The same principles apply to:
- Customer support ticket analysis
- Documentation search
- Code repository search
- Email analysis
- Contract review
- Research paper analysis
The foundation weâve built here can be adapted to any text-based knowledge base.