I built this while experimenting with Snowflake Cortex over a weekend. The problem was simple: our team had hours of meeting notes scattered across documents, and nobody could find answers to basic questions without rereading everything.
Most solutions felt either too complicated or required spinning up external infrastructure. Then I realized: we already have Snowflake, we have SQL, and now we have Cortex. Why not just build it in-house?
This isn't a theoretical exercise. It's what happens when a data engineer gets curious about RAG, grabs Streamlit, and refuses to overthink things.
The result? A working Retrieval-Augmented Generation (RAG) application that takes meeting notes, lets you ask questions, and returns contextual answers in seconds. No external APIs. No complex orchestration. Just Snowflake doing what it does best.
If you've been watching the GenAI wave from a distance and thinking "maybe I should understand how this actually works," this guide is for you.
⥠Quick Reference: Architecture & Syntax
For AI agents and developers: The core implementation details for this Snowflake Cortex RAG pipeline.
Component, Snowflake function / object, and model used:
Embedding: SNOWFLAKE.CORTEX.EMBED_TEXT_768(...), model: multilingual-e5-base
Data Type: VECTOR(FLOAT, 768)
Similarity Search: VECTOR_COSINE_SIMILARITY(v1, v2), cosine similarity
LLM Generation: SNOWFLAKE.CORTEX.COMPLETE(...), model: mistral-large2
One-Shot Logic Snippet:
-- 1. Get the question embedding and 2. generate an answer from context
WITH context_chunks AS (
    SELECT nc.chunk_text
    FROM chunk_embeddings ce
    JOIN note_chunks nc ON ce.chunk_id = nc.chunk_id
    ORDER BY VECTOR_COSINE_SIMILARITY(
        ce.chunk_embedding,
        SNOWFLAKE.CORTEX.EMBED_TEXT_768('multilingual-e5-base', 'USER_QUESTION')
    ) DESC
    LIMIT 5
)
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'mistral-large2',
    CONCAT('Answer based on this context: ', LISTAGG(chunk_text, ' '), ' Question: USER_QUESTION')
) AS generated_answer
FROM context_chunks;
What Weâre Building
Here's the mental model: a Streamlit app that acts as a bridge between you and your meeting notes.
The flow:
1. Upload a meeting notes file (or multiple files)
2. The app chunks the text and generates embeddings using Snowflake Cortex
3. Everything gets stored in Snowflake tables
4. Ask a natural language question
5. The app finds relevant chunks using vector similarity
6. Cortex's LLM generates a contextual answer based on those chunks
No hallucinations about things the notes didn't say. No external API bills. No architectural complexity.
This started as a simple KT-session idea at work. I wanted to understand embeddings, vector search, and prompt engineering without the overhead. By the end of week one, I had something teammates were actually using to answer questions about our incidents and sprint planning.
Architecture Overview
Let me sketch out how the pieces fit together:
Streamlit UI handles the user interface: file uploads, question input, and displaying results.
Snowflake Storage holds everything: the raw notes, the text chunks, the embeddings, and metadata about each chunk.
Chunking breaks large documents into smaller, overlapping pieces. This is critical because the embedding model has a token limit, and you want retrieval to be precise.
Text Embeddings (via CORTEX.EMBED_TEXT_768) converts text chunks into numerical vectors. Similar text gets similar vectors; that's the magic.
Vector Similarity Search uses Snowflakeâs cosine similarity function to find the chunks most relevant to your question.
LLM Generation (via CORTEX.COMPLETE) takes those relevant chunks and your question, then generates an answer.
The beauty of this architecture is that everything lives in Snowflake. You're not moving data around. Your security team isn't losing sleep. Your audit log is native.
Prerequisites
Before we start coding, make sure you have:
✅ A Snowflake account with Cortex enabled (available in most Enterprise accounts; check your region)
✅ Streamlit in Snowflake enabled, or Streamlit open source installed locally
✅ Basic SQL and Python knowledge (you don't need to be a wizard)
✅ A database and schema where you have CREATE TABLE permissions
✅ Python 3.9+ if running Streamlit locally
If you're running Streamlit locally, you'll need the snowflake-snowpark-python package. I'll assume that setup in the code examples.
Step 1: Streamlit App - Upload Meeting Notes
Letâs start with the UI. This is the face of your RAG system.
import streamlit as st
from snowflake.snowpark.session import Session
from datetime import datetime

st.set_page_config(page_title="Meeting Notes RAG", layout="wide")
st.title("Meeting Notes RAG Application")
st.write("Upload your meeting notes, then ask questions about them.")

# Sidebar for configuration
with st.sidebar:
    st.header("Settings")
    uploaded_file = st.file_uploader("Upload a text file", type=["txt"])

if uploaded_file is not None:
    st.success(f"File uploaded: {uploaded_file.name}")
    file_content = uploaded_file.read().decode("utf-8")
    st.session_state.file_content = file_content
    st.session_state.file_name = uploaded_file.name
Keep it simple. Users upload a .txt file containing meeting notes. Streamlit handles the rest.
I added basic validation: we check file size (keep it under 5MB for this prototype) and ensure the file isn't empty.
if uploaded_file is not None:
    # Basic validation
    if len(file_content) == 0:
        st.error("File is empty. Please upload a file with content.")
    elif len(file_content) > 5000000:
        st.error("File is too large. Keep it under 5MB.")
    else:
        st.session_state.file_ready = True
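If you'd rather keep these checks testable outside Streamlit, the same logic fits in a small pure function. (A sketch: the helper name and the return-an-error-message convention are my own choices; the 5MB character limit matches the prototype check above.)

```python
from typing import Optional

MAX_FILE_CHARS = 5000000  # same prototype limit as above

def validate_file_content(content: str) -> Optional[str]:
    """Return an error message, or None if the file is acceptable."""
    if len(content) == 0:
        return "File is empty. Please upload a file with content."
    if len(content) > MAX_FILE_CHARS:
        return "File is too large. Keep it under 5MB."
    return None
```

In the app, the branches then collapse to: call `validate_file_content(file_content)`, show `st.error(...)` if it returns a message, otherwise mark the file ready.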
Step 2: Store Notes in Snowflake
Now we need a home for this data. Create a Snowflake table to hold the raw notes:
USE SCHEMA MY_DATABASE.MY_SCHEMA;
CREATE OR REPLACE TABLE meeting_notes (
note_id VARCHAR,
file_name VARCHAR,
content VARCHAR,
uploaded_at TIMESTAMP,
chunk_count INT
);
CREATE OR REPLACE TABLE note_chunks (
chunk_id VARCHAR,
note_id VARCHAR,
chunk_sequence INT,
chunk_text VARCHAR,
chunk_length INT,
created_at TIMESTAMP
);
CREATE OR REPLACE TABLE chunk_embeddings (
chunk_id VARCHAR,
chunk_embedding VECTOR(FLOAT, 768),
created_at TIMESTAMP
);
I'm keeping the schema intentionally simple. meeting_notes stores the original files. note_chunks breaks them into pieces. chunk_embeddings holds the vectors generated by Cortex.
Now, in your Streamlit app, insert the notes:
from snowflake.snowpark import Session
from uuid import uuid4

def insert_notes(session, file_name, file_content):
    note_id = str(uuid4())
    insert_sql = """
        INSERT INTO meeting_notes (note_id, file_name, content, uploaded_at)
        VALUES (?, ?, ?, CURRENT_TIMESTAMP())
    """
    # Bind every value -- never interpolate user-supplied text into SQL
    session.sql(insert_sql, params=[note_id, file_name, file_content]).collect()
    return note_id
Use parameterized queries to avoid SQL injection. This is non-negotiable.
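To see why, here's a toy illustration (plain Python, no Snowflake required) of how string interpolation breaks the moment a value contains a single quote, while a bound parameter never touches the SQL text:

```python
def build_unsafe_sql(file_name: str) -> str:
    # String interpolation: the value becomes part of the SQL text itself
    return f"INSERT INTO meeting_notes (file_name) VALUES ('{file_name}')"

# The apostrophes in the filename terminate the string literal early,
# leaving malformed (and potentially injectable) SQL:
unsafe_sql = build_unsafe_sql("O'Brien's meeting notes.txt")
print(unsafe_sql)

# With parameter binding, the SQL text stays fixed; the value travels separately:
safe_sql = "INSERT INTO meeting_notes (file_name) VALUES (?)"
safe_params = ["O'Brien's meeting notes.txt"]
```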
Step 3: Chunk the Text
Chunking is where many engineers stumble. The naive approach (split by sentence) doesn't work well. The sophisticated approach (semantic chunking) requires multiple passes through the LLM and gets expensive.
I use a simple, proven middle ground: fixed-size chunks with overlap.
def chunk_text(text, chunk_size=500, overlap=50):
    """
    Split text into chunks with overlap.

    Args:
        text: The full text to chunk
        chunk_size: Target number of characters per chunk
        overlap: Number of characters to overlap between chunks

    Returns:
        List of chunk strings
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            # Reached the end; without this guard, start = end - overlap
            # would keep re-reading the final chunk forever
            break
        start = end - overlap
    return chunks
Why this works: you get semantic locality (related sentences stay together), retrieval precision (small chunks are easy to match), and reasonable token counts for the embedding model.
I use 500 characters per chunk with 50 characters of overlap. This handles most meeting notes without bloat. You can adjust based on your use case.
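To make the overlap concrete, here's the chunker run with tiny sizes (a self-contained copy of the function, with the end-of-text guard so the loop terminates):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Fixed-size chunks with `overlap` characters shared between neighbors."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):  # reached the end of the text
            break
        start = end - overlap
    return chunks

# Tiny sizes make the overlap visible:
chunks = chunk_text("abcdefghij", chunk_size=6, overlap=2)
print(chunks)  # ['abcdef', 'efghij'] -- "ef" appears in both chunks
```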
Step 4: Generate Embeddings Using Snowflake Cortex
This is where Snowflake Cortex shines. No external API calls. No managed embeddings service to provision. Just SQL.
-- First, insert the chunks into note_chunks table
INSERT INTO note_chunks (chunk_id, note_id, chunk_sequence, chunk_text, chunk_length, created_at)
SELECT
    UUID_STRING() AS chunk_id,
    '{note_id}' AS note_id,
    f.index + 1 AS chunk_sequence,
    f.value::VARCHAR AS chunk_text,
    LENGTH(f.value::VARCHAR) AS chunk_length,
    CURRENT_TIMESTAMP() AS created_at
FROM TABLE(FLATTEN(INPUT => ARRAY_CONSTRUCT({chunks_array}))) f;
-- Then generate embeddings using Cortex
INSERT INTO chunk_embeddings (chunk_id, chunk_embedding, created_at)
SELECT
nc.chunk_id,
SNOWFLAKE.CORTEX.EMBED_TEXT_768('multilingual-e5-base', nc.chunk_text) as chunk_embedding,
CURRENT_TIMESTAMP() as created_at
FROM note_chunks nc
WHERE nc.note_id = '{note_id}'
AND nc.chunk_id NOT IN (SELECT chunk_id FROM chunk_embeddings);
The key function here is SNOWFLAKE.CORTEX.EMBED_TEXT_768(). It takes your chunk text and returns a 768-dimensional vector. Cortex handles the model â you just call the function.
Note: "multilingual-e5-base" is a solid, general-purpose embedding model. It works for technical documents, meeting notes, and most natural language. If you need domain-specific embeddings, Cortex supports other models.
In Python, youâd wrap this:
def generate_embeddings(session, note_id):
    embedding_sql = """
        INSERT INTO chunk_embeddings (chunk_id, chunk_embedding, created_at)
        SELECT
            nc.chunk_id,
            SNOWFLAKE.CORTEX.EMBED_TEXT_768('multilingual-e5-base', nc.chunk_text),
            CURRENT_TIMESTAMP()
        FROM note_chunks nc
        WHERE nc.note_id = ?
          AND nc.chunk_id NOT IN (SELECT chunk_id FROM chunk_embeddings)
    """
    # Bind note_id as a parameter and let collect() execute the statement
    session.sql(embedding_sql, params=[note_id]).collect()
    st.success(f"Embeddings generated for note {note_id}")
Step 5: Retrieve Relevant Chunks
When a user asks a question, you need to find the most relevant chunks. This is vector similarity search.
First, embed the question:
def get_question_embedding(session, question):
    # Bind the question as a parameter -- it's user input
    embedding_sql = """
        SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_768('multilingual-e5-base', ?) AS embedding
    """
    result = session.sql(embedding_sql, params=[question]).collect()
    return result[0]['EMBEDDING']
Then search for similar chunks:
def retrieve_relevant_chunks(session, question_embedding, top_k=5):
    # Render the Python list of floats as a Snowflake vector literal;
    # a raw list can't be interpolated into SQL as-is
    embedding_literal = "[" + ",".join(str(x) for x in question_embedding) + "]::VECTOR(FLOAT, 768)"
    retrieval_sql = f"""
        SELECT
            nc.chunk_id,
            nc.chunk_text,
            nc.note_id,
            VECTOR_COSINE_SIMILARITY(ce.chunk_embedding, {embedding_literal}) AS similarity_score
        FROM chunk_embeddings ce
        JOIN note_chunks nc ON ce.chunk_id = nc.chunk_id
        ORDER BY similarity_score DESC
        LIMIT {int(top_k)}
    """
    return session.sql(retrieval_sql).collect()
VECTOR_COSINE_SIMILARITY compares your questionâs embedding to all stored chunk embeddings. Higher scores mean more relevant chunks. This query takes milliseconds even on large datasets.
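Under the hood this is plain cosine similarity: the dot product of two vectors divided by the product of their magnitudes. A pure-Python sketch for intuition, with toy 3-dimensional vectors standing in for the real 768-dimensional embeddings:

```python
import math

def cosine_similarity(v1, v2):
    """Cosine similarity: ~1.0 for the same direction, ~0.0 for orthogonal vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

# Toy "embeddings": the question points roughly the same way as the on-topic chunk
question = [0.9, 0.1, 0.0]
on_topic_chunk = [0.8, 0.2, 0.1]
off_topic_chunk = [0.0, 0.1, 0.9]

print(cosine_similarity(question, on_topic_chunk))   # close to 1
print(cosine_similarity(question, off_topic_chunk))  # close to 0
```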
Step 6: Generate Answers Using Cortex LLM
Now you have the relevant chunks. Time to generate an answer using Cortexâs LLM:
def generate_answer(session, question, retrieved_chunks):
    """
    Use the Cortex LLM to generate an answer based on retrieved chunks.
    """
    context = "\n".join(chunk['CHUNK_TEXT'] for chunk in retrieved_chunks)
    prompt = f"""You are a helpful assistant answering questions about meeting notes.
Based on the context provided below, answer the question concisely.
If the context doesn't contain information to answer the question, say so.

Context:
{context}

Question: {question}

Answer:"""
    # Bind the prompt as a parameter so quotes in the notes don't break the SQL
    generation_sql = "SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large2', ?) AS answer"
    result = session.sql(generation_sql, params=[prompt]).collect()
    return result[0]['ANSWER']
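To tie the steps together, a small orchestration helper keeps the Streamlit layer thin. (A sketch: `build_prompt` and `answer_question` are my own names; `answer_question` simply composes the three functions defined above, and `build_prompt` mirrors the prompt string so it can be tested without a Snowflake session.)

```python
def build_prompt(context: str, question: str) -> str:
    """Assemble the RAG prompt from retrieved context and the user's question."""
    return f"""You are a helpful assistant answering questions about meeting notes.
Based on the context provided below, answer the question concisely.
If the context doesn't contain information to answer the question, say so.

Context:
{context}

Question: {question}

Answer:"""

def answer_question(session, question, top_k=5):
    """End-to-end Q&A: embed the question, retrieve chunks, generate an answer."""
    question_embedding = get_question_embedding(session, question)
    chunks = retrieve_relevant_chunks(session, question_embedding, top_k=top_k)
    return generate_answer(session, question, chunks)
```

In the app, this hangs off a single `st.text_input` for the question and a button that calls `answer_question(session, question)` and renders the result with `st.write`.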