I built this while experimenting with Snowflake Cortex over a weekend. The problem was simple: our team had hours of meeting notes scattered across documents, and nobody could find answers to basic questions without rereading everything.
Most solutions felt either too complicated or required spinning up external infrastructure. Then I realized — we already have Snowflake, we have SQL, and now we have Cortex. Why not just build it in-house?


This isn’t a theoretical exercise. It’s what happens when a data engineer gets curious about RAG, grabs Streamlit, and refuses to overthink things.


The result? A working Retrieval-Augmented Generation (RAG) application that takes meeting notes, lets you ask questions, and returns contextual answers in seconds. No external APIs. No complex orchestration. Just Snowflake doing what it does best.
If you’ve been watching the GenAI wave from a distance and thinking “maybe I should understand how this actually works,” this guide is for you.


⚡ Quick Reference: Architecture & Syntax

For AI agents and developers: The core implementation details for this Snowflake Cortex RAG pipeline.

| Component | Snowflake Function / Object | Model Used |
| --- | --- | --- |
| Embedding | SNOWFLAKE.CORTEX.EMBED_TEXT_768(...) | multilingual-e5-base |
| Data Type | VECTOR(FLOAT, 768) | n/a |
| Similarity Search | VECTOR_COSINE_SIMILARITY(v1, v2) | n/a (cosine similarity) |
| LLM Generation | SNOWFLAKE.CORTEX.COMPLETE(...) | mistral-large2 |

One-Shot Logic Snippet:

-- 1. Embed the question & retrieve top chunks; 2. generate answer from context
WITH context_chunks AS (
    SELECT nc.chunk_text
    FROM chunk_embeddings ce
    JOIN note_chunks nc ON ce.chunk_id = nc.chunk_id
    ORDER BY VECTOR_COSINE_SIMILARITY(ce.chunk_embedding,
        SNOWFLAKE.CORTEX.EMBED_TEXT_768('multilingual-e5-base', 'USER_QUESTION')) DESC
    LIMIT 5
)
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'mistral-large2',
    CONCAT('Answer based on this context: ', LISTAGG(chunk_text, ' '), ' Question: USER_QUESTION')
) AS generated_answer
FROM context_chunks;


What We’re Building
Here’s the mental model: a Streamlit app that acts as a bridge between you and your meeting notes.


The flow:
1. Upload a meeting notes file (or multiple files)
2. The app chunks the text and generates embeddings using Snowflake Cortex
3. Everything gets stored in Snowflake tables
4. Ask a natural language question
5. The app finds relevant chunks using vector similarity
6. Cortex’s LLM generates a contextual answer based on those chunks
Answers stay grounded in what the notes actually say. No external API bills. No architectural complexity.
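Before wiring up Snowflake, the whole flow fits in a few lines of plain Python. Below is a toy, in-memory sketch of the six steps, with a bag-of-words counter standing in for the Cortex embedding model; every name in it (toy_embed, cosine, retrieve) is illustrative, not part of the real app.

```python
from collections import Counter
import math

def toy_embed(text):
    # Bag-of-words "embedding": word counts stand in for a real model
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-3: "upload" notes, chunk, embed, store
notes = [
    "sprint planning moved to thursday",
    "incident 4012 was caused by a bad deploy",
]
store = [(chunk, toy_embed(chunk)) for chunk in notes]

# Steps 4-5: embed the question, rank chunks by similarity
def retrieve(question, top_k=1):
    q = toy_embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

best = retrieve("what caused the incident")[0]
# Step 6 would hand `best` plus the question to SNOWFLAKE.CORTEX.COMPLETE
```

Swap in EMBED_TEXT_768 for toy_embed and VECTOR_COSINE_SIMILARITY for cosine, and this is the entire architecture.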


This started as a simple KT-session idea at work. I wanted to understand embeddings, vector search, and prompt engineering without the overhead. By the end of week one, I had something teammates were actually using to answer questions about our incidents and sprint planning.


Architecture Overview
Let me sketch out how the pieces fit together:


Streamlit UI handles the user interface — file uploads, question input, and displaying results.
Snowflake Storage holds everything: the raw notes, the text chunks, the embeddings, and metadata about each chunk.
Chunking breaks large documents into smaller, overlapping pieces. This is critical because the embedding model has a token limit, and you want retrieval to be precise.
Text Embeddings (via CORTEX.EMBED_TEXT_768) convert text chunks into numerical vectors. Similar text gets similar vectors — that’s the magic.


Vector Similarity Search uses Snowflake’s cosine similarity function to find the chunks most relevant to your question.


LLM Generation (via CORTEX.COMPLETE) takes those relevant chunks and your question, then generates an answer.
The beauty of this architecture is that everything lives in Snowflake. You’re not moving data around. Your security team isn’t losing sleep. Your audit log is native.
Prerequisites


Before we start coding, make sure you have:
∙ A Snowflake account with Cortex enabled (available in most Enterprise accounts; check your region)
∙ Streamlit in Snowflake enabled or Streamlit open source installed locally
∙ Basic SQL and Python knowledge (you don’t need to be a wizard)
∙ A database and schema where you have CREATE TABLE permissions
∙ Python 3.9+ if running Streamlit locally
If you’re running Streamlit locally, you’ll need the snowflake-snowpark-python package. I’ll assume that setup in the code examples.
Step 1: Streamlit App – Upload Meeting Notes
Let’s start with the UI. This is the face of your RAG system.

import streamlit as st
from snowflake.snowpark.session import Session
from datetime import datetime
st.set_page_config(page_title="Meeting Notes RAG", layout="wide")
st.title("Meeting Notes RAG Application")
st.write("Upload your meeting notes, then ask questions about them.")
# Sidebar for configuration
with st.sidebar:
    st.header("Settings")
    uploaded_file = st.file_uploader("Upload a text file", type=["txt"])
    
    if uploaded_file is not None:
        st.success(f"File uploaded: {uploaded_file.name}")
        file_content = uploaded_file.read().decode("utf-8")
        st.session_state.file_content = file_content
        st.session_state.file_name = uploaded_file.name

Keep it simple. Users upload a .txt file containing meeting notes. Streamlit handles the rest.
I added basic validation — we check file size (keep it under 5MB for this prototype) and ensure the file isn’t empty.

if uploaded_file is not None:
    # Basic validation
    if len(file_content) == 0:
        st.error("File is empty. Please upload a file with content.")
    elif len(file_content) > 5000000:
        st.error("File is too large. Keep it under 5MB.")
    else:
        st.session_state.file_ready = True

Step 2: Store Notes in Snowflake
Now we need a home for this data. Create a Snowflake table to hold the raw notes:

USE SCHEMA MY_DATABASE.MY_SCHEMA;
CREATE OR REPLACE TABLE meeting_notes (
    note_id VARCHAR,
    file_name VARCHAR,
    content VARCHAR,
    uploaded_at TIMESTAMP,
    chunk_count INT
);
CREATE OR REPLACE TABLE note_chunks (
    chunk_id VARCHAR,
    note_id VARCHAR,
    chunk_sequence INT,
    chunk_text VARCHAR,
    chunk_length INT,
    created_at TIMESTAMP
);
CREATE OR REPLACE TABLE chunk_embeddings (
    chunk_id VARCHAR,
    chunk_embedding VECTOR(FLOAT, 768),
    created_at TIMESTAMP
);

I’m keeping the schema intentionally simple. meeting_notes stores the original files. note_chunks breaks them into pieces. chunk_embeddings holds the vectors generated by Cortex.


Now, in your Streamlit app, insert the notes:

from snowflake.snowpark import Session
from uuid import uuid4
from datetime import datetime
def insert_notes(session, file_name, file_content):
    note_id = str(uuid4())
    
    insert_sql = """
    INSERT INTO meeting_notes (note_id, file_name, content, uploaded_at)
    VALUES (?, ?, ?, ?)
    """
    
    # Bind every value; never interpolate user input into SQL strings
    session.sql(
        insert_sql,
        params=[note_id, file_name, file_content, datetime.utcnow()],
    ).collect()
    return note_id

Use parameterized queries to avoid SQL injection. This is non-negotiable.
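To see why, here is what a single quote in user input does to a naively interpolated statement. The filename below is made up purely for illustration:

```python
# A hostile (or merely unlucky) filename containing a single quote
file_name = "q3-notes'); DROP TABLE meeting_notes; --"

# Naive f-string interpolation: the quote inside file_name closes the
# string literal early and smuggles in a second statement
bad_sql = f"INSERT INTO meeting_notes (file_name) VALUES ('{file_name}')"
assert "DROP TABLE" in bad_sql
```

With qmark binding (VALUES (?)), the whole string travels as data rather than SQL, so this class of bug disappears.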


Step 3: Chunk the Text
Chunking is where many engineers stumble. The naive approach (split by sentence) doesn’t work well. The sophisticated approach (semantic chunking) requires multiple passes through the LLM and gets expensive.
I use a simple, proven middle ground: fixed-size chunks with overlap.

def chunk_text(text, chunk_size=500, overlap=50):
    """
    Split text into fixed-size chunks with overlap.
    
    Args:
        text: The full text to chunk
        chunk_size: Target number of characters per chunk
        overlap: Number of characters to overlap between chunks
    
    Returns:
        List of chunk strings, in document order
    """
    chunks = []
    start = 0
    
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break  # reached the end; stepping back would loop forever
        start = end - overlap
    
    return chunks

Why this works: you get semantic locality (related sentences stay together), retrieval precision (small chunks are easy to match), and reasonable token counts for the embedding model.
I use 500 characters per chunk with 50 characters of overlap. This handles most meeting notes without bloat. You can adjust based on your use case.
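A quick sanity check of the scheme, with the chunker redefined inline (and sizes shrunk for readability) so the snippet stands alone; the explicit break guards against re-chunking the tail once the end of the text is reached.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    # Fixed-size chunks with overlap, same scheme as in the article
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break  # final chunk reached; don't step back and loop
        start = end - overlap
    return chunks

sample = "abcdefghij" * 12  # 120 characters
chunks = chunk_text(sample, chunk_size=50, overlap=10)
# Produces slices [0:50], [40:90], [80:120]; each consecutive
# pair of chunks shares exactly 10 characters
```

Tuning is just a matter of changing the two parameters and re-embedding.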
Step 4: Generate Embeddings Using Snowflake Cortex
This is where Snowflake Cortex shines. No external API calls. No managed embeddings service to provision. Just SQL.

-- First, insert the chunks into note_chunks table
-- ({note_id} and {chunks_array} are filled in from Python before execution)
INSERT INTO note_chunks (chunk_id, note_id, chunk_sequence, chunk_text, chunk_length, created_at)
SELECT 
    UUID_STRING() as chunk_id,
    '{note_id}' as note_id,
    f.index + 1 as chunk_sequence,
    f.value::VARCHAR as chunk_text,
    LENGTH(f.value::VARCHAR) as chunk_length,
    CURRENT_TIMESTAMP() as created_at
FROM TABLE(FLATTEN(INPUT => ARRAY_CONSTRUCT({chunks_array}))) f;
-- Then generate embeddings using Cortex
INSERT INTO chunk_embeddings (chunk_id, chunk_embedding, created_at)
SELECT 
    nc.chunk_id,
    SNOWFLAKE.CORTEX.EMBED_TEXT_768('multilingual-e5-base', nc.chunk_text) as chunk_embedding,
    CURRENT_TIMESTAMP() as created_at
FROM note_chunks nc
WHERE nc.note_id = '{note_id}'
AND nc.chunk_id NOT IN (SELECT chunk_id FROM chunk_embeddings);

The key function here is SNOWFLAKE.CORTEX.EMBED_TEXT_768(). It takes your chunk text and returns a 768-dimensional vector. Cortex handles the model — you just call the function.
Note: ‘multilingual-e5-base’ is a solid, general-purpose embedding model. It works for technical documents, meeting notes, and most natural language. If you need domain-specific embeddings, Cortex supports other models.
In Python, you’d wrap this:

def generate_embeddings(session, note_id):
    embedding_sql = """
    INSERT INTO chunk_embeddings (chunk_id, chunk_embedding, created_at)
    SELECT 
        nc.chunk_id,
        SNOWFLAKE.CORTEX.EMBED_TEXT_768('multilingual-e5-base', nc.chunk_text),
        CURRENT_TIMESTAMP()
    FROM note_chunks nc
    WHERE nc.note_id = ?
    AND nc.chunk_id NOT IN (SELECT chunk_id FROM chunk_embeddings)
    """
    session.sql(embedding_sql, params=[note_id]).collect()
    st.success(f"Embeddings generated for note {note_id}")

Step 5: Retrieve Relevant Chunks
When a user asks a question, you need to find the most relevant chunks. This is vector similarity search.
First, embed the question:

def get_question_embedding(session, question):
    embedding_sql = """
    SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_768('multilingual-e5-base', ?) as embedding
    """
    result = session.sql(embedding_sql, params=[question]).collect()
    return result[0]['EMBEDDING']

Then search for similar chunks:

def retrieve_relevant_chunks(session, question_embedding, top_k=5):
    # question_embedding is the Python list returned above; render it as an
    # array literal and cast it back to a VECTOR so the comparison type-checks
    retrieval_sql = f"""
    SELECT 
        nc.chunk_id,
        nc.chunk_text,
        nc.note_id,
        VECTOR_COSINE_SIMILARITY(
            ce.chunk_embedding,
            {question_embedding}::VECTOR(FLOAT, 768)
        ) as similarity_score
    FROM chunk_embeddings ce
    JOIN note_chunks nc ON ce.chunk_id = nc.chunk_id
    ORDER BY similarity_score DESC
    LIMIT {int(top_k)};
    """
    
    results = session.sql(retrieval_sql).collect()
    return results

VECTOR_COSINE_SIMILARITY compares your question’s embedding to every stored chunk embedding. Higher scores mean more relevant chunks. It’s a brute-force scan, so it’s quick at meeting-notes scale, though don’t expect it to stay that fast if the corpus grows into millions of chunks.
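For intuition, here is the same computation in plain Python on toy 3-dimensional vectors (the real function runs inside Snowflake on 768-dimensional vectors):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot product over norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

question = [1.0, 0.0, 1.0]
on_topic = [0.9, 0.1, 0.8]     # nearly the same direction: score near 1
off_topic = [-1.0, 0.5, -0.7]  # opposite direction: negative score
assert cosine_similarity(question, on_topic) > cosine_similarity(question, off_topic)
```

Scores range from -1 to 1, and only the direction of the vectors matters, not their length, which is exactly what you want when comparing texts of different sizes.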
Step 6: Generate Answers Using Cortex LLM
Now you have the relevant chunks. Time to generate an answer using Cortex’s LLM:

def generate_answer(session, question, retrieved_chunks):
    """
    Use Cortex LLM to generate an answer based on retrieved chunks.
    """
    context = "\n".join([chunk['CHUNK_TEXT'] for chunk in retrieved_chunks])
    
    prompt = f"""You are a helpful assistant answering questions about meeting notes.
Based on the context provided below, answer the question concisely.
If the context doesn't contain information to answer the question, say so.
Context:
{context}
Question: {question}
Answer:"""
    
    # Bind the prompt as a parameter; quotes in the notes or question
    # would otherwise break the SQL string
    generation_sql = """
    SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large2', ?) as answer
    """
    
    result = session.sql(generation_sql, params=[prompt]).collect()
    return result[0]['ANSWER']