AdvancedLast updated: 2026-04-27 • 4 sections
Expert questions on Cortex LLM functions, ML functions, embeddings, fine-tuning, and AI apps in Snowflake.
Q: What are Cortex LLM functions and how do they differ from external LLM APIs?
Cortex functions (COMPLETE, SUMMARIZE, TRANSLATE, SENTIMENT) run Llama and Mistral models inside Snowflake. Data never leaves the account, they are called via SQL, billed per-token as credits, governed by existing RBAC and masking policies, and GPU scaling is fully managed. Key trade-off: no GPT-4 class model. For enterprise interviews, lead with governance — data residency outweighs model variety on most regulated workloads.
Q: How do you pass temperature, token limits, and system prompts to COMPLETE?
Simple form: COMPLETE(model, prompt). Advanced form: pass an options object — { messages: [{role: "system", content: "..."}, {role: "user", content: "..."}], temperature: 0.2, max_tokens: 512 }. Batch processing: SELECT COMPLETE(model, input_col) FROM table runs inference on every row. Always test on LIMIT 100 first — cost scales linearly with row count.
Q: How do you build a RAG pipeline entirely inside Snowflake?
Step 1: chunk documents and embed with EMBED_TEXT_768, store chunks in a table with a VECTOR(FLOAT, 768) column. Step 2: for each query, embed the question and retrieve top-K via VECTOR_COSINE_SIMILARITY. Step 3: pass retrieved chunks and the question to COMPLETE. For production, prefer Cortex Search Service — it manages chunking, embedding, and index maintenance automatically.
Q: What embedding models are available and how do you choose one?
e5-base-v2: general purpose, lower latency. multilingual-e5-large: 100+ language coverage. snowflake-arctic-embed: highest English retrieval quality. Choose by language requirements, retrieval quality vs latency budget, and storage (768-dim vs 1024-dim). Store vectors in a VECTOR(FLOAT, 768) or VECTOR(FLOAT, 1024) column type.
Q: How do you control Cortex LLM costs in production?
Use the smallest model that meets the quality bar. Set max_tokens in COMPLETE options. Pre-filter rows before calling Cortex so you only process records that need inference. Cache outputs in a table keyed on input hash to avoid re-processing duplicates. Set resource monitors. Test on LIMIT 100 before full-table runs. Monitor METERING_DAILY_HISTORY for weekly spend trends.
Q: When do you use Cortex ML functions vs Cortex LLM functions?
ML functions (FORECAST, ANOMALY_DETECTION, CLASSIFICATION, TOP_INSIGHTS): structured tabular data, predictive analytics, no prompts — Snowflake trains the model from your table. LLM functions: unstructured text — summarization, classification, generation, Q&A. Rule: tabular data with a clear target variable → ML functions. Free-form text to understand or generate → LLM functions.
Q: How does Cortex Fine-Tuning work and when is it justified?
Provide labeled instruction-response pairs in JSONL format in a stage. Call FINETUNE(base_model, training_data_url). Snowflake fine-tunes the model and deploys a private endpoint in your account. Justified when: base model quality is insufficient, you have 100+ high-quality examples, domain-specific style is critical. Not worth it for general summarization, translation, or standard NLP tasks.
Q: What is Cortex Analyst and how does it differ from Cortex Search?
Cortex Analyst: text-to-SQL for structured data. Backed by a semantic YAML model describing tables, measures, and dimensions. User asks in natural language, Analyst generates and executes SQL. Cortex Search: semantic document retrieval for unstructured text — indexes content, returns relevant chunks. Use Analyst for BI Q&A over structured tables. Use Search for document retrieval. Combine both for hybrid assistants.
Q: How do you evaluate LLM output quality at scale?
LLM-as-judge: run COMPLETE with a scoring prompt to rate each output row on relevance, faithfulness, completeness. Labeled eval set: compare against ground-truth answers using string or semantic similarity. For RAG: measure retrieval recall (did top-K include the answer?) and generation faithfulness (did the answer stay grounded in the context?). Store evaluation results in a table to track quality trends over time.
For governance-critical workloads — PII, regulated industries, financial data — Cortex is the strong choice because data never leaves Snowflake. Model quality is competitive for most tasks; GPT-4 class models are not available. The interview answer: use Cortex when data residency and governance are top priorities, external APIs when frontier model quality is non-negotiable, and route by data sensitivity when both are needed.
Cortex LLM functions are SQL-callable, fully managed, no deployment steps. Snowpark ML is for custom models — bring your own sklearn, XGBoost, or PyTorch model, register in the Model Registry, and serve custom inference. Use Cortex for out-of-the-box NLP and AutoML. Use Snowpark ML for custom models, specific preprocessing pipelines, or frameworks not supported natively by Cortex.
Production: use Cortex Search Service — it handles chunking, embedding, index maintenance, and retrieval automatically. For custom builds: chunk documents, call EMBED_TEXT_768, store in VECTOR column, query with VECTOR_COSINE_SIMILARITY. At millions of documents, Cortex Search is significantly more cost-effective and performant than managing the vector pipeline manually.
Access is controlled by the SNOWFLAKE.CORTEX_USER role. Existing masking and row access policies apply — the LLM sees only what the calling role is permitted to see. Network policies cover Cortex endpoints. All Cortex calls appear in QUERY_HISTORY for audit. Fine-tuning data stays within your Snowflake account and is never used to train Snowflake base models.