Snowflake Cortex AI & ML — Expert Interview Questions

AdvancedLast updated: 2026-04-27 • 4 sections

Expert questions on Cortex LLM functions, ML functions, embeddings, fine-tuning, and AI apps in Snowflake.

Key Facts for Cortex AI Interviews

  • COMPLETE, SUMMARIZE, TRANSLATE, SENTIMENT, EXTRACT_ANSWER — the five core Cortex LLM functions.
  • Core governance argument: data never leaves Snowflake, unlike calls to external OpenAI or Anthropic APIs.
  • EMBED_TEXT_768 creates vectors; VECTOR_COSINE_SIMILARITY retrieves top-K — full RAG pipeline in pure SQL.
  • Billing is per-token in Snowflake credits; monitor via METERING_DAILY_HISTORY.
  • Cortex Search Service handles chunking + embedding + index refresh automatically for production RAG.
  • ML functions (FORECAST, ANOMALY_DETECTION, CLASSIFICATION) are no-code AutoML — no model deployment needed.
  • Fine-tuning trains on your labeled JSONL data in a stage and deploys a private model endpoint in your account.

Cortex LLM Functions

Q: What are Cortex LLM functions and how do they differ from external LLM APIs?

Cortex functions (COMPLETE, SUMMARIZE, TRANSLATE, SENTIMENT) run Llama and Mistral models inside Snowflake. Data never leaves the account, they are called via SQL, billed per-token as credits, governed by existing RBAC and masking policies, and GPU scaling is fully managed. Key trade-off: no GPT-4 class model. For enterprise interviews, lead with governance — data residency outweighs model variety on most regulated workloads.

Q: How do you pass temperature, token limits, and system prompts to COMPLETE?

Simple form: COMPLETE(model, prompt). Advanced form: pass an options object — { messages: [{role: "system", content: "..."}, {role: "user", content: "..."}], temperature: 0.2, max_tokens: 512 }. Batch processing: SELECT COMPLETE(model, input_col) FROM table runs inference on every row. Always test on LIMIT 100 first — cost scales linearly with row count.

Q: How do you build a RAG pipeline entirely inside Snowflake?

Step 1: chunk documents and embed with EMBED_TEXT_768, store chunks in a table with a VECTOR(FLOAT, 768) column. Step 2: for each query, embed the question and retrieve top-K via VECTOR_COSINE_SIMILARITY. Step 3: pass retrieved chunks and the question to COMPLETE. For production, prefer Cortex Search Service — it manages chunking, embedding, and index maintenance automatically.

Q: What embedding models are available and how do you choose one?

e5-base-v2: general purpose, lower latency. multilingual-e5-large: 100+ language coverage. snowflake-arctic-embed: highest English retrieval quality. Choose by language requirements, retrieval quality vs latency budget, and storage (768-dim vs 1024-dim). Store vectors in a VECTOR(FLOAT, 768) or VECTOR(FLOAT, 1024) column type.

Q: How do you control Cortex LLM costs in production?

Use the smallest model that meets the quality bar. Set max_tokens in COMPLETE options. Pre-filter rows before calling Cortex so you only process records that need inference. Cache outputs in a table keyed on input hash to avoid re-processing duplicates. Set resource monitors. Test on LIMIT 100 before full-table runs. Monitor METERING_DAILY_HISTORY for weekly spend trends.

ML Functions, Fine-Tuning, and Cortex Analyst

Q: When do you use Cortex ML functions vs Cortex LLM functions?

ML functions (FORECAST, ANOMALY_DETECTION, CLASSIFICATION, TOP_INSIGHTS): structured tabular data, predictive analytics, no prompts — Snowflake trains the model from your table. LLM functions: unstructured text — summarization, classification, generation, Q&A. Rule: tabular data with a clear target variable → ML functions. Free-form text to understand or generate → LLM functions.

Q: How does Cortex Fine-Tuning work and when is it justified?

Provide labeled instruction-response pairs in JSONL format in a stage. Call FINETUNE(base_model, training_data_url). Snowflake fine-tunes the model and deploys a private endpoint in your account. Justified when: base model quality is insufficient, you have 100+ high-quality examples, domain-specific style is critical. Not worth it for general summarization, translation, or standard NLP tasks.

Q: What is Cortex Analyst and how does it differ from Cortex Search?

Cortex Analyst: text-to-SQL for structured data. Backed by a semantic YAML model describing tables, measures, and dimensions. User asks in natural language, Analyst generates and executes SQL. Cortex Search: semantic document retrieval for unstructured text — indexes content, returns relevant chunks. Use Analyst for BI Q&A over structured tables. Use Search for document retrieval. Combine both for hybrid assistants.

Q: How do you evaluate LLM output quality at scale?

LLM-as-judge: run COMPLETE with a scoring prompt to rate each output row on relevance, faithfulness, completeness. Labeled eval set: compare against ground-truth answers using string or semantic similarity. For RAG: measure retrieval recall (did top-K include the answer?) and generation faithfulness (did the answer stay grounded in the context?). Store evaluation results in a table to track quality trends over time.

Cortex AI Production Readiness Checklist

Frequently Asked Questions

Can Cortex replace OpenAI for enterprise AI workloads?

For governance-critical workloads — PII, regulated industries, financial data — Cortex is the strong choice because data never leaves Snowflake. Model quality is competitive for most tasks; GPT-4 class models are not available. The interview answer: use Cortex when data residency and governance are top priorities, external APIs when frontier model quality is non-negotiable, and route by data sensitivity when both are needed.

What is the difference between Cortex LLM functions and Snowpark ML?

Cortex LLM functions are SQL-callable, fully managed, no deployment steps. Snowpark ML is for custom models — bring your own sklearn, XGBoost, or PyTorch model, register in the Model Registry, and serve custom inference. Use Cortex for out-of-the-box NLP and AutoML. Use Snowpark ML for custom models, specific preprocessing pipelines, or frameworks not supported natively by Cortex.

How do you implement semantic search over millions of documents in Snowflake?

Production: use Cortex Search Service — it handles chunking, embedding, index maintenance, and retrieval automatically. For custom builds: chunk documents, call EMBED_TEXT_768, store in VECTOR column, query with VECTOR_COSINE_SIMILARITY. At millions of documents, Cortex Search is significantly more cost-effective and performant than managing the vector pipeline manually.

What security and governance controls apply to Cortex AI calls?

Access is controlled by the SNOWFLAKE.CORTEX_USER role. Existing masking and row access policies apply — the LLM sees only what the calling role is permitted to see. Network policies cover Cortex endpoints. All Cortex calls appear in QUERY_HISTORY for audit. Fine-tuning data stays within your Snowflake account and is never used to train Snowflake base models.

Related Cheat Sheets

Top 30 Snowflake Interview Questions & AnswersSnowflake Query Tuning — Expert Interview QuestionsSnowflake Stored Procedures & UDFs — Expert Interview Questions
← All Cheat Sheets