Snowflake External Functions & Integrations — Expert Interview Questions

AdvancedLast updated: 2026-04-27 • 4 sections

Expert questions on external functions, API integrations, external tables, external stages, and connecting Snowflake to external systems.

Key Facts for External Integrations Interviews

  • External functions call REST APIs during query execution — data LEAVES Snowflake with every call.
  • API integration object establishes trust between Snowflake and an API Gateway (AWS, Azure, GCP).
  • Snowflake batches rows (up to 1000 per request) and expects the API to return rows in the same batch format.
  • External tables: read-only virtual tables over files in S3/GCS/Azure; no data is copied into Snowflake.
  • External stage: pointer to a cloud storage location; used for COPY INTO, Snowpipe, and PUT/GET.
  • Storage integration object: preferred over access keys for stage authentication (IAM role trust policy).
  • Notification integration: connects Snowflake to SQS/SNS/Azure Event Grid for auto-ingest triggers.

External Functions and API Integrations

Q: What are external functions and when should you use them?

External functions let Snowflake call external HTTP endpoints during query execution via API Gateway (AWS API Gateway, Azure API Management). SQL calls the function, Snowflake batches rows and sends them to the endpoint, the API processes and returns results. Use cases: ML model inference on an external service, geocoding/IP lookup enrichment APIs, sending notifications from SQL, calling proprietary services that cannot run inside Snowflake. Key caveat: data leaves Snowflake — governance, latency, and cost must be considered for every call.

Q: Walk through setting up an external function end-to-end.

Step 1: create an API integration: CREATE API INTEGRATION ext_api API_PROVIDER=aws_api_gateway API_AWS_ROLE_ARN=arn:... API_ALLOWED_PREFIXES=(https://...). Step 2: retrieve the Snowflake IAM principal and update your API Gateway resource policy to trust it. Step 3: create the function: CREATE EXTERNAL FUNCTION classify(text VARCHAR) RETURNS VARIANT API_INTEGRATION=ext_api AS 'https://api-id.execute-api.region.amazonaws.com/stage/classify'. Step 4: call it: SELECT classify(description) FROM products. Snowflake sends batches of 1000 rows as JSON, expects JSON rows back.

Q: What are the performance and cost considerations for external functions?

Latency: 100–500ms per API call (per batch of 1000 rows). For 1M rows = 1000 batches = 100–500 seconds of network time minimum. Cost: Snowflake compute runs while waiting for API responses; external API compute/invocation costs apply. Mitigation: batch as large as possible (up to 1000 rows), cache results in a Snowflake table keyed on input hash, consider running bulk offline via Snowpark Python + external SDK instead of per-row SQL external function calls for large volumes.

Q: What is the security model for external functions?

API integration uses IAM role trust policy (AWS) or Managed Identity (Azure) — no credentials stored in Snowflake. Snowflake generates a unique IAM principal per API integration; the API Gateway resource policy trusts this specific principal. External function access is controlled by GRANT USAGE ON INTEGRATION and GRANT USAGE ON FUNCTION to roles. Network traffic goes over HTTPS. Data-in-transit is encrypted but the API endpoint owner can inspect the data — use for non-sensitive data or ensure the API provider meets your compliance requirements.

External Tables, Stages, and Notification Integrations

Q: What are external tables and when should you use them over COPY INTO?

External tables are read-only virtual tables pointing to files in a cloud storage location. Data is NOT copied into Snowflake — it is queried in place. Use when: data must stay in the external storage (compliance, data ownership), files are large and rarely queried (cost to copy outweighs query savings), or you need Snowflake SQL access to a data lake without full ingestion. Downside: slower than native tables (no micro-partition optimization), no caching, limited DML. Use COPY INTO when you query the data frequently — native tables are significantly faster.

Q: How do storage integrations differ from using access keys for external stages?

Access keys (AWS_KEY_ID + AWS_SECRET_KEY): stored in Snowflake metadata, need rotation, risk of credential leak. Storage integration: uses an IAM role trust policy — Snowflake's IAM principal assumes a role in your AWS account. No credentials stored, no rotation needed, follows AWS least-privilege IAM best practices. Storage integrations are the recommended approach for all production external stages. Access keys are only appropriate for quick testing or environments where IAM role trust is not available.

Q: How does Snowflake auto-ingest work with S3 + SQS notifications?

Setup: (1) Create a notification integration pointing to an SQS queue. (2) Configure the S3 bucket to send ObjectCreated events to the SQS queue. (3) Create a Snowpipe with AUTO_INGEST=TRUE referencing the stage and notification integration. When a new file lands in S3, an SQS message triggers Snowpipe automatically — no polling needed. Snowpipe loads the file via COPY INTO the target table within 30–120 seconds. Monitor via SNOWFLAKE.ACCOUNT_USAGE.PIPE_USAGE_HISTORY.

Q: How do you query external tables efficiently and avoid full file scans?

Partition pruning: define a partition column expression in the external table DDL (e.g., PARTITION BY (TO_DATE(SUBSTR(metadata$filename, ...)))). Snowflake uses the partition expression to skip irrelevant partitions on WHERE clauses. Without partitioning, every query scans all files. Additional tips: use columnar formats (Parquet over CSV), use small, consistently-sized files, co-locate files geographically (same region as Snowflake account), cluster file naming by partition value for predictable pruning.

External Integrations Interview Prep Checklist

Frequently Asked Questions

When should you use an external function vs a Python UDF for calling an external service?

External function: SQL-native invocation, no code deployment in Snowflake, best for existing REST APIs behind API Gateway. Python UDF: more flexible, can use the full Python ecosystem, vectorized batching with pandas, no API Gateway required. For ML inference: if your model is on an external endpoint (SageMaker, Azure ML), use external function. If you can package the model as a Python UDF with Snowflake Anaconda packages, Python UDF is lower latency and simpler. Python UDF keeps data inside Snowflake; external function does not.

What happens to an external table query if the underlying files are modified or deleted?

If a file referenced by an external table is deleted, queries that scan that partition will fail with a file-not-found error. If a file is modified in place (overwritten), the external table may return stale or inconsistent data because Snowflake caches file metadata. Best practice: never modify files in place — use append-only storage patterns (new file per batch). Run ALTER EXTERNAL TABLE REFRESH to sync Snowflake metadata with the current storage state after adding or removing files.

How do you secure access to an external stage that stores sensitive data?

Use a storage integration with a least-privilege IAM role (read-only for ingest, no LIST on unrelated prefixes). Scope the IAM policy to the specific S3 prefix, not the full bucket. Encrypt files at rest with SSE-S3 or SSE-KMS (Snowflake handles decryption via the IAM role). In Snowflake: grant USAGE on the stage only to roles that need it — COPY INTO or Snowpipe access grants are separate from stage grants. Audit stage access via ACCESS_HISTORY and COPY_HISTORY. Rotate IAM role trust policy if the Snowflake account is migrated.

Related Cheat Sheets

Top 30 Snowflake Interview Questions & AnswersSnowpipe Streaming & Kafka — Expert Interview QuestionsSnowflake Stored Procedures & UDFs — Expert Interview Questions
← All Cheat Sheets