Advanced • Last updated: 2026-04-09 • 3 sections
Expert questions on external functions, API integrations, external tables, external stages, and connecting Snowflake to external systems.
Q: What are external functions and when do you use them?
External functions let Snowflake call external HTTP endpoints (REST APIs) during query execution. SQL calls the function, Snowflake sends rows to an API Gateway (AWS API Gateway, Azure API Management), which routes to your backend (Lambda, Azure Functions, etc.). Use cases: (1) ML model inference on an external service. (2) Geocoding, IP lookup, or enrichment APIs. (3) Sending notifications from SQL (email, Slack). (4) Calling proprietary services that cannot run inside Snowflake. Key: data LEAVES Snowflake during the call, so governance and latency are concerns.
Q: Walk through setting up an external function end to end.
Steps: (1) Create API integration: CREATE API INTEGRATION ext_api API_PROVIDER=aws_api_gateway API_AWS_ROLE_ARN=... API_ALLOWED_PREFIXES=(url). This establishes trust between Snowflake and your API Gateway. (2) Create the function: CREATE EXTERNAL FUNCTION classify(text VARCHAR) RETURNS VARIANT API_INTEGRATION=ext_api AS url. (3) Call it: SELECT classify(description) FROM products. Under the hood: Snowflake batches rows (up to 1000 per request), sends JSON payload to the endpoint, receives JSON response, maps back to rows. The API must handle the Snowflake batch format (rows array in, rows array out).
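The batch format is easiest to see from the backend side. Below is a minimal sketch of a Lambda-style handler for the documented request/response shape ({"data": [[row_number, arg1, ...], ...]} in, {"data": [[row_number, result], ...]} out); classify_text is a hypothetical stand-in for real model inference:

```python
import json

def classify_text(text):
    # Hypothetical classifier; a real backend would call a model here.
    return {"label": "long" if len(text) > 20 else "short"}

def handler(event, context=None):
    # Snowflake sends a JSON body: {"data": [[row_number, arg1, ...], ...]}
    body = json.loads(event["body"])
    out_rows = []
    for row in body["data"]:
        row_number, text = row[0], row[1]
        # Each output row must echo the row number, followed by one result value.
        out_rows.append([row_number, classify_text(text)])
    # Response mirrors the request shape: {"data": [[row_number, result], ...]}
    return {"statusCode": 200, "body": json.dumps({"data": out_rows})}
```

The row numbers let Snowflake map results back to the correct input rows, so the handler must preserve them exactly and return one output row per input row.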
Q: Performance and cost considerations for external functions?
Performance: (1) Network latency per batch (100-500ms per API call). (2) Snowflake sends rows in batches of up to 1000, processes concurrently based on warehouse size. (3) Large tables (millions of rows) = thousands of API calls. Throughput depends on your backend capacity. Cost: (1) Snowflake warehouse credits while waiting for API responses. (2) API Gateway / Lambda / function costs. (3) Data transfer (egress from Snowflake). Optimization: (1) Pre-filter data to minimize rows sent. (2) Cache results in a table for repeated lookups. (3) Use larger warehouse for more parallel batches. (4) Ensure your backend can handle the load (auto-scaling Lambda).
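To see how these factors combine, here is a back-of-envelope estimator; the batch size, per-call latency, and concurrency values are illustrative assumptions, not Snowflake-published figures:

```python
import math

def external_function_call_estimate(rows, batch_size=1000, latency_s=0.3, concurrency=8):
    """Rough estimate of API calls and wall-clock time for an external function scan.

    Assumptions: fixed per-call latency, a fixed number of concurrent batches
    (in practice driven by warehouse size), and uniformly full batches.
    """
    calls = math.ceil(rows / batch_size)
    # Concurrent batches overlap, so wall time is the number of "waves" times latency.
    wall_time_s = math.ceil(calls / concurrency) * latency_s
    return calls, wall_time_s

calls, secs = external_function_call_estimate(5_000_000)
# 5M rows -> 5000 calls; at 8 concurrent batches of 300 ms each, ~187.5 s of waiting
```

The warehouse accrues credits for that entire wait, which is why pre-filtering rows and caching results usually pays for itself quickly.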
Q: External tables vs regular tables: when to use external tables?
External tables reference data in cloud storage (S3/GCS/ADLS) without copying it into Snowflake. Data stays in its original format (Parquet, CSV, JSON). Use when: (1) Data is managed by another system (Spark, a data lake). (2) You want to query without ingestion delay/cost. (3) Data governance requires keeping storage in your own bucket. (4) Exploratory queries on large datasets before deciding to ingest. Trade-offs: external tables are slower (no clustering or micro-partition pruning; pruning is limited to partition columns derived from file paths), read-only (no DML), and carry limited metadata. For production workloads, ingest into native or Iceberg tables.
Q: How do you manage external stages and file formats?
External stage: CREATE STAGE my_stage URL='s3://bucket/path/' STORAGE_INTEGRATION=my_int FILE_FORMAT=(TYPE=PARQUET). Storage integration: an abstraction over cloud credentials (avoids embedding keys in the stage definition). File format: a reusable format spec (TYPE, COMPRESSION, FIELD_DELIMITER, etc.). Pattern: (1) Create the storage integration (once, by an admin). (2) Create the stage referencing the integration. (3) Create the file format. (4) COPY INTO table FROM @stage, or create an external table on the stage. Best practice: always use storage integrations; never embed credentials in stage definitions.
The API must be exposed through a supported API Gateway (AWS API Gateway, Azure API Management, Google Cloud API Gateway). The endpoint must handle Snowflake batch format. Any backend behind the gateway works: Lambda, containers, VMs, third-party APIs (with a proxy).
External function calls have a default timeout of 180 seconds per batch. For long-running inference, consider async patterns: the function returns immediately with a job ID, a separate task polls for results.
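One way to sketch the async pattern described above, with two hypothetical backend handlers (a submit endpoint that returns job IDs immediately, and a poll endpoint a Snowflake task can call later) and an in-memory dict standing in for a durable job store such as DynamoDB:

```python
import json
import uuid

JOBS = {}  # in-memory stand-in for a durable job store (e.g., DynamoDB)

def submit_handler(event, context=None):
    # Enqueue work and return a job ID per row instead of blocking on inference.
    body = json.loads(event["body"])
    out = []
    for row in body["data"]:
        job_id = str(uuid.uuid4())
        JOBS[job_id] = {"status": "PENDING", "input": row[1:]}
        out.append([row[0], job_id])
    return {"statusCode": 200, "body": json.dumps({"data": out})}

def poll_handler(event, context=None):
    # A second external function: takes job IDs, returns current status/result.
    body = json.loads(event["body"])
    out = []
    for row in body["data"]:
        job = JOBS.get(row[1], {"status": "UNKNOWN"})
        out.append([row[0], job])
    return {"statusCode": 200, "body": json.dumps({"data": out})}
```

In Snowflake, this maps to two external functions over the same integration: the query calls submit and stores the returned job IDs in a table, and a scheduled task calls poll until all jobs complete, keeping each individual call well under the timeout.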