IntermediateLast updated: 2026-04-09 • 4 sections
Master the most frequently asked SQL interview questions with detailed answers. Covers joins, window functions, CTEs, query optimization, and real-world data scenarios.
Q: Explain the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.
INNER JOIN returns only rows with matches in both tables. LEFT JOIN returns all rows from the left table plus matches from the right (NULLs where no match). RIGHT JOIN is the mirror of LEFT JOIN. FULL OUTER JOIN returns all rows from both tables, with NULLs where no match exists on either side. In practice, LEFT JOIN is used most often — RIGHT JOIN is rarely needed and can always be rewritten as a LEFT JOIN.
Q: What is a CROSS JOIN and when would you use it?
A CROSS JOIN produces the Cartesian product of two tables — every row paired with every row. It's rarely used in production but has valid use cases: generating date spines (CROSS JOIN a dates table with a dimensions table), creating all possible combinations for A/B test matrices, or expanding sparse data into dense grids. Always use with caution — a CROSS JOIN of two 10K-row tables produces 100M rows.
Q: How do you find rows in table A that don't exist in table B?
Three approaches: (1) LEFT JOIN + WHERE b.key IS NULL (most readable). (2) NOT EXISTS (SELECT 1 FROM b WHERE b.key = a.key) — often fastest with proper indexes. (3) NOT IN (SELECT key FROM b) — avoid this: it fails silently if b.key contains NULLs (returns no rows). Use LEFT JOIN or NOT EXISTS for correctness and performance.
Q: What is a self-join? Give a practical example.
A self-join joins a table to itself using different aliases. Example: finding employees and their managers from an employees table: SELECT e.name AS employee, m.name AS manager FROM employees e LEFT JOIN employees m ON e.manager_id = m.employee_id. Also used for: comparing consecutive events, finding duplicates, and hierarchical data traversal.
Q: What is the difference between GROUP BY and window functions?
GROUP BY collapses rows into summary rows (one row per group). Window functions compute values across a set of rows but preserve every individual row in the result. Example: GROUP BY department gives one salary-sum row per department. SUM(salary) OVER (PARTITION BY department) adds a department total column to every employee row. Use GROUP BY for aggregated reports, window functions when you need both detail and summary.
Q: Write a query to find the second-highest salary in each department.
SELECT department, employee, salary FROM (SELECT *, DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk FROM employees) ranked WHERE rnk = 2. Use DENSE_RANK (not ROW_NUMBER) to correctly handle ties — if two people share the highest salary, DENSE_RANK still assigns rank 2 to the next salary level. In Snowflake: SELECT * FROM employees QUALIFY DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) = 2.
Q: How do you calculate a running total in SQL?
SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). The ORDER BY defines the sequence, and the frame clause defines the window (all rows from start to current). You can also use the shorthand SUM(amount) OVER (ORDER BY date) since the default frame is UNBOUNDED PRECEDING to CURRENT ROW. Add PARTITION BY to reset the running total per group.
Q: Explain the difference between RANK, DENSE_RANK, and ROW_NUMBER.
All three assign sequential numbers based on ORDER BY. ROW_NUMBER: always unique, arbitrary for ties (1,2,3,4). RANK: same number for ties, skips next (1,1,3,4). DENSE_RANK: same number for ties, no gaps (1,1,2,3). Use ROW_NUMBER for deduplication (pick one row per key). Use DENSE_RANK for "top N values" queries. Use RANK for competition-style ranking (3rd place after two 1st-place ties is 3rd, not 2nd).
Q: What is a CTE and how does it differ from a subquery?
A CTE (Common Table Expression) is defined with WITH ... AS and creates a named temporary result set readable like a table. Compared to subqueries: CTEs are more readable (especially when nested), can be referenced multiple times in the same query, and support recursion. Performance is usually identical — most databases inline CTEs like subqueries. Use CTEs for readability; use subqueries for simple one-off filters.
Q: What is a recursive CTE? Give an example.
A recursive CTE references itself to traverse hierarchical data. Structure: WITH RECURSIVE cte AS (base_case UNION ALL recursive_case). Example — organization hierarchy: WITH RECURSIVE org AS (SELECT id, name, manager_id, 1 AS level FROM employees WHERE manager_id IS NULL UNION ALL SELECT e.id, e.name, e.manager_id, o.level + 1 FROM employees e JOIN org o ON e.manager_id = o.id) SELECT * FROM org. Always include a termination condition to prevent infinite loops.
Q: How do you remove duplicate rows from a table?
Read-only (SELECT distinct rows): SELECT DISTINCT * FROM table, or use ROW_NUMBER() OVER (PARTITION BY duplicate_key ORDER BY tiebreaker) and keep rn = 1. Delete duplicates in-place: In Snowflake, use CREATE OR REPLACE TABLE t AS SELECT * FROM t QUALIFY ROW_NUMBER() OVER (PARTITION BY key ORDER BY updated_at DESC) = 1. In Postgres/MySQL, use DELETE with a CTE that identifies duplicate row IDs to remove.
Focus on: (1) Window functions (ROW_NUMBER, RANK, LAG/LEAD, running totals) — asked in nearly every interview. (2) Joins including self-joins and anti-joins. (3) CTEs and subquery optimization. (4) MERGE/upsert patterns. (5) Data deduplication techniques. (6) Date/time manipulation and gap filling.
Both are good for practice. LeetCode has more realistic data engineering problems. HackerRank has a better SQL-specific track. Also practice with real Snowflake trial accounts — interviewers value experience with actual data warehouse SQL dialects over generic SQL practice.