Global Trend Radar
Dev.to US tech 2026-06-26 23:29

SQLウィンドウ関数:中堅開発者とシニア開発者の違い

原題: SQL window functions: what separates a mid-level from a senior developer

元記事を開く →

分析結果

カテゴリ
AI
重要度
65
トレンドスコア
27
要約
SQLウィンドウ関数は、データ分析や集計において強力なツールです。中堅開発者は基本的なウィンドウ関数を理解し、簡単なクエリを作成できますが、シニア開発者は複雑なデータセットを扱い、パフォーマンスを最適化するための高度なテクニックを駆使します。また、シニア開発者はビジネスのニーズを理解し、適切な分析手法を選択する能力も求められます。
キーワード
Most developers know SELECT, JOIN, and GROUP BY. Window functions are where the gap between mid and senior starts to show. What makes window functions different GROUP BY collapses rows. You lose the individual record — it becomes part of an aggregate. Window functions don't collapse anything. They compute an aggregate over a set of rows while keeping every row intact. That's the core idea. -- GROUP BY: one row per department SELECT department , AVG ( salary ) AS avg_salary FROM employees GROUP BY department ; -- Window function: every employee row, plus their department average SELECT name , department , salary , AVG ( salary ) OVER ( PARTITION BY department ) AS dept_avg FROM employees ; Same average calculation. Completely different result shape. Window functions let you ask "how does this row relate to the group?" — something GROUP BY simply can't do. The anatomy of a window function function_name () OVER ( PARTITION BY col1 , col2 -- defines the "group" (optional) ORDER BY col3 -- defines row order within the group ROWS BETWEEN ... -- defines the frame (optional) ) PARTITION BY — splits rows into independent windows. Omit it and the entire result set is one window. ORDER BY — required for ranking and frame-sensitive functions. Changes the meaning of aggregates. frame clause — defines which rows are "visible" to the function. Default varies by function. Ranking functions ROW_NUMBER() Assigns a unique sequential number to each row within the partition. No ties — if two rows are equal, the order is arbitrary but distinct. SELECT name , department , salary , ROW_NUMBER () OVER ( PARTITION BY department ORDER BY salary DESC ) AS rn FROM employees ; Classic use case: get the top N rows per group without a subquery per group. -- Top earner per department SELECT * FROM ( SELECT name , department , salary , ROW_NUMBER () OVER ( PARTITION BY department ORDER BY salary DESC ) AS rn FROM employees ) ranked WHERE rn = 1 ; RANK() vs DENSE_RANK() Both handle ties, but differently: RANK() — tied rows get the same rank, next rank skips. (1, 2, 2, 4) DENSE_RANK() — tied rows get the same rank, next rank does not skip. (1, 2, 2, 3) SELECT name , salary , RANK () OVER ( ORDER BY salary DESC ) AS rank , DENSE_RANK () OVER ( ORDER BY salary DESC ) AS dense_rank FROM employees ; Use DENSE_RANK() when you want "2nd place" to actually mean second, not "tied for second, third place vacant." Offset functions: LAG and LEAD These let you access another row's value from the current row — without a self-join. LAG ( column , offset , default ) OVER ( PARTITION BY ... ORDER BY ...) LEAD ( column , offset , default ) OVER ( PARTITION BY ... ORDER BY ...) -- Month-over-month revenue change SELECT month , revenue , LAG ( revenue , 1 , 0 ) OVER ( ORDER BY month ) AS prev_month_revenue , revenue - LAG ( revenue , 1 , 0 ) OVER ( ORDER BY month ) AS delta FROM monthly_revenue ; -- Time between consecutive orders per user SELECT user_id , order_date , LEAD ( order_date ) OVER ( PARTITION BY user_id ORDER BY order_date ) AS next_order , LEAD ( order_date ) OVER ( PARTITION BY user_id ORDER BY order_date ) - order_date AS days_until_next FROM orders ; LAG / LEAD replace a category of self-joins that are painful to read and slow to execute. If you see a query joining a table to itself on id - 1 or date arithmetic, it's almost always a LAG in disguise. Aggregate window functions Any aggregate ( SUM , AVG , COUNT , MIN , MAX ) can be used as a window function. This unlocks running totals and moving averages without cursors or subqueries. Running total SELECT order_date , amount , SUM ( amount ) OVER ( ORDER BY order_date ) AS running_total FROM orders ; Running total reset per partition SELECT user_id , order_date , amount , SUM ( amount ) OVER ( PARTITION BY user_id ORDER BY order_date ) AS user_running_total FROM orders ; Moving average (last 7 days) This is where the frame clause matters: SELECT date , daily_revenue , AVG ( daily_revenue ) OVER ( ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) AS moving_avg_7d FROM daily_revenue ; ROWS BETWEEN 6 PRECEDING AND CURRENT ROW — the window includes the current row and the 6 rows before it. Classic 7-day rolling average. ROWS vs RANGE ROWS — physical rows. ROWS BETWEEN 6 PRECEDING AND CURRENT ROW means exactly 6 rows back. RANGE — logical range based on the ORDER BY value. RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW means all rows within 6 days, regardless of row count. For time series with gaps, RANGE is usually what you actually want. -- ROWS: always exactly 6 rows back (may span more than 6 days if data is sparse) AVG ( revenue ) OVER ( ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) -- RANGE: all rows within the last 6 days (correct for sparse data) AVG ( revenue ) OVER ( ORDER BY date RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW ) NTILE() : bucketing rows Divides rows into N equal-sized buckets. Useful for percentiles, quartiles, and cohort analysis. -- Divide customers into 4 quartiles by total spend SELECT customer_id , total_spend , NTILE ( 4 ) OVER ( ORDER BY total_spend DESC ) AS quartile FROM customer_spend ; Quartile 1 = top spenders. Combine with a CTE to filter or label each group. FIRST_VALUE() and LAST_VALUE() Return the first or last value in the window frame. SELECT name , department , salary , FIRST_VALUE ( name ) OVER ( PARTITION BY department ORDER BY salary DESC ) AS highest_paid_in_dept FROM employees ; LAST_VALUE() has a trap: the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW , so "last" means the current row, not the end of the partition. Fix it explicitly: LAST_VALUE ( name ) OVER ( PARTITION BY department ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) Reading execution plans Window functions aren't free. Know what to look for in EXPLAIN ANALYZE . EXPLAIN ANALYZE SELECT user_id , order_date , SUM ( amount ) OVER ( PARTITION BY user_id ORDER BY order_date ) FROM orders ; Look for: WindowAgg — the window function node. Normal. Sort — a sort before WindowAgg . May be expensive on large tables. Seq Scan vs Index Scan — if your PARTITION BY or ORDER BY columns aren't indexed, Postgres will sort the full table. When to add an index If the same PARTITION BY / ORDER BY combination appears in multiple queries: CREATE INDEX idx_orders_user_date ON orders ( user_id , order_date ); This lets Postgres avoid a full sort and use the index order directly. When NOT to use window functions Small datasets where a simple subquery is more readable When you need to filter on the window result in the same query level — you can't WHERE rank = 1 directly; you need a CTE or subquery When the frame logic gets complex enough that a CTE makes the intent clearer Practical patterns worth bookmarking Deduplicate keeping latest record WITH ranked AS ( SELECT * , ROW_NUMBER () OVER ( PARTITION BY user_id ORDER BY created_at DESC ) AS rn FROM events ) SELECT * FROM ranked WHERE rn = 1 ; Percent of total SELECT department , SUM ( salary ) AS dept_total , SUM ( salary ) * 100 . 0 / SUM ( SUM ( salary )) OVER () AS pct_of_company FROM employees GROUP BY department ; Note: SUM(SUM(salary)) OVER () — a window function over an aggregate. This is valid SQL and very useful. Gap detection between dates SELECT user_id , order_date , LAG ( order_date ) OVER ( PARTITION BY user_id ORDER BY order_date ) AS prev_order , order_date - LAG ( order_date ) OVER ( PARTITION BY user_id ORDER BY order_date ) AS gap_days FROM orders WHERE gap_days > 30 ; -- wrap in CTE to filter Summary Function Use case ROW_NUMBER() Unique rank, top-N per group, deduplication RANK() / DENSE_RANK() Ranking with ties LAG() / LEAD() Compare current row to previous/next SUM() OVER Running totals AVG() OVER + frame Moving averages NTILE() Percentiles, quartiles, bucketing FIRST_VALUE() / LAST_VALUE() Boundary values within a partition Window functions don't replace every query pattern — but they replace the ugly ones. Self-joins, correlated subqueries, and cursor-based loops are often a window function with cleaner syntax and a better execution plan. Learn the frame clause. It's the thing most tutorials skip, and it's where the real power is. What's your most-used window function pattern in production? Drop it in the comments. Tags: sql postgres backend database Most developers know SELECT, JOIN, and GROUP BY. Window functions are where the gap between mid and senior starts to show. What makes window functions different GROUP BY collapses rows. You lose the individual record — it becomes part of an aggregate. Window functions don't collapse anything. They compute an aggregate over a set of rows while keeping every row intact. That's the core idea. -- GROUP BY: one row per department SELECT department , AVG ( salary ) AS avg_salary FROM employees GROUP BY department ; -- Window function: every employee row, plus their department average SELECT name , department , salary , AVG ( salary ) OVER ( PARTITION BY department ) AS dept_avg FROM employees ; Same average calculation. Completely different result shape. Window functions let you ask "how does this row relate to the group?" — something GROUP BY simply can't do. The anatomy of a window function function_name () OVER ( PARTITION BY col1 , col2 -- defines the "group" (optional) ORDER BY col3 -- defines row order within the group ROWS BETWEEN ... -- defines the frame (optional) ) PARTITION BY — splits rows into independent windows. Omit it and the entire result set is one window. ORDER BY — required for ranking and frame-sensitive functions. Changes the meaning of aggregates. frame clause — defines which rows are "visible" to the function. Default varies by function. Ranking functions ROW_NUMBER() Assigns a unique sequential number to each row within the partition. No ties — if two rows are equal, the order is arbitrary but distinct. SELECT name , department , salary , ROW_NUMBER () OVER ( PARTITION BY departmen