Window function invocation—SQL syntax and semantics
This page documents the preview version (v2.23). Preview includes features under active development and is for development and testing only. For production, use the stable version (v2024.1). To learn more, see Versioning.
The rules described in this section also govern the invocation of aggregate functions.
The dedicated Aggregate functions section explains that one kind of aggregate function—so-called ordinary aggregate functions, exemplified byavg() and count()—can optionally be invoked using the identical syntax that you use to invoke window functions. That dedicated section has many examples. See also the sections Using the aggregate function avg() to compute a moving average and Using the aggregate function sum() with the OVER clause in the present Window functions main section.
A note on orthography
Notice these three different orthography styles:
-
OVERis a keyword that names a clause. You write such a keyword in a SQL statement. -
window_definitionis the name of a rule within the overall SQL grammar. You never type such a name in a SQL statement. It is written in bold lower case with underscores, as appropriate, between the English words. Because such a rule is always shown as a link, you can jump directly to the rule in the Grammar Diagrams page. This page shows every single one of the SQL rules. It so happens that thewindow_definitionrule starts with the keywordWINDOWand might therefore, according to the context of use, be referred to alternatively as theWINDOWclause. -
window frame is a pure term of art. It is written in italic lower case with spaces, as appropriate, between the English words. You neither write it in a SQL statement nor use it to look up anything in the Grammar Diagrams page. Because such a term of art is always shown as a link, you can jump directly to its definition within this "Window function invocation—SQL syntax and semantics" page.
Syntax
Reproduced from the SELECT statement section
The following three diagrams, select_start, WINDOW clause, and fn_over_window rule, are reproduced from the section that describes the SELECT statement.
select_start ::= SELECT [ ALL |
DISTINCT [ ON { ( expression [ , ... ] ) } ] ]
[ * | { { expression
| fn_over_window
| ordinary_aggregate_fn_invocation
| within_group_aggregate_fn_invocation }
[ [ AS ] name ] } [ , ... ] ]
window_clause ::= WINDOW { { name AS window_definition } [ , ... ] }
fn_over_window ::= name ( [ expression [ , ... ] | * ]
[ FILTER ( WHERE boolean_expression ) ] OVER
{ window_definition | name }
Definition of the window_definition rule
As promised in the SELECT statement section, this section explains the window_definition rule and its use as the argument of either the OVER keyword or the WINDOW keyword.
A window_definition can be used only at these two syntax spots, within the enclosing syntax of a subquery.
window_definition ::= ( [ name ]
[ PARTITION BY order_expr [ , ... ] ]
[ ORDER BY order_expr [ , ... ] ]
[ frame_clause ] )
The frame_clause
frame_clause ::= [ { RANGE | ROWS | GROUPS } frame_bounds ]
[ frame_exclusion ]
frame_bounds ::= frame_start | BETWEEN frame_start AND frame_end
frame_start ::= frame_bound
frame_end ::= frame_bound
frame_bound ::= UNBOUNDED PRECEDING
| offset PRECEDING
| CURRENT ROW
| offset FOLLOWING
| UNBOUNDED FOLLOWING
frame_exclusion ::= EXCLUDE CURRENT ROW
| EXCLUDE GROUP
| EXCLUDE TIES
| EXCLUDE NO OTHERS
Semantics
The fn_over_window rule
A window function can be invoked only at the syntax spot in a subquery that the diagram for the select_start rule shows. An aggregate function may be invoked in this way as an alternative to its more familiar invocation as a regular SELECT list item in conjunction with the GROUP BY clause. (The invocation of an aggregate function in conjunction with the GROUP BY clause is governed by the ordinary_aggregate_fn_invocation rule or the within_group_aggregate_fn_invocation rule.)
The number, data types, and meanings of a window function's formal parameters are function-specific. The eleven window functions are classified into functional groups, and summarized, in the two tables at the end of the section Signature and purpose of each window function. Each entry links to the formal account of the function which also provides runnable code examples.
Notice that, among the dedicated window functions (as opposed to aggregate functions that may be invoked as window functions), only ntile() takes an argument. Every other dedicated window function is invoked with an empty parentheses pair. Some aggregate functions (like, for example, jsonb_object_agg()) take more than one argument. When an aggregate function is invoke as a window function, the keyword DISTINCT is not allowed within the parenthesized list of arguments. The attempt causes this error:
0A000: DISTINCT is not implemented for window functions
The window_definition rule
The syntax diagram for the window_definition shows that it uses three complementary specifications:
- The
PARTITION BYclause defines the maximal subsets, of what the subquery-levelWHEREclause defines, that are operated upon, in turn, by a window function (or by an aggregate function in window mode). Tautologically, this maximal subset is referred to as the window. In the limit, when thePARTITION BYclause is omitted, the maximal subset is identical with what theWHEREclause defines. - The window
ORDER BYclause defines how the rows are to be ordered within the window. - The
frame_clausedefines a further narrowing of the window, referred to as the window frame. The window frame is anchored to the current row within the window. In the degenerate case, the window frame coincides with the window and is therefore insensitive to the position of the current row.
In summary, the window_definition defines the window as the scope within which a function's meaning (window function or aggregate function in window mode) is defined. The window is then further characterized by the ordering of its rows, the extent of the window frame, and how this moves with the current row.
The FILTER clause
The FILTER clause's WHERE clause has the same syntax and semantics as it does at the regular WHERE clause syntax spot immediately after a subquery's FROM list. Notice that the FILTER clause is legal only for the invocation of an aggregate function. Here is an example:
select
class,
k,
count(*)
filter(where k%2 = 0)
over (partition by class)
as n
from t1
order by class, k;
If you want to run this, then create a data set using the ysqlsh script that table t1 presents.
Using the FILTER clause in the invocation of a window function causes this compilation error:
0A000: FILTER is not implemented for non-aggregate window functions
The PARTITION BY clause
The PARTITION BY clause groups the rows that the subquery defines into windows, which are processed separately by the window function. (This holds, too, when an aggregate function is invoked in this way.) It works similarly to a query-level GROUP BY clause, except that its expressions are always just expressions and cannot be output-column names or numbers. If the PARTITION BY clause is omitted, then all rows are treated as a single window.
The window ORDER BY clause
The window ORDER BY clause determines the order in which the rows of a window are processed by the window function. It works similarly to a query-level ORDER BY clause; but it cannot use output-column names or numbers. If the window ORDER BY clause is omitted, then rows are processed in an unspecified order so that the results of any window function invoked in this way would be unpredictable and therefore meaningless. Aggregation functions invoked in this way might be sensitive to what the window ORDER BY clause says. This will be the case when, for example, the window frame is smaller than the whole window and moves with the current row. The section Using the aggregate function avg() to compute a moving average provides an example.
The frame_clause
The frame_clause has many variants. Only one basic variant is needed in the OVER clause that you use to invoke a window function. The other variants are useful in the OVER clause that you use to invoke an aggregate function. For completeness, those variants are described on this page.
frame_clause semantics for window functions
The frame_clause specifies the set of rows constituting the so-called window frame. In general, this will be a subset of the rows in the current window. Look at the two tables at the end of the section Signature and purpose of each window function.
-
The functions in the first group, Window functions that return an "int" or "double precision" value as a "classifier" of the rank of the row within its window, are not sensitive to what the
frame_clausespecifies and always use all of the rows in the current window. Yugabyte recommends that you therefore omit theframe_clausein theOVERclause that you use to invoke these functions. -
The functions in the second group, Window functions that return columns of another row within the window, make obvious sense when the scope within which the specified row is found is the entire window. If you have one of the very rare use cases where the output that you want is produced by a different
frame_clause, then specify what you want explicitly. Otherwise, because it isn't the default, you must specify that the window frame includes the entire current window like this:range between unbounded preceding and unbounded following
Use cases where the frame_clause's many other variants are useful arise when an aggregate function is invoked using the OVER clause. One example is given in the section Using the aggregate function avg() to compute a moving average. Another example, that uses count(*), is given in the code that explains the meaning of the percent_rank() function. Otherwise, see the main Aggregate functions section.
frame_clause semantics for aggregate functions
The window frame can be specified in RANGE, ROWS or GROUPS mode; in each case, it runs from the frame_start to the frame_end. If frame_end is omitted, then the end defaults to CURRENT ROW.
A frame_start of UNBOUNDED PRECEDING means that the window frame starts with the first row of the window. Similarly, a frame_end of UNBOUNDED FOLLOWING means that the window frame ends with the last row of the window.
In RANGE or GROUPS mode, a frame_start of CURRENT ROW means that the window frame starts with the first member of the current row's peer group. A peer group is a set of rows that the window ORDER BY clause sorts with the same rank as the current row. And a frame_end of CURRENT ROW means that the window frame ends with the last row in the current row's peer group. In ROWS mode, CURRENT ROW simply means the current row.
For the offset PRECEDING and offset FOLLOWING modes of the frame_start and frame_end clauses, the offset argument must be an expression that doesn't include any variables, aggregate functions, or window functions. The meaning of the offset value depends on the RANGE | ROWS | GROUPS mode:
-
In
ROWSmode, theoffsetvalue must be aNOT NULL, non-negative integer. This brings the meaning that the window frame starts or ends the specified number of rows before or after the current row. -
In
GROUPSmode, theoffsetvalue must again be aNOT NULL, non-negative integer. Here, this brings the meaning that the window frame starts or ends the specified number of peer groups before or after the current row's peer group. Recall that there's always a logical requirement to include a windowORDER BYclause in the window definition that is used to invoke a window function. InGROUPSmode, whatever is your intended use of the window definition, you get this error if it doesn't include a windowORDER BYclause:42P20: GROUPS mode requires an ORDER BY clause -
In
RANGEmode, these options require that the windowORDER BYclause specify exactly one column. Theoffsetvalue specifies the maximum difference between the value of that column in the current row and its value in the preceding or following rows of the window frame. Theoffsetexpression must yield a value whose data type depends upon that of the ordering column. For numeric ordering columns (likeint,double precision, and so on), it is typically of the same data type as the ordering column; but for date-time ordering columns it is aninterval. For example, if the ordering column isdateortimestamp, you could specifyRANGE BETWEEN '1 day' PRECEDING AND '10 days' FOLLOWING. Here too, theoffsetvalue must beNOT NULLand non-negative. The meaning of “non-negative” depends on the data type.
In all cases, the distance to the start and end of the window frame is limited by the distance to the start and end of the window, so that for rows near the window boundaries, the window frame might contain fewer rows than elsewhere.
Notice that in both ROWS and GROUPS mode, 0 PRECEDING and 0 FOLLOWING is equivalent to CURRENT ROW. This normally holds in RANGE mode too, for an appropriate meaning of “zero” specific to the data type.
The frame_exclusion clause allows rows around the current row to be excluded from the window frame, even if they would be included according to what the frame_start and frame_end clauses say.
EXCLUDE CURRENT ROWexcludes the current row from the window frame.EXCLUDE GROUPexcludes all the rows in the current row's peer group.EXCLUDE TIESexcludes any peers of the current row, but not the current row itself.EXCLUDE NO OTHERSsimply specifies explicitly the default behavior of not excluding the current row or its peers.
Omitting the frame_clause is the same as specifying
RANGE UNBOUNDED PRECEDING
and this means the same as
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
If the window ORDER BY clause is specified, then this default meaning sets the window frame to be all rows from the window start up through the last row in the current row's peer group. And if the window ORDER BY clause is omitted this means that all rows of the window are included in the window frame, because all rows become peers of the current row.
Notes:
- The
frame_startclause cannot beUNBOUNDED FOLLOWING - The
frame_endclause cannot beUNBOUNDED PRECEDING, and cannot appear before theframe_startclause.
For example RANGE BETWEEN CURRENT ROW AND offset PRECEDING causes this error:
42P20: frame starting from current row cannot have preceding rows
However, ROWS BETWEEN 7 PRECEDING AND 8 PRECEDING is allowed, even though it would never select any rows.
If the FILTER clause is specified, then only the input rows for which it evaluates to true are fed to the window function; other rows are discarded. As noted above, only window aggregate functions invoked using the OVER clause accept a FILTER clause.
Examples
First example
This shows the use of a window function with an OVER clause that directly specifies the window_definition :
select
...
some_window_function(...) over (partition by <column list 1> order by <column list 2>) as a1,
...
from ...
Notice that the syntax spot occupied by "some_window_function" may be occupied only by a window function or an aggregate function. See the section Informal overview of function invocation using the OVER clause for runnable examples of this syntax variant.
If any other kind of function, for example "sqrt()", occupies this syntax spot, then it draws this specific compilation error:
42809: OVER specified, but sqrt is not a window function nor an aggregate function
And if any other expression is used at this syntax spot, then it causes this generic compilation error:
42601: syntax error at or near "over"
Second example
This shows the use of two window functions with OVER clauses that each reference the same window_definition that is defined separately in a WINDOW clause.
select
...
window_fn_1(...) over w as a1,
window_fn_2(...) over w as a2,
...
from ...
window w as (
partition by <column list 1> -- PARTITION BY clause
order by <column list 2> -- ORDER BY clause
range between unbounded preceding and unbounded following -- frame_clause
)
...
For a runnable example of this syntax variant, see first_value(),nth_value(), last_value().
Notice that the syntax rules allow both this:
window_fn_1(...) over w as a1
and this:
window_fn_1(...) over (w) as a1
The parentheses around the window's identifier convey no meaning, Yugabyte recommends that you don't use this form because doing so will make anybody who reads your code wonder if it does convey a meaning.
Third example
This shows how a generic window_definition that is defined in a WINDOW clause is specialized in a particular OVER clause that references it.
select
...
(window_fn(...) over w) as a1,
(aggregate_fn_1(...) over (w range between unbounded preceding and unbounded following)) as a2,
(aggregate_fn_1(...) over (w range between unbounded preceding and current row)) as a3,
...
from ...
window w as (partition by <column list 1> order by <column list 2>)
...
Fourth example
This shows how the window_definition specialization technique that the third example showed can be used in successively in the WINDOW clause.
select
...
(window_fn(...) over w1) as a1,
(aggregate_fn_1(...) over w2) as a2,
(aggregate_fn_1(...) over w3) as a3,
...
from ...
window
w1 as (partition by <column list 1> order by <column list 2>),
w2 as (w1 range between unbounded preceding and unbounded following),
w3 as (w1 range between unbounded preceding and current row)
For a runnable example of this fourth syntax variant, see Comparing the effect of percent_rank(), cume_dist(), and ntile() on the same input.