Home | Portfolio | GitHub | LinkedIn | Medium | Stack Overflow | Terms | E-mail
Window Functions and Case Expressions in SQL: Aggregating Values
When working with a table in SQL, it is often the case that one might wish to aggregate values, or calculate a running total among the values in the table.
In this article, we will investigate how this can be done using what is called a window function.
Additionally, we will see how a CASE statement can also be nested within the window function to further customise the calculations that can be performed on an array based on certain conditions.
SUM and AVERAGE Using Window Function
Consider the following hypothetical table of clothing items listed in a store on a certain date (values made up by the author).
date | item | price | size
----------------+--------------------+---------+--------
2021-01-01 | Sky Blue Jeans | 79.99 | 31
2021-01-02 | Fire Red Jeans | 89.99 | 36
2021-01-03 | Sky Blue Shirt | 59.99 | 38
2021-01-04 | Grass Green Shirt | 69.99 | 34
2021-01-05 | Peach Purple Hat | 79.99 | 40
2021-01-06 | Sun Yellow Jeans | 109.99 | 42
2021-01-07 | Olive Green Hat | 89.99 | 37
Now, consider that the owner would like to sum up and average each value on a cumulative basis, i.e. create a new array that displays the sum of the first two values, then the first three, and so on. The same applies to calculating averages.
A window function can be used to sum the values as follows (first five rows are illustrated):
>>> SELECT date,price,
>>> SUM(price) OVER (ORDER BY date)
>>> AS total_price
>>> FROM table;
date | price | total_price
---------------------+---------+------------
2021-01-01 | 79.99 | 79.99
2021-01-02 | 89.99 | 169.98
2021-01-03 | 59.99 | 229.97
2021-01-04 | 69.99 | 299.96
2020-01-05 | 79.99 | 379.95
(5 rows)
In the same way, the average cumulative price can also be calculated.
>>> SELECT date,price,
>>> AVG(price) OVER (ORDER BY date)
>>> AS mean_price
>>> FROM table;
date | price | mean_price
---------------------+---------+------------
2021-01-01 | 79.99 | 79.99
2021-01-02 | 89.99 | 84.99
2021-01-03 | 59.99 | 76.66
2021-01-04 | 69.99 | 74.99
2020-01-05 | 79.99 | 75.99
(5 rows)
Combining CASE Statement with Window Function
A CASE statement functions in a similar way to an if-then statement. A particular value is returned if a condition is met — otherwise another value is returned if the condition is not met.
Let’s consider this example. Suppose that for this particular clothes store, the merchant had to offer refunds on certain items. How could this be reflected in the running total?
Consider this expanded table.
date | item | price | size | refund
----------------+--------------------+---------+--------+---------
2021-01-01 | Sky Blue Jeans | 79.99 | 31 | no
2021-01-02 | Fire Red Jeans | 89.99 | 36 | no
2021-01-03 | Sky Blue Shirt | 59.99 | 38 | no
2021-01-04 | Grass Green Shirt | 69.99 | 34 | yes
2021-01-05 | Peach Purple Hat | 79.99 | 40 | yes
2021-01-06 | Sun Yellow Jeans | 109.99 | 42 | no
2021-01-07 | Olive Green Hat | 89.99 | 37 | no
As we can see from the above, the merchant offers a refund in this case on the 4th and 5th January. To calculate the new cumulative sum, these values need to be subtracted — rather than added to the total.
In this regard, a CASE statement is nested within the window function — with an instruction to make the price value negative if the refund variable contains a yes value.
>>> SELECT date,price,refund,SUM(CASE WHEN refund = 'yes' THEN -1*price ELSE price END) OVER (ORDER BY date) AS total_price FROM table;
date | price | total_price | refund
---------------------+---------+--------------+------------
2021-01-01 | 79.99 | 79.99 | no
2021-01-02 | 89.99 | 169.98 | no
2021-01-03 | 59.99 | 229.97 | no
2021-01-04 | 69.99 | 159.98 | yes
2020-01-05 | 79.99 | 79.99 | yes
2020-01-06 | 79.99 | 189.98 | no
2020-01-07 | 79.99 | 279.97 | no
(5 rows)
As we can see, the total price of 279.97 is calculated once the prices of 69.99 and 79.99 have been subtracted on the 4th and 5th January — which is being performed by the CASE statement since a yes entry for the refund variable results in a negative price being assigned by the CASE statement.
Conclusion
In this article, you have seen:
- The purpose of using a window function
- How window functions can be used to obtain cumulative values
- Combining of a window function with a CASE statement to make window functions more flexible
Many thanks for reading, and you can find further examples of useful SQL practices here.