Ranking functions are some of the most commonly used analytic functions, and although the “ranking” category can quickly make you think about Top-N queries, that is actually not the only type of problems you can solve with them.
In this short post, I demonstrate another common use of the ranking functions in SQL.
For an introduction to Analytic Functions in general, read my previous post: Understanding Analytic Functions.
I used the word “prioritizing” for lack of a better one, but I’m referring to cases where you need to give preference to some rows even when there are some other rows that also comply with the query’s mandatory conditions (if those exist).
Here is an example of what I mean:
I created a very simple table to store books and I inserted some random titles and some fake editions of them to build the example:
|1||A Storm Of Swords||George R. R. Martin||HC||40.00||01-mar-2016|
|2||In Cold Blood||Truman Capote||HC||45.00||15-feb-2016|
|3||In Cold Blood||Truman Capote||PB||35.00||01-apr-2016|
|4||In Cold Blood||Truman Capote||MMPB||25.00||01-jun-2016|
|5||The Grass Harp||Truman Capote||PB||32.00||01-jan-2016|
|6||The Once and Future King||T. H. White||PB||35.00||01-jul-2016|
|7||First and Last||Truman Capote||HC||45.00||01-feb-2016|
|8||First and Last||Truman Capote||PB||30.00||01-may-2015|
And here is my editions look-up table (not actually used in the example):
And now my task is to list all of the books written by Truman Capote, but each book must be listed only once, with the additional restriction that if the book has a mass-market paperback edition, that is the one that must be listed, otherwise, the edition with the oldest date_added should be listed.
How would you solve it?
Every time I see a problem where some rows are “better” than others, I immediately think of the ranking functions.
But a simple/direct top-1 query wouldn’t suffice in this case because the “preferred” edition is MMPB, which if ordered alphabetically, comes before PB but after HC, so a simple “ORDER BY edition” doesn’t work (either ascending or descending), and also because if there is no MMPB edition, then the ordering criterion is no longer related to the edition, but to another column (the date in this case).
So, what I can do in this case, is to take advantage of the flexibility that the ORDER BY clause offers, to create an “artificial” sorting criteria for a ranking function to assign a preference or priority to each row:
SELECT id, title, author, edition_id, date_added,row_number() OVER (PARTITION BY title, author ORDER BY CASE WHEN edition_id = 'MMPB' THEN 'A' ELSE 'B' END, date_added) AS priority FROM books b WHERE author = 'Truman Capote';
These are the results of this query:
|8||First and Last||Truman Capote||PB||01-MAY-2015 00:00:00||1|
|7||First and Last||Truman Capote||HC||01-FEB-2016 00:00:00||2|
|4||In Cold Blood||Truman Capote||MMPB||01-JUN-2016 00:00:00||1|
|2||In Cold Blood||Truman Capote||HC||15-FEB-2016 00:00:00||2|
|3||In Cold Blood||Truman Capote||PB||01-APR-2016 00:00:00||3|
|5||The Grass Harp||Truman Capote||PB||01-JAN-2016 00:00:00||1|
I am using a CASE expression as the first criterion in the analytic ORDER BY clause (starts at line 3).
I’m telling the database to order the results by a value that is ‘A’ for books where the edition is MMPB and ‘B’ for any other edition.
This forces the MMPB edition to always be the first priority if it exists.
If an MMPB edition doesn’t exist, then all other editions of the same book have the same value for the first sorting criterion (‘B’) and the ties are then broken by ordering by date_added (line 7) as it was required.
Now that the rows have been prioritized, I just need to filter the results in order to return only the desired row for each book:
WITH prioritized AS ( SELECT b.*, row_number() OVER (PARTITION BY title, author ORDER BY CASE WHEN edition_id = 'MMPB' THEN 'A' ELSE 'B' END, date_added) AS priority FROM books b WHERE author = 'Truman Capote' ) SELECT id, title, author, edition_id, date_added FROM prioritized WHERE priority = 1;
I needed to use a subquery because analytic functions are calculated after the WHERE clause is evaluated, so they cannot be used or referenced in the WHERE clause.
Here are the final results (in no specific order):
|8||First and Last||Truman Capote||PB||01-may-2015|
|4||In Cold Blood||Truman Capote||MMPB||01-jun-2016|
|5||The Grass Harp||Truman Capote||PB||01-jan-2016|
I used the ROW_NUMBER function in this case, but depending on the situation, some of the other ranking functions might need to be used.
I will cover some other pretty cool uses of the ranking functions in another post.
Have something to add?
Sound off in the comments section below!