2

Prioritizing Rows With The Ranking Functions

Ranking functions are some of the most commonly used analytic functions, and although the “ranking” category can quickly make you think about Top-N queries, that is actually not the only type of problems you can solve with them.

In this short post, I demonstrate another common use of the ranking functions in SQL.

For an introduction to Analytic Functions in general, read my previous post: Understanding Analytic Functions.

Prioritizing Rows

I used the word “prioritizing” for lack of a better one, but I’m referring to cases where you need to give preference to some rows even when there are some other rows that also comply with the query’s mandatory conditions (if those exist).

Here is an example of what I mean:

I created a very simple table to store books and I inserted some random titles and some fake editions of them to build the example:

ID TITLE AUTHOR EDITION_ID PRICE DATE_ADDED
1 A Storm Of Swords George R. R. Martin HC 40.00 01-mar-2016
2 In Cold Blood Truman Capote HC 45.00 15-feb-2016
3 In Cold Blood Truman Capote PB 35.00 01-apr-2016
4 In Cold Blood Truman Capote MMPB 25.00 01-jun-2016
5 The Grass Harp Truman Capote PB 32.00 01-jan-2016
6 The Once and Future King T. H. White PB 35.00 01-jul-2016
7 First and Last Truman Capote HC 45.00 01-feb-2016
8 First and Last Truman Capote PB 30.00 01-may-2015

And here is my editions look-up table (not actually used in the example):

EDITION_ID EDITION_NAME
PB PAPERBACK
HC HARDCOVER
MMPB MASS-MARKET PAPERBACK

And now my task is to list all of the books written by Truman Capote, but each book must be listed only once, with the additional restriction that if the book has a mass-market paperback edition, that is the one that must be listed, otherwise, the edition with the oldest date_added should be listed.

How would you solve it?

Every time I see a problem where some rows are “better” than others, I immediately think of the ranking functions.

But a simple/direct top-1 query wouldn’t suffice in this case because the “preferred” edition is MMPB, which if ordered alphabetically, comes before PB but after HC, so a simple “ORDER BY edition” doesn’t work (either ascending or descending), and also because if there is no MMPB edition, then the ordering criterion is no longer related to the edition, but to another column (the date in this case).

So, what I can do in this case, is to take advantage of the flexibility that the ORDER BY clause offers, to create an “artificial” sorting criteria for a ranking function to assign a preference or priority to each row:

SELECT id, title, author, edition_id, date_added,row_number() OVER (PARTITION BY title, author
   ORDER BY
   CASE
      WHEN edition_id = 'MMPB'
      THEN 'A'
      ELSE 'B'
   END, date_added) AS priority
FROM books b
WHERE author = 'Truman Capote';

These are the results of this query:

ID TITLE AUTHOR EDITION_ID DATE_ADDED PRIORITY
8 First and Last Truman Capote PB 01-MAY-2015 00:00:00 1
7 First and Last Truman Capote HC 01-FEB-2016 00:00:00 2
4 In Cold Blood Truman Capote MMPB 01-JUN-2016 00:00:00 1
2 In Cold Blood Truman Capote HC 15-FEB-2016 00:00:00 2
3 In Cold Blood Truman Capote PB 01-APR-2016 00:00:00 3
5 The Grass Harp Truman Capote PB 01-JAN-2016 00:00:00 1

I am using a CASE expression as the first criterion in the analytic ORDER BY clause (starts at line 3).

I’m telling the database to order the results by a value that is ‘A’ for books where the edition is MMPB and ‘B’ for any other edition.

This forces the MMPB edition to always be the first priority if it exists.

If an MMPB edition doesn’t exist, then all other editions of the same book have the same value for the first sorting criterion (‘B’) and the ties are then broken by ordering by date_added (line 7) as it was required.

Now that the rows have been prioritized, I just need to filter the results in order to return only the desired row for each book:

WITH prioritized AS
   (
      SELECT b.*, row_number() OVER (PARTITION BY title, author
      ORDER BY
         CASE
            WHEN edition_id = 'MMPB'
            THEN 'A'
            ELSE 'B'
         END, date_added) AS priority
      FROM books b
      WHERE author = 'Truman Capote'
   )
SELECT id, title, author, edition_id, date_added
FROM prioritized
WHERE priority = 1;

I used my previous statement as a factored subquery, which I called “prioritized”, but I could have used an in-line view as well.

I needed to use a subquery because analytic functions are calculated after the WHERE clause is evaluated, so they cannot be used or referenced in the WHERE clause.

Here are the final results (in no specific order):

ID TITLE AUTHOR EDITION_ID DATE_ADDED
8 First and Last Truman Capote PB 01-may-2015
4 In Cold Blood Truman Capote MMPB 01-jun-2016
5 The Grass Harp Truman Capote PB 01-jan-2016

I used the ROW_NUMBER function in this case, but depending on the situation, some of the other ranking functions might need to be used.

I will cover some other pretty cool uses of the ranking functions in another post.

Have something to add?

Sound off in the comments section below!

Opt In Image
Don't want to miss any Oracle SQL tips?

Subscribe to be informed about new posts, tips and more awesome things.

Carlos

Carlos

I've been working with Oracle databases on a daily basis for more than 10 years.

Leave a Reply

2 Comments on "Prioritizing Rows With The Ranking Functions"

avatar
Sort by:   newest | oldest | most voted
Mario
Guest

Awesome explanation! The first thing I thought when I read the problem was “I need to write a program to do that! ” Thanks for making it easier for us.

wpDiscuz