8

Top-N queries in Oracle SQL

I decided to write about Top-N queries today because even though there are lots of articles and documentation about the correct way to do it in Oracle out there, I have seen this question asked on different forums, and to my surprise, many times people suggesting incorrect ways to do it without anyone clarifying their mistake. Now, every time I see the question again, I will have a trustworthy link to share 😉

What are Top-N queries?

Top-N queries are a type of query that is used to get the Top-N records based on some sorting criteria, where N can be any natural number, for example, to get the top 5 earning employees, or the top 10 selling products.

There are several ways to write Top-N queries in Oracle. I’m going to show you the most common ones.

Using the rownum pseudocolumn

One of the options we have is to utilize the rownum pseudocolumn. This is a number that is assigned to a row when it is retrieved from the source table, and something that is key to know, is that if it is used in a query that has an ORDER BY clause, this number is assigned before the rows are ordered, and that is why it cannot be used to create top-n queries directly, which is a mistake some people do.

Take a look at this. If I run this query on my test table:

SELECT employee_id, first_name, last_name, salary
FROM employees
ORDER BY salary DESC;

These are the first rows in the result set:

EMPLOYEE_ID FIRST_NAME LAST_NAME SALARY
100 Steven King 24000
101 Neena Kochhar 17000
102 Lex De Haan 17000
145 John Russell 14000
146 Karen Partners 13500
201 Michael Hartstein 13000

Now, look what happens if I try to get the top-5 earning employees by using the rownum directly:

SELECT employee_id, first_name, last_name, salary
FROM employees
WHERE rownum <= 5
ORDER BY salary DESC;
EMPLOYEE_ID FIRST_NAME LAST_NAME SALARY
100 Steven King 24000
101 Neena Kochhar 17000
102 Lex De Haan 17000
103 Alexander Hunold 9000
104 Bruce Ernst 6000

These are not the real top-5 earners.

To be able to use the rownum to get the correct result, I need to use a subquery like this:

WITH ordered AS
(
SELECT employee_id,first_name,last_name,salary
FROM employees
ORDER BY salary DESC
)
SELECT *
FROM ordered
WHERE rownum <= 5;

Which returns these [correct] results:

EMPLOYEE_ID FIRST_NAME LAST_NAME SALARY
100 Steven King 24000
101 Neena Kochhar 17000
102 Lex De Haan 17000
145 John Russell 14000
146 Karen Partners 13500

How does it work? Well, the subquery orders the employees by salary, and once they are ordered they are retrieved by the main query in that order, and at that point the rownum is assigned. So, essentially, what I did was to invert the order in which things happen. Without the subquery, the rownum is assigned first, but with the subquery, the ordering is applied first. And as you might be thinking, yes, this could have been an inline view also, and the results would be the same.

Using analytic functions

Another way to create top-n queries is by using analytic functions, which offer some extra flexibility as to how we want to handle certain situations.

There are 3 functions you could use to create top-n queries. Here is an example that shows how these functions work, so that you can decide which one gives you the desired results.

SELECT employee_id,first_name,last_name,salary,
ROW_NUMBER() over (order by salary DESC) row_number,
RANK() over (order by salary DESC) rank,
DENSE_RANK() over (order by salary DESC) dense_rank
FROM employees
ORDER BY salary DESC;
EMPLOYEE_ID FIRST_NAME LAST_NAME SALARY ROW_NUMBER RANK DENSE_RANK
100 Steven King 24000 1 1 1
101 Neena Kochhar 17000 2 2 2
102 Lex De Haan 17000 3 2 2
145 John Russell 14000 4 4 3
146 Karen Partners 13500 5 5 4
201 Michael Hartstein 13000 6 6 5
108 Nancy Greenberg 12008 7 7 6

 

In this case the ROW_NUMBER function works in a similar manner than the rownum pseudocolumn. As you can see, it assigned the row number correctly based on the specified order, but it did because I added an ORDER BY in the function's analytic clause.

But If you look at the results from the RANK function, it assigned a rank of 2 to the second and third rows, why? Because those employees have the same salary. It takes into account the fact that they earn the same salary, so it assigns them the same rank. This might or might not be what you want, but that is the way this function works. Another thing to notice is that since it assigned 2 rows the same rank (number 2), there is no rank number 3. So, when using the RANK function, keep in mind that ranks may not be consecutive numbers.

The DENSE_RANK function works similar to RANK, but it does not skip rank numbers. As you can see it assigned the same rank to the second and third rows too, but the fourth row was assigned the next rank number, without leaving any holes.

Now, the previous example is not a top-n query. It just returned our results in order and assigned a number we could use to filter the results.

We still need to filter the results so that we only get the top 5, but since these analytic functions are calculated after the WHERE clause is evaluated, you cannot use them or reference them in the WHERE clause, which means we need a subquery in this case too.

Here is how the top-5 earners query would look like using the ROW_NUMBER function:

WITH numbered AS
(
SELECT employee_id,first_name,last_name,salary,
ROW_NUMBER() over (order by salary DESC) row_number
FROM employees
)
SELECT *
FROM numbered
WHERE row_number <= 5;

An additional benefit of using analytic functions for top-n queries, is that they allow you to easily get top-n results for different groups of rows, for example, the top-5 earners per department.

We just need to add a PARTITION clause to the function's analytic clause:

WITH numbered AS
(SELECT employee_id,first_name,last_name,salary,department_id,
ROW_NUMBER() over (partition BY department_id order by salary DESC) row_number
FROM employees
)
SELECT *
FROM numbered
WHERE row_number <= 5;
EMPLOYEE_ID FIRST_NAME LAST_NAME SALARY DEPARTMENT_ID ROW_NUMBER
200 Jennifer Whalen 4400 10 1
201 Michael Hartstein 13000 20 1
202 Pat Fay 6000 20 2
114 Den Raphaely 11000 30 1
115 Alexander Khoo 3100 30 2
116 Shelli Baida 2900 30 3
117 Sigal Tobias 2800 30 4
118 Guy Himuro 2600 30 5
203 Susan Mavris 6500 40 1

As you can see, the row number is reset for each department. In this case you don't see numbers from 1 to 5 for all departments because some departments only have 1 or 2 employees.

Using the row limiting clause

The row limiting clause is new in Oracle 12c, and it greatly simplifies writing top-n queries.

See how easy it is to create our top-5 earners query with this clause:

SELECT employee_id,first_name,last_name,salary
FROM employees
ORDER BY salary DESC
FETCH FIRST 5 ROWS ONLY;

No subquery, no analytic function... Only one additional line of code (the last one).

Because this is new in 12c, you cannot test it if you use an express edition (there is currently no express edition 12c). But if you want to give it a try, Oracle Live SQL is waiting for you 😉

So, there you have it. Now you know how to create Top-N queries in Oracle.

Have something to add? Great, share your thoughts in the discussion section below.

Opt In Image
Don't want to miss any Oracle SQL tips?

Subscribe to be informed about new posts, tips and more awesome things.

Share this!
Carlos

Carlos

I've been working with Oracle databases on a daily basis for more than 10 years.

Leave a Reply

8 Comments on "Top-N queries in Oracle SQL"

avatar
Sort by:   newest | oldest | most voted
Raviteja
Guest

Its good very usefull information

thatjeffsmith
Guest

If you want to play with the new 12c FETCH syntax, you can also use our VirtualBox appliance – it’s free for educational purposes. http://www.oracle.com/technetwork/database/enterprise-edition/databaseappdev-vm-161299.html

trackback

[…] Did you read my article about writing TOP-N queries? […]

trackback

[…] functions, and although the “ranking” category can quickly make you think about Top-N queries, that is actually not the only type of problems you can solve with […]

Liron
Guest

Hi Carlos, nice post.
I have one comment regarding the top-N 12c syntax. In some cases the manual (pre-12c) syntax is more efficient because of optimization issues. You can read it in my post: https://amitzil.wordpress.com/2015/10/09/query-paging-in-12c/

trackback

[…] Did you read my article about writing TOP-N queries? […]

wpDiscuz