I decided to write about Top-N queries today because even though there are lots of articles and documentation about the correct way to do it in Oracle out there, I have seen this question asked on different forums, and to my surprise, many times people suggesting incorrect ways to do it without anyone clarifying their mistake. Now, every time I see the question again, I will have a trustworthy link to share 😉
What are Top-N queries?
Top-N queries are a type of query that is used to get the Top-N records based on some sorting criteria, where N can be any natural number, for example, to get the top 5 earning employees, or the top 10 selling products.
There are several ways to write Top-N queries in Oracle. I’m going to show you the most common ones.
Using the rownum pseudocolumn
One of the options we have is to utilize the rownum pseudocolumn. This is a number that is assigned to a row when it is retrieved from the source table, and something that is key to know, is that if it is used in a query that has an ORDER BY clause, this number is assigned before the rows are ordered, and that is why it cannot be used to create top-n queries directly, which is a mistake some people do.
Take a look at this. If I run this query on my test table:
SELECT employee_id, first_name, last_name, salary FROM employees ORDER BY salary DESC;
These are the first rows in the result set:
Now, look what happens if I try to get the top-5 earning employees by using the rownum directly:
SELECT employee_id, first_name, last_name, salary FROM employees WHERE rownum <= 5 ORDER BY salary DESC;
These are not the real top-5 earners.
To be able to use the rownum to get the correct result, I need to use a subquery like this:
WITH ordered AS ( SELECT employee_id,first_name,last_name,salary FROM employees ORDER BY salary DESC ) SELECT * FROM ordered WHERE rownum <= 5;
Which returns these [correct] results:
How does it work? Well, the subquery orders the employees by salary, and once they are ordered they are retrieved by the main query in that order, and at that point the rownum is assigned. So, essentially, what I did was to invert the order in which things happen. Without the subquery, the rownum is assigned first, but with the subquery, the ordering is applied first. And as you might be thinking, yes, this could have been an inline view also, and the results would be the same.
Using analytic functions
Another way to create top-n queries is by using analytic functions, which offer some extra flexibility as to how we want to handle certain situations.
There are 3 functions you could use to create top-n queries. Here is an example that shows how these functions work, so that you can decide which one gives you the desired results.
SELECT employee_id,first_name,last_name,salary, ROW_NUMBER() over (order by salary DESC) row_number, RANK() over (order by salary DESC) rank, DENSE_RANK() over (order by salary DESC) dense_rank FROM employees ORDER BY salary DESC;
In this case the ROW_NUMBER function works in a similar manner than the rownum pseudocolumn. As you can see, it assigned the row number correctly based on the specified order, but it did because I added an ORDER BY in the function’s analytic clause.
But If you look at the results from the RANK function, it assigned a rank of 2 to the second and third rows, why? Because those employees have the same salary. It takes into account the fact that they earn the same salary, so it assigns them the same rank. This might or might not be what you want, but that is the way this function works. Another thing to notice is that since it assigned 2 rows the same rank (number 2), there is no rank number 3. So, when using the RANK function, keep in mind that ranks may not be consecutive numbers.
The DENSE_RANK function works similar to RANK, but it does not skip rank numbers. As you can see it assigned the same rank to the second and third rows too, but the fourth row was assigned the next rank number, without leaving any holes.
Now, the previous example is not a top-n query. It just returned our results in order and assigned a number we could use to filter the results.
We still need to filter the results so that we only get the top 5, but since these analytic functions are calculated after the WHERE clause is evaluated, you cannot use them or reference them in the WHERE clause, which means we need a subquery in this case too.
Here is how the top-5 earners query would look like using the ROW_NUMBER function:
WITH numbered AS ( SELECT employee_id,first_name,last_name,salary, ROW_NUMBER() over (order by salary DESC) row_number FROM employees ) SELECT * FROM numbered WHERE row_number <= 5;
An additional benefit of using analytic functions for top-n queries, is that they allow you to easily get top-n results for different groups of rows, for example, the top-5 earners per department.
We just need to add a PARTITION clause to the function’s analytic clause:
WITH numbered AS (SELECT employee_id,first_name,last_name,salary,department_id, ROW_NUMBER() over (partition BY department_id order by salary DESC) row_number FROM employees ) SELECT * FROM numbered WHERE row_number <= 5;
As you can see, the row number is reset for each department. In this case you don’t see numbers from 1 to 5 for all departments because some departments only have 1 or 2 employees.
Using the row limiting clause
The row limiting clause is new in Oracle 12c, and it greatly simplifies writing top-n queries.
See how easy it is to create our top-5 earners query with this clause:
SELECT employee_id,first_name,last_name,salary FROM employees ORDER BY salary DESC FETCH FIRST 5 ROWS ONLY;
No subquery, no analytic function… Only one additional line of code (the last one).
Because this is new in 12c, you cannot test it if you use an express edition (there is currently no express edition 12c). But if you want to give it a try, Oracle Live SQL is waiting for you 😉
So, there you have it. Now you know how to create Top-N queries in Oracle.
Have something to add? Great, share your thoughts in the discussion section below.