Scripts you might need: UserCreation, TablesCreation.
If you are using LiveSQL: LiveSQL_1
In this lesson, you will learn how to filter duplicate rows from a result set using the DISTINCT keyword.
To show you why it is useful, let’s run this simple query:
SELECT department_id
FROM employee;
Results:
DEPARTMENT_ID |
---|
1 |
2 |
3 |
4 |
1 |
1 |
3 |
1 |
2 |
3 |
4 |
4 |
4 |
As you can see, I got several rows with department id 1, and there are several rows for department 4 also.
That is because, in this table, I have several employees for each one of those departments. But, what should I do if the requirement is to display a list of IDs of the different departments that exist in the employee table?
Including any department more than once wouldn’t make any sense for that requirement, so, in cases like this one is where the DISTINCT keyword comes into play.
I just need to add it before the list of columns I want to return.
SELECT DISTINCT department_id
FROM employee;
Results:
DEPARTMENT_ID |
---|
1 |
2 |
4 |
3 |
What Exactly does DISTINCT do?
So, the DISTINCT keyword tells the database to remove duplicate rows. And I need to emphasize that it makes it remove duplicate ROWs, not columns.
Let’s see another example:
SELECT DISTINCT department_id, name
FROM employee;
Results:
DEPARTMENT_ID | NAME |
---|---|
1 | JOHN SMITH |
4 | LUKE MINT |
3 | IGOR KEYS OSBOURNE |
1 | ANGIE CROOD |
1 | EDDIE FARREL |
2 | GEORGE HAYES |
4 | LUKE GREEN JR |
3 | LUISA JACKSON |
4 | STUART GARCIA |
1 | JUSTIN BLACK |
3 | CHARLES DEAN |
4 | NIGEL OAKS |
2 | JAMES BOSH |
Why do I have several rows with department id 1 if I applied the DISTINCT keyword to the department_id column?
Well, that is not true. I didn’t apply the DISTINCT keyword to the department id column. As I mentioned earlier, it removes duplicate ROWS, not columns, so, even though the DISTINCT keyword is just before that column, it doesn’t mean it is applied to that column only.
The DISTINCT keyword is applied to the whole row, and there aren’t any duplicate rows here, because, there is a row with department id 1 and name JOHN SMITH, and there is another one that has department id 1 too, but name ANGIE CROOD. So, the row is different because there is one column that is different.
If I had a query that returns a hundred columns, and there are some rows that have the exact same values for 99 of those columns, but there is one column with different value, then those rows are not duplicates and will not be filtered by applying the DISTINCT keyword.
Did you know?
From time to time, you will see the word “UNIQUE“ in some people’s queries, and also from time to time you will find some developer that when looking at such code reacts by saying “that’s an error, this code won’t even compile”
The truth is, that even though almost no one uses it, UNIQUE is a correct SQL keyword in Oracle, and it is equivalent to DISTINCT. So, you can actually use UNIQUE in place of DISTINCT, and your results will be the same.
Warning!
The DISTINCT keyword can be very useful, but I have seen it many times being used to mask or hide errors related to joining tables in an incorrect way.
Joining tables is a topic that will be covered in a future lesson, so this warning might not make complete sense to you by now, but I just want you to keep this in mind, so, every time you see the DISTINCT keyword being used, you pay extra attention to the query, to see if it is really necessary and if it is being used correctly, and also, to make you think twice every time you feel the need to include it in your own queries.
Okay, that’s it for this lesson. In the next one, you will start learning about sorting the results of your queries.
Things are getting more and more interesting, don’t you think?
See you in the next lesson.