I recently received a question about Pairwise Subqueries from one of the students in my SQL course, and I thought it would be a good idea to make my answer a blog post, since I did a couple of Google searches and didn’t find a lot of articles explaining the concept, so, this post is a slightly modified version of my answer.
To demonstrate the idea, I used a small employee table with the following data:
Being pairwise and non-pairwise is in reality not an attribute or characteristic of a subquery, but about a comparison.
In short, a pairwise comparison is when you want to compare a pair of values from the row that is being evaluated in the main query, to a list of pairs of values provided by the subquery.
Here is an example:
SELECT * FROM employee WHERE ( department_id, bonus ) IN ( SELECT department_id, bonus FROM employee WHERE id = 5 );
Since the condition in the WHERE clause includes 2 columns, then the subquery must return 2 columns as well.
In a comparison like this, the database reviews the rows returned by the subquery to see if there is one in which the department_id and bonus are exactly equal to the department_id and bonus of the row that is being reviewed in the main query. In other words, it looks for a match of the exact combination of department_id and bonus.
If it finds a row where the department_id is the same but the bonus is different, the condition is not met, and if it finds a row where the bonus is the same, but the department_id is different, the condition is not met. For the condition to be true, the subquery must return a row in which the department_id and bonus are the same as the ones of the row that is being evaluated in the main query. Both must be the same at the same time or in the same row.
In this case, this subquery returns all of the rows that have the exact same combination of department_id and bonus as the employee with id=5.
This would be an example of a non-pairwise comparison with subqueries:
SELECT * FROM employee WHERE department_id IN ( SELECT department_id FROM employee WHERE id = 5 ) AND bonus IN ( SELECT bonus FROM employee WHERE id = 5 );
In this particular example both queries produce the same result because the subquery is returning only one row, but take a look at this:
SELECT * FROM employee WHERE ( department_id, bonus ) IN ( SELECT department_id, bonus FROM employee WHERE id IN (2,5) );
This query produces this result:
But this one, in which we do a non-pairwise comparison:
SELECT * FROM employee WHERE department_id IN ( SELECT department_id FROM employee WHERE id IN (2,5) ) AND bonus IN ( SELECT bonus FROM employee WHERE id IN (2,5) );
Produces this result:
The reason for the difference is that in the pairwise comparison, the database looks for rows in which the 2 columns match in the same row, but in the non-pairwise comparison, the database evaluates the conditions about the department_id and bonus independently.
In this case, the subquery about the department_id returns departments 1 and 2, and the subquery about the bonus returns bonuses 100 and 200, so, the main query returns all employees whose department id is 1 or 2 and whose bonus is 100 or 200.
For example, employee George Hayes is returned because his department id is 2 and his bonus is 100, and employee James Bosh is returned because his department id is 2 and bonus is 200.
For those employees, both conditions, when evaluated independently, are true.
But in the query with pairwise comparison, the subquery returns a row with employee_id 1 and bonus 100 and a row with department_id 2 and bonus 200, so, the main query returns only employees in which the pair of columns (department_id, bonus) is exactly equal to one of the pairs returned by the subquery. That is the reason why employee George Hayes was not returned by that statement, because his department_id is 2 but his bonus is 100, so, his pair of columns is not equal to any of the pairs returned by the subquery.
And you can actually have a comparison of more than 2 columns at the same time.
Something like this:
... WHERE (column1, column2, column3, column4) IN ( SELECT column1, column2, column3, column4 FROM ... )
The only condition is that the subquery must return the same number of columns than the list inside the parentheses in the main query.
It’s awesome that Oracle allows to make this kind of comparison, because, as far as I know, there are some database management systems that don’t support a pairwise comparison with a subquery.
Have anything to add? Sound off in the comments section below!
Subscribe to be informed about new posts, tips and more awesome things.