4

Subqueries in Oracle SQL

Subqueries are a very powerfull feature of SQL, but I have seen them being misused or underused many times, and that is why I decided to write about them today.

The Basics

For starters, a subquery is a SELECT statement that is included or nested within another SQL statement, which can be another SELECT or an INSERT, UPDATE or DELETE. Subqueries are always enclosed within parentheses.

In most cases, subqueries are executed prior to the execution of the main statement where they are nested in, with the exception of correlated subqueries, which are covered some paragraphs below.

The result set produced by a subquery is only kept in memory during the life of the main statement, that is, when the main statement’s execution finishes, the subquery’s result set is discarded.

Here is a very simple example:

SELECT *
FROM table_A
WHERE column_X IN
  (
    SELECT column_X
    FROM table_B
  );

In this example the subquery is what you see in lines 5 and 6, so the SELECT from table_B is run first, then the query from table_A is executed using the result set from table_B as the filter criteria, and when execution finishes, the result set from the subquery is discarded.

Classifications

One of the ways subqueries can be categorized is based on whether they reference columns from the main statement or not. When they do reference columns from the outer statement they are called correlated subqueries, and when they don’t reference anything outside themselves they are called non-correlated subqueries. Let’s review these types with a little more detail.

Correlated Subqueries

As mentioned earlier, they reference one or more columns from the outer or main statement (which can be a SELECT, UPDATE or DELETE). So, since they are related to the main query, they cannot be executed alone prior to executing the main statement. Instead, they are executed once for each candidate row that can possibly be returned or affected by the main statement.

Here is a simple example of a correlated subquery:

SELECT *
FROM customers c
WHERE 100 <
(
SELECT COUNT ( * )
FROM orders
WHERE customer_id = c.customer_id
);

As you can see, "c.customer_id" appears in the subquery's WHERE clause, where "c" is the alias for the customers table in the main query.

But the WHERE clause is not the only place where you can find correlated subqueries. Check this UPDATE example, where the subquery is in the SET clause:

UPDATE customers c
SET orders =
  (
    SELECT COUNT( * )
    FROM orders
    WHERE customer_id = c.customer_id
  )
WHERE customer_type = 1;

Or this query where it is in the SELECT list:

SELECT customer_id, name, (
    SELECT COUNT( * )
    FROM orders
    WHERE customer_id = c.customer_id
  ) orders_count
FROM customers c;

The problem with correlated subqueries, is that since they are evaluated once per each candidate row from the main statement, they can severely affect performance, and I have seen people misusing them quite a few times.

So, to get the same results than the above query, I could use this one, which doesn't need any subquery:

SELECT customer_id, name, COUNT( * ) orders_count
FROM customers c
JOIN orders o
	USING (customer_id)
GROUP BY customer_id, name;

In general, which one do you think would tend to perform better? Share your thoughts in the discussion section at the end of the article 😉

Well, fortunately the query optimizer is getting smarter in every new version of the database, and in many situations it is smart enough to see that a query with a correlated subquery can be transformed into a different and more efficient query, so it ends up running probably a join query or some other type of statement that is more efficient than the one we were telling it to run. However, this happens many times, but not always.

So, correlated subqueries can be very useful, but always make sure that using them is the best way to approach a problem.

Non-correlated Subqueries

As you might have guessed, these are the ones that are independent of the main statement, such as the first example included in the article.

Non-correlated subqueries are executed once before the execution of the main statement, and can be included in all places where a correlated subquery can be included, plus some other places. For example, here is an INSERT statement, which uses a non-correlated subquery:

INSERT
INTO departments
  (
    department_id, department_name, division_id
  )
  VALUES
  (
    100, 'DEVELOPMENT', (
      SELECT division_id
      FROM divisions
      WHERE division_name = 'INFORMATION TECHNOLOGY'
    )
  );

And they can even appear in the HAVING clause of a SELECT statement.

Scalar Subqueries (Single Row, Single Column)

Subqueries that return only one row and only one column are sometimes called "scalar" subqueries, because they return a single value.

The INSERT from the above example includes a scalar subquery, which was used to get the id for the IT division. Scalar subqueries are the only type of subquery that can be included in the VALUES clause of an INSERT statement.

Other types, based on the number of rows and columns returned

There are other combinations of number of columns/rows returned by the subquery, which have some importance, because those numbers affect the places where a determined subquery can be used.

For example, when you use the IN operator in the WHERE clause of a query, you can use one that returns several rows, regardless of the number of columns it returns. For example, if you are filtering by one column, you can do something like this:

SELECT *
FROM customers
WHERE customer_id IN
  (
    SELECT customer_id
    FROM orders
    WHERE amount > 1000
  );

But you can also do something like this, to use the IN operator to compare more than one column:

SELECT *
FROM customers
WHERE (customer_id,last_order) IN
  (
    SELECT customer_id,placement_date
    FROM orders
    WHERE amount > 1000
  );

And you could also do something similar in the SET clause of an UPDATE statement, but in this case the subquery must return only one row (can be correlated or non-correlated):

UPDATE customers c
SET (sales, last_order) =
  (
    SELECT COUNT( * ), MAX(placement_date)
    FROM orders
    WHERE customer_id = c.customer_id
  )
WHERE customer_id BETWEEN 1 AND 10000;

On a similar note, when you use the ANY or ALL special keywords with comparison operators, you can have a subquery that returns several rows, but only one column.

So, in general, the number of rows and columns returned by a subuquery determines whether it can be used in different situations or places of a main statement, and you shouldn't need to try to memorize those restrictions, because, in general, they just make sense.

Inline Views

An inline view is simply a subquery that appears in the FROM clause of a SELECT statement. They can be used in other parts of the main query as you would with a table or view (reference its columns, join it to other data sources, etc) . Inline views are always non-correlated subqueries.

The concept is very simple, but its use is very powerful.

Here is an example of an inline view:

SELECT customer_id, c.name, COUNT( * ) orders_count
FROM customers c
JOIN
  (
    SELECT *
    FROM orders
    WHERE amount < 100 
   ) o 
   USING (customer_id) 
WHERE o.placement_date > '01-JAN-2000'
GROUP BY customer_id, c.name;

The example shows an inline view from a single table, but you can have a complete SELECT statement there, with joins, functions, grouping, etc.

The only additional consideration is that you have to provide an alias for the inline view, which is also pretty obvious, because otherwise you wouldn't have a way to reference it from other parts of the query.

Subquery Factoring (The WITH Clause)

The WITH clause, or subquery factoring clause as it's also called, was introduced in version 9 of the database, and what it does is that it allows you to declare subqueries (can be more than one) that are treated in a similar way than inline views, but can be used in more than one place in the same main query (without having to write it more than once, and without needing to be executed more than once).

So, the example used in the inline views section can be rewritten using the WITH clause, this way:

WITH small_orders AS
  (
    SELECT *
    FROM orders
    WHERE amount < 100 ) 
SELECT customer_id, c.name, COUNT( * ) orders_count 
FROM customers c 
JOIN small_orders o 
	USING (customer_id) 
WHERE o.placement_date > '01-JAN-2000'
GROUP BY customer_id, c.name;

Here is an example that shows how you can declare more than one subquery using the WITH clause. The example is equivalent to the above query, and doesn't make sense to be written like this. My only goal here is to show you how to do it.

WITH small_orders AS
  (
    SELECT *
    FROM orders
    WHERE amount > 100
  ),
  c AS
  (
    SELECT *
    FROM customers
  )
SELECT customer_id, c.name, COUNT( * ) orders_count
FROM c
JOIN small_orders o
	USING (customer_id)
WHERE o.placement_date > '01-JAN-2000'
GROUP BY customer_id, c.name;

For an example of how it can help you build more efficient queries, take a look at this article.

Conclusion

So, as you can see, subqueries can be included in all DML statements, and they can appear in many different places depending on their type and the kind of results they return. They can be very powerful and are sometimes underutilized, but can also be misused making SQL statements inefficient.

Here I only covered the basics, so that you have an idea of what is possible, but there can be restrictions, features, and other interesting uses for subqueries that were not covered here, and you might want to research and/or test on your own.

When writing code for applications or scripts that are going to be run more than once, it is always a good idea to test and profile different approaches, to see what the better option is, taking performance and code readability in consideration.

Happy coding!

Have something to say, suggest, or ask? Share your wisdom below!

Opt In Image
Don't want to miss any Oracle SQL tips?

Subscribe to be informed about new posts, tips and more awesome things.

Carlos

Carlos

I’ve been working with Oracle databases on a daily basis for more than 10 years.

Leave a Reply

4 Comments on "Subqueries in Oracle SQL"

avatar
Sort by:   newest | oldest | most voted
trackback

[…] You can read the basics about the WITH clause in my previous article about subqueries. […]

trackback

[…] be able to use the rownum to get the correct result, I need to use a subquery like […]

trackback

[…] as you can see I used a subquery (starts in line # 3). Do you remember how this type of subqueries that appear in the FROM clause are […]

trackback

[…] they thought about subquery factoring to filter the table just once instead of doing it in the main query and the subquery, and they came […]

wpDiscuz