5

The WITH clause to the rescue!

This time I’m just writing to tell you a story about the WITH clause, because it is really powerful and vastly underutilized.

You can read the basics about the WITH clause in my previous article about subqueries.

So, this is the story about how it helped a couple of guys get some desired results in an efficient way.

One of the guys was working on a database-related task a few days ago, and asked the other guy for his opinion about how to get some results he needed for his task.

He had a query that returned a list of accounts that had more than 4 rows in a transactions table. From this description, the query was something like this:

  1. SELECT Account
  2. FROM transactions
  3. GROUP BY Account
  4. HAVING COUNT(Account) > 4;

But now he needed to list those transactions that were counted in the previous query, so he was thinking about using the above select as a subquery, but wasn’t sure how to do it.

So, the first thing that was suggested was something like this:

  1. SELECT *
  2. FROM transactions t
  3. WHERE EXISTS
  4.   (
  5.     SELECT Account
  6.     FROM transactions
  7.     GROUP BY Account
  8.     HAVING COUNT(Account) > 4
  9.       AND account = t.account
  10.   );

NOTE: This query didn’t really need to be correlated. Something like this would have been more efficient:
  1. SELECT *
  2. FROM transactions
  3. WHERE account IN 
  4.   (
  5.     SELECT Account
  6.     FROM transactions
  7.     GROUP BY Account
  8.     HAVING COUNT(Account) > 4
  9.   )

But it turned out that, in reality, the grouped query included some conditions, and was really something like this:

  1. SELECT Account
  2. FROM transactions
  3. WHERE code <> 'something'
  4.   AND DATE > 'some date'
  5.   AND account LIKE 'something%'
  6. GROUP BY Account
  7. HAVING COUNT(Account) > 4;

And thus, the suggested query was returning more rows than expected. So, the second guy’s first reaction was: “Ah, Ok, just add the same conditions to the main query and it should work“, but it was starting to get a little ugly.

He tested it, but it turned out to be too inefficient, so he needed to cancel it before it gave any results. The table had millions of rows, and the columns referenced in the conditions were not indexed.

So, they thought about subquery factoring to filter the table just once instead of doing it in the main query and the subquery, and they came up with something like this:

  1. WITH temp AS
  2.   (
  3.     SELECT *
  4.     FROM transactions
  5.     WHERE code <> 'something'
  6.       AND DATE > 'some date'
  7.       AND account LIKE 'something%'
  8.   )
  9. SELECT *
  10. FROM temp t
  11. WHERE EXISTS
  12.   (
  13.     SELECT Account
  14.     FROM temp
  15.     GROUP BY Account
  16.     HAVING COUNT(Account) > 4
  17.       AND Account = t.Account
  18.   );

This one performed the full table scan only once in the WITH clause, and then listed and grouped the filtered results in the main query.

This version gave results a little faster, but still took several seconds to return.

The results were correct, but the first guy wasn’t sure to understand how the EXISTS condition was affecting the results of the main query, so, the second guy started to explain to him how this worked, when it downed on him: This query was still very inefficient because the subquery in the EXISTS condition was a correlated one, so the grouping and counting was being performed once for each row in the potential result set from the main query!

They were still underutilizing the power of the WITH clause.

After they realized that, this is the query they ended up using:

  1. WITH temp AS
  2.   (
  3.     SELECT *
  4.     FROM transactions
  5.     WHERE code <> 'something'
  6.       AND DATE > 'some date'
  7.       AND account LIKE 'something%'
  8.   )
  9.   , temp2 AS
  10.   (
  11.     SELECT Account
  12.     FROM temp
  13.     GROUP BY Account
  14.     HAVING COUNT(Account) > 4
  15.   )
  16. SELECT *
  17. FROM temp t1
  18. JOIN temp2 t2
  19. ON t1.Account = t2.Account;

So, as you can see, in the first subquery they filter the table, which is done only once, and then in the second subquery they group and count the already filtered results obtained from the first one, which is done only once too, and then in the main query they simply join those temp resultsets.

This version returned its results immediately.

Sometimes one tends to do things using the first approach that comes to mind, even though it can many times not be the most efficient way to do it. Fortunately, sometimes performance is so unacceptable that it forces us to try some other method.

And in case you are wondering, yes, using analytic functions makes this task actually simpler and even more efficient, but I thought this story was a good way to show you how the WITH clause can help optimize queries in some situations:

  1. SELECT *
  2. FROM
  3.   (
  4.     SELECT t.*, COUNT ( * ) OVER (PARTITION BY t.account) account_count
  5.     FROM transactions t
  6.     WHERE code <> 'something'
  7.       AND DATE > 'some date'
  8.       AND account LIKE 'something%'
  9.   )
  10. WHERE account_count > 4;

Which for readability purposes could be written this way:

  1. WITH trans AS
  2.   (
  3.     SELECT t.*, COUNT ( * ) OVER (PARTITION BY t.account) account_count
  4.     FROM transactions t
  5.     WHERE code <> 'something'
  6.       AND DATE > 'some date'
  7.       AND account LIKE 'something%'
  8.   )
  9. SELECT *
  10. FROM trans
  11. WHERE account_count > 4;

In what other ways have you used the WITH clause for query optimization?

Share your wisdom in the discussion section below!

Opt In Image
Don't want to miss any Oracle SQL tips?

Subscribe to be informed about new posts, tips and more awesome things.

0 0 votes
Article Rating

Carlos

I've been working with Oracle databases on a daily basis for the last 18 years.

5 Comments
oldest
newest most voted
Inline Feedbacks
View all comments