Count(*) Vs Count(1) Again?

I decided to write this article because the question keeps being asked, and I usually find myself explaining my answer again and again and/or looking for links to someone else’s articles that can help support my answer.

Here I discuss a couple of common misconceptions and share a short but irrefutable demonstration.

So, these are the questions I will answer here:

What is the difference between COUNT(*) and COUNT(1)?

Short answer: None!

Is COUNT(1) more efficient or faster than COUNT(*)?

Short answer: No!

Is there a way to demonstrate or prove that they are the same thing?

Short answer: Yes!

What is the difference between COUNT(*) and COUNT(1)?

One of the most common answers I see for this question is that they might produce different results because COUNT(1) counts only rows in which the first column is not null.

Why is that answer incorrect?

Because COUNT receives an expression as parameter, not a column position.  COUNT(1) doesn’t mean COUNT(<first column>).  It means COUNT(1), with 1 being treated as a numeric literal.

The confusion comes most likely from the fact that you can use a column position in the ORDER BY clause, but using column positions is not allowed with the COUNT function.

Look at the syntax diagram for the ORDER BY clause:

Order By clause syntax diagram.

As you can see, it explicitly says that you can provide a position instead of an expression or an alias, but if you look at the syntax of the COUNT function, it expects an expression:

Count function syntax diagram.
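The contrast can be sketched with two short queries. They assume the EMPLOYEES table queried later in this article has FIRST_NAME and LAST_NAME columns, as in Oracle's HR sample schema; those column names are my assumption, so substitute your own.

```sql
-- ORDER BY accepts a position: the 2 below means
-- "the second item in the select list" (last_name).
SELECT first_name, last_name
FROM employees
ORDER BY 2;

-- COUNT accepts only an expression: the 2 below is the number two,
-- evaluated for every row. It does not refer to any column.
SELECT COUNT(2)
FROM employees;
```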

So, what is the consequence of 1 being treated as a numeric literal by the COUNT function?

Well, it is true that COUNT counts only rows in which the expression passed is not null, but since 1 is a literal, and a literal doesn’t change, it will always be 1, for each and every row, and will never be null, so the final result is that COUNT(1) counts all of the rows returned by the query, regardless of the existence of nulls in any of the columns.
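A quick way to see this is a sketch like the following, which builds three rows inline from DUAL so that no real table is needed. The names t, id and label are made up for the illustration.

```sql
WITH t AS (
  SELECT 1 AS id, CAST(NULL AS VARCHAR2(10)) AS label FROM dual
  UNION ALL
  SELECT 2, 'B' FROM dual
  UNION ALL
  SELECT NULL, 'C' FROM dual
)
SELECT COUNT(*)     AS all_rows,        -- 3
       COUNT(1)     AS literal_one,     -- 3: the literal 1 is never null
       COUNT(id)    AS non_null_ids,    -- 2: one row has a null id
       COUNT(label) AS non_null_labels, -- 2: one row has a null label
       COUNT(NULL)  AS null_literal     -- 0: a null expression is never counted
FROM t;
```

The only arguments that change the result are the ones that can actually be null.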

If you have doubts, try this query:

SELECT COUNT(99999), COUNT('MONKEY'), COUNT(*)
FROM DUAL;

If the parameter passed to COUNT were treated as a column position, then the first call to COUNT would produce an error, because there is only one column in the DUAL table.

And the second call to COUNT shows that you can use any type of literal with the same result: a literal does not change, so 'MONKEY' will never be null. COUNT(1), COUNT(99999) and COUNT('MONKEY') are therefore all equivalent to COUNT(*).

Is COUNT(1) more efficient or faster than COUNT(*)?

The most common argument in favor of COUNT(1) I have seen in this kind of discussion is that COUNT(*) needs to check the value of all columns in the row to determine if it needs to be counted, because COUNT doesn’t count nulls.

Why is that incorrect?

Because the official documentation about the COUNT function explicitly says this:

If you specify the asterisk (*), then this function returns all rows, including duplicates and nulls.

So, COUNT(*) always counts all rows, and thus, the database doesn’t need to check the column values, because it will always count all rows, regardless of their contents.
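The documented behavior can be checked with a small sketch: a single row whose columns are all null is still counted by COUNT(*). The names t, a and b are hypothetical.

```sql
WITH t AS (
  SELECT CAST(NULL AS NUMBER)      AS a,
         CAST(NULL AS VARCHAR2(1)) AS b
  FROM dual
)
SELECT COUNT(*) AS all_rows,   -- 1: the row exists, so it is counted
       COUNT(a) AS non_null_a, -- 0
       COUNT(b) AS non_null_b  -- 0
FROM t;
```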

Is there a way to demonstrate or prove that they are the same thing?

I have seen some people use execution plans to demonstrate that COUNT(1) is equivalent to COUNT(*). Unfortunately, an execution plan comparison is not enough: even if the two were different, and one of them actually did more work than the other, the execution plans could still be identical. So that is not a convincing or irrefutable demonstration.
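For reference, this is roughly how such a plan comparison is usually done, assuming an EMPLOYEES table like the one queried later in this article. Matching plans are consistent with the two forms being equivalent, but they do not prove it.

```sql
-- Plan for the COUNT(1) form
EXPLAIN PLAN FOR
SELECT COUNT(1) FROM employees;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Plan for the COUNT(*) form
EXPLAIN PLAN FOR
SELECT COUNT(*) FROM employees;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```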

If you want irrefutable proof, here it is. If you really want to find the truth, this should be enough to make you take the correct side of this debate, forever 🙂

The proof:

There is a part of the database software that is called “The Optimizer”, which is defined in the official documentation as “Built-in database software that determines the most efficient way to execute a SQL statement”.

One of the components of the optimizer is called “the transformer”, whose role is to determine whether it is advantageous to rewrite the original SQL statement into a semantically equivalent SQL statement that could be more efficient.

The optimizer does a lot of work under the hood, but we usually don’t notice it.

Would you like to see what the optimizer does when you write a query using COUNT(1)?

I’m going to show you how to generate an optimizer trace, also known as a 10053 trace, in which you will be able to see a log of the optimizer work.  There are more methods to do it, but here is one:

You will have to use a database user to which the ALTER SESSION privilege has been granted.

I’m doing this in a SQL*Plus session, but if you are using SQL Developer, you might need to manually disconnect the session instead of executing the EXIT command in the last step.

First, you need to do something to make your trace file easy to identify:

ALTER session SET tracefile_identifier = 'My_count_test';

Then you have to enable the optimizer tracing:

ALTER session SET events 'trace [SQL_Optimizer.*]';

Then you have to run the command you want to trace, which in this case is a SELECT statement that uses COUNT(1):

SELECT /* test-1 */ COUNT(1)
FROM employees;

For the trace to be generated, a “hard parse” of the statement needs to occur. Put simply, that happens when the exact same statement has not been executed before, so if you want to run and trace the same statement several times, add a comment that makes it different from the previous runs. In this case, if I wanted to trace the same statement again, I would change the comment to ‘test-2’, for example.
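One way to check whether the statement is already in the shared pool is to look it up in V$SQL. This is only a sketch: it assumes your user can query V$SQL, which the ALTER SESSION privilege does not cover, and that the statement text starts exactly as typed.

```sql
-- Run before and after executing the traced statement.
-- No row before and one row after suggests the statement was new to the
-- shared pool and was therefore hard parsed.
SELECT sql_id, first_load_time, loads, parse_calls, executions
FROM v$sql
WHERE sql_text LIKE 'SELECT /* test-1 */%';
```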

Ok, now, you need to exit the session for the trace to be written to the file:

EXIT;

As I mentioned before, if you are using SQL Developer, you might need to manually disconnect the session, as just executing EXIT or DISCONNECT doesn’t appear to really end the session.
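If you would rather keep the session open, the event can also be switched off explicitly. The sketch below shows the counterpart of the statement used earlier, followed by the older numeric event syntax from which the name “10053 trace” comes. I have not verified that the trace file is completely written at that point on every version, so treat that as an assumption.

```sql
-- Switch off the optimizer trace enabled earlier in this session
ALTER SESSION SET EVENTS 'trace [SQL_Optimizer.*] off';

-- Older way to enable and disable the optimizer trace
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
ALTER SESSION SET EVENTS '10053 trace name context off';
```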

Here is what this looks like on my system:

And now, the exciting part!  We are going to take a look at the contents of the trace we generated, but where is this file located?

It might vary from system to system, but you can fire this query to see the path where the trace files are located:

SELECT VALUE FROM V$DIAG_INFO WHERE NAME = 'Diag Trace';
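The same view can also report the full name of the trace file for the current session. This sketch has to be run in the traced session itself, before the EXIT step, because the value is specific to the session that queries it.

```sql
-- Full path of the current session's trace file. With tracefile_identifier
-- set as above, the file name should contain My_count_test.
SELECT value
FROM v$diag_info
WHERE name = 'Default Trace File';
```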

Ok, here are the relevant portions of the 1,850-line trace file that was generated.

******************************************
----- Current SQL Statement for this session (sql_id=4nbgdngzf4024) -----
SELECT /* test-1 */ COUNT(1) FROM employees
*******************************************
Legend
The following abbreviations are used by optimizer trace.

CNT - count(col) to count(*) transformation

As you can see, there is a “count(col) to count(*)” transformation, that is represented in the trace as CNT.

A little later in the trace, there is this:

CNT:   Considering count(col) to count(*) on query block SEL$1 (#0)
*************************
Count(col) to Count(*) (CNT)
*************************
CNT:     COUNT() to COUNT(*) done.

And a little later, there is this:

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(1)" FROM "COURSE"."EMPLOYEES" "EMPLOYEES"

Do you see the role “COUNT(1)” plays in the final query?

I will put it here again, as an image:

“COUNT(1)” is just an alias! What Oracle actually runs is COUNT(*); it returns COUNT(1) as the column title in the results only because that is what you asked for.

So, NO, COUNT(1) is neither more efficient nor faster than COUNT(*), because COUNT(1) is never actually run. It is always transformed into COUNT(*), so you always get COUNT(*) even if you didn’t ask for it.

I could even say that COUNT(1) is, if anything, a little bit less efficient, because it requires the optimizer to do a transformation that would not be needed if COUNT(*) were used from the beginning.

Are we on the same side now?

Great!

Have something to say?

Great!  Post your comments below.

Carlos

I've been working with Oracle databases on a daily basis for more than 10 years.

37 Comments

    • Hi, Trung.

      I can confirm the Oracle optimizer performs this transformation, but to be honest, I don’t know if other databases do something similar.

      It should be researched individually for each RDBMS.

  1. I also thought that they were equivalent, but I had no evidence for it.
    Thank you for the deep and accurate deduction!

  2. Excellent explanation. When I saw this question, the answer that popped into my mind was that (*) and (1) are different, but Oracle clearly behaves differently. I have worked with Teradata, which obviously has a different syntax; there, (1) represents the first column rather than a literal, so it skips duplicates and NULLs, while (*) covers the entire column, including NULLs and duplicates. Learning that it works differently in Oracle is a good lesson for me. Your other videos are awesome and so easy to understand.

  3. There are some issues with dependency tracking in versions earlier than 11g. Fine-grained dependency tracking, which avoids this kind of invalidation, was only introduced in Oracle 11g:

    SQL> select * from v$version where rownum = 1;

    BANNER
    ----------------------------------------------------------------
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi

    SQL> create table t_count (id number(*,0));

    Table T_COUNT created.

    SQL> create or replace procedure p_count
    is
    v_count pls_integer;
    begin
    select count(*)
    into v_count
    from t_count;
    end;
    /

    Procedure P_COUNT compiled
    SQL> select object_name, object_type, status from all_objects where object_name in ('T_COUNT','P_COUNT');

    OBJECT_NAME                    OBJECT_TYPE         STATUS
    ------------------------------ ------------------- -------
    P_COUNT                        PROCEDURE           VALID
    T_COUNT                        TABLE               VALID

    SQL> alter table t_count add (val varchar2(1));

    Table T_COUNT altered.

    SQL> select object_name, object_type, status from all_objects where object_name in ('T_COUNT','P_COUNT');

    OBJECT_NAME                    OBJECT_TYPE         STATUS
    ------------------------------ ------------------- -------
    P_COUNT                        PROCEDURE           INVALID
    T_COUNT                        TABLE               VALID

      • Sorry for the bad formatting.

        I just wanted to say that COUNT(1) is probably safer than COUNT(*) in releases earlier than Oracle 11g, because if you make changes to a table on which you use COUNT(*), the subprograms (procedures, functions, etc.) in which it is used will be invalidated:

        SQL> select * from v$version where rownum = 1;
        BANNER
        ----------------------------------------------------------------
        Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi

        SQL> create table t_count (id number(*,0));
        Table T_COUNT created.

        SQL> create or replace procedure p_count
        is
        v_count pls_integer;
        begin
        select count(*)
        into v_count
        from t_count;
        end;
        /
        Procedure P_COUNT compiled

        SQL> select object_name, object_type, status from all_objects where object_name in ('T_COUNT', 'P_COUNT');
        OBJECT_NAME                    OBJECT_TYPE         STATUS
        ------------------------------ ------------------- -------
        P_COUNT                        PROCEDURE           VALID
        T_COUNT                        TABLE               VALID

        SQL> alter table t_count add (val varchar2(1));
        Table T_COUNT altered.

        SQL> select object_name, object_type, status from all_objects where object_name in ('T_COUNT', 'P_COUNT');
        OBJECT_NAME                    OBJECT_TYPE         STATUS
        ------------------------------ ------------------- -------
        P_COUNT                        PROCEDURE           INVALID
        T_COUNT                        TABLE               VALID

        • Ah, I see it now, thanks for the clarification.

          That is an interesting observation. So, that doesn’t happen when the procedure calls COUNT(1) instead of COUNT(*)? (I don’t have a 10g instance to test it).

          • Count(1) has been rewritten to count(*) since 7.3, because Oracle likes to auto-tune mythic statements. In earlier Oracle7 releases, Oracle had to evaluate (1) for each row, as a function, before DETERMINISTIC and NON-DETERMINISTIC existed.

            So two decades ago, count(*) was faster

            The count(1) myth is ancient… really ancient. And untrue

          • Agreed.

            I’m surprised how often the question gets asked in spite of how old the myth is.

  4. Great article, Carlos. I always knew count(*) was faster than count(1). However, whenever someone asked me to prove it, I was never able to :) Now I have proof.

    Thank you again !

  5. Hi, thanks for the explanation. It begs the question though, if they are the same why have both? Are there circumstances when one is preferable?

    • Hi, Mike.

      We don’t really need both. For every situation in which we want to count all of the rows, we can use COUNT(*). The possibility to use COUNT(constant) exists just because the type of expression that the function expects as argument includes literals.

      My opinion is that COUNT(*) is always preferable.

  6. What do you mean by FOR EACH in the part highlighted with [[ ]] below? Do you mean FOR EACH COLUMN and EVERY ROW?

    Well, it is true that COUNT counts only rows in which the expression passed is not null, but since 1 is a literal, and a literal doesn’t change, it will always be 1, [[for each and every row]], and will never be null, so the final result is that COUNT(1) counts all of the rows returned by the query, regardless of the existence of nulls in any of the columns.

    • Hi, Vishal!
      I meant “for each row”, but used that phrase “for each and every row” just to add more emphasis. I was not referring to any column from the table, but to the “1” that was being passed as a literal to the COUNT function.

      Does it make sense?

  7. Just what I’ve been looking for!

    Just one more question, sir: is COUNT(1) safe even when using joins? I’m still new to SQL. Many thanks.

  8. I just want to state I am beginner to blog and really enjoyed you’re site . Likely I’m want to bookmark your website. You certainly feature great posts. Thanks a lot for revealing your blog website.

  9. Carlos, thank you very much for the explanation. A question: where can I find more information about the syntax diagrams you include, and about how they are executed? I still find them hard to analyze; I suppose I am missing some concepts.

    • To be honest, I haven’t seen a fuller explanation anywhere. It is only in the official documentation, and I don’t think it is available in Spanish.

      In general, what helps is to read them by following the possible paths that the arrows indicate. When there is more than one possible path, it is because any of the two (or more) paths would be valid syntax.
