3

How Important is to Format your SQL Code?

A few days ago, I was quickly scanning my Twitter feed, and I saw a poll from a relatively popular Twitter account, asking people how they format their SQL code or how they apply a specific SQL code format.

After a couple of days, I looked it up again to see the final results, and I was very surprised when I saw this:

 

Twitter poll on SQL code formatting.

 

Granted, only about 600 people responded, but still, I couldn’t believe that 60% of the responders format their code manually (and I know that many of the followers of that account are people who work with Oracle databases).

You know that SQL Developer has a nice auto-formatter, right?

I’m going to show you the kind of things the auto-formatter can do to help you with your SQL code format, but first, I want to discuss the topic that really got me thinking after seeing the poll results:

What if instead of asking how people format their SQL code, we asked how many people format their SQL code consistently?

I think that is a very important question, and I want to focus on the reasons why everyone should format their SQL code first, to then conclude the article with a short demonstration of how SQL Developer can help you do it, so, let’s get started.

Why is Your SQL Code Format Important?

As you know, SQL code can be written in many different ways because the language is very flexible in that sense.

We can write keywords, and in most cases, also identifiers, in upper case, lower case, or mixed case. Furthermore, we can write a complete statement in a single line or separate it into several lines, add additional indentation, etcetera.

But, even though writing a statement like this is valid:

SELECT EMPLOYEE_ID,FIRST_NAME,LAST_NAME,SALARY FROM EMPLOYEES WHERE SALARY>5000 AND DEPARTMENT_ID !=60;

I think you will agree that this other version of the same query, is a lot easier to read and understand, and in consequence, also easier to maintain:

SELECT employee_id
     , first_name
     , last_name
     , salary
FROM employees
WHERE salary > 5000
      AND department_id != 60;

Here, if you have to join additional tables, it is very easy to see where the FROM clause is, or if you want to add additional conditions, the WHERE clause is also very easy to spot.

Formatting your code can have a huge impact on how easy it is to read and maintain.

So, yes, one of the big reasons to format your code is related to readability, but it is not the only one!

Not formatting your code in a consistent manner can have a negative impact on performance as well.

For this to make sense, you need to have at least a basic understanding of how SQL statements are processed by the database, so, I’m going to give you a short, and a little over-simplified, explanation about it.

How SQL Statements are Processed

When you tell Oracle to execute a statement by hitting Ctrl + Enter in SQL Developer, there is a series of “stages” that the statement needs to go through, until it is actually executed. Those stages are shown in this image:SQL processing stages diagram

First, the syntax of the statement is checked to see if it is correct. For example, if you write a query without a FROM clause, the syntax check fails, and the execution process is stopped.

Next, it must pass a semantic check to determine if the statement is meaningful. Here, some other things that are not related to the syntax are checked. For example, you can write a perfectly valid query from the syntax perspective, but if it mentions a column or table that doesn’t exist, it will make the semantic check fail.

Then, if the statement passes both, the syntax and semantic checks, a “shared pool” check is performed, and this is the part that is more important for the point I’m trying to make.

This check is performed to determine whether some of the next stages of the process can be skipped or avoided, and it works essentially this way:

The database uses an algorithm to generate a “hash” value that is based on the text of the statement, including blank spaces, and even comments. This value is known as the “SQL ID”, and as you can imagine, a small difference, like a single additional space, in the text of the statement can produce a different hash value or SQL ID.

So, the database calculates the SQL ID of the statement and searches for it in an area of memory that is called the “library cache” which is part of a bigger area called the “shared pool“.

If the statement is found in the shared pool, and the database determines that the data and code stored there can be reused, it can skip the optimization and row source generation steps and go directly to the execution.

If it is not found, then the statement must go through those stages, which can be very resource-intensive. I won’t go into the details of the things that happen in those stages, but for the purpose of this article, suffice it to say that the things that happen here can take a considerable portion of the execution time of the statement.

When a statement needs to go through these steps, because code that could be re-used to execute it was not found, we say that a “hard parse” was performed, and when existing code could be re-used, so a hard parse is not necessary, we call it a “soft parse“.

Hard parsing a statement is a very resource-intensive process that uses CPU and memory, which can impact the performance of the whole system, in addition to making the execution of the current statement slower.

As the official docs say:

“Effectively, a hard parse recompiles a statement before running it. Hard parsing a SQL statement before every execution is analogous to recompiling a C program before every execution.”

So?

How Does Code Format Impact Performance?

As we can infer from the above section, a hard parse is usually needed only the first time a statement is executed.

If you have the same query (at least the same from your perspective, because you know it produces the same results) written in many different ways across your applications, even if the only difference is the format you applied, the database will see each of those queries as a different one and will have to perform a hard parse the first time that each one of them is executed.

So, you would be making the database do additional and unnecessary work, and you would also make it use more memory because there will be a memory area occupied by the data and code used by each one of the different versions of the statement, just because you didn’t care to format your code in the same way.

Some people might argue that this is not significant enough, but, even if the effect was negligible (which I don’t think it is), and it doesn’t completely apply to SQL code embedded in PL/SQL (see below), why wouldn’t you want to format your code consistently and make that a common practice in your organization, when it is so easy to do?

Note about SQL code embedded in PL/SQL:

The performance impact that comes from differences in formatting (additional blank spaces, text case, line breaks, etc.) is much more likely to occur when the SQL code is executed from client applications directly, because static SQL code that is embedded in PL/SQL is re-formatted by the PL/SQL compiler, and that re-generated SQL code is what will be passed over to the SQL engine. It removes redundant blank spaces, line breaks, etc…, standardizes letter case, and even removes normal comments (i.e. comments not written in the hint syntax (/*+ … */)) with the goal of avoiding unnecessary hard parses that would be required only because of formatting differences.

BUT… It doesn’t actually remove all unnecessary blank spaces. If you run the following code in the HR test schema, it will cause two different queries to be stored in the shared pool (and both of them will require a hard parse), even though they are basically the same and are embedded in PL/SQL, just because the first one has blanks around the equal sign but the second one doesn’t. If you add additional spaces, they won’t make a difference, but one vs. no space does make a difference, and it is still just a formatting difference.

DECLARE
    l_empid employees.employee_id%TYPE;
    l_salary employees.salary%TYPE;
BEGIN
    SELECT  employee_id, salary
        INTO  l_empid, l_salary
    FROM employees
    WHERE employee_id = 200;
 
    SELECT employee_id, salary
        INTO  l_empid, l_salary
    FROM employees
    WHERE employee_id=200;
END;
/

So, no, SQL code embedded in PL/SQL is not completely free from the performance impact that could be caused by formatting differences.

How SQL Developer Can Help

SQL Developer is a very powerful tool, but since it makes it too easy to create a connection to your database and start firing SQL commands right away, many people don’t feel the need to explore some of its many useful features, and the auto-formatter is one of them.

To make it format your code, you don’t even need to use the mouse. Just press CTRL + F7, and it will do its magic.

Go ahead, try it out right now!

I know. The formatting rules it applies out of the box might not be what you want, but you can customize it by tweaking the preferences.

You just need to go to the “Tools” menu, and then, “Code Editor” -> “Format” -> “Advanced Format” (This will differ if you have SQL Developer in a different language. Check here for instructions on how to change it).

 

SQL Developer formatting preferences

 

There you have options to change general things, alignment, indentation, line breaks, and white space rules.

You can go from this:

SELECT employee_id, first_name, last_name, salary FROM employees WHERE salary>5000 AND department_id !=90;

To this:

SELECT employee_id
     , first_name
     , last_name
     , salary
FROM employees
WHERE salary > 5000
      AND department_id != 90;

By simply pressing a couple of keys on your keyboard.

And if you want to customize it even further, you can go to “Custom Format” and do lower-level customization, but to do that you would need to know the Arbori language, which is out of the scope of this discussion (I will probably write about it one day).

But besides allowing you to tweak the preferences to format code in a way you like, there’s something else SQL Developer can do:

Did you notice the “Export” button I enclosed in a light-blue rectangle in the image above?

It allows you to export your formatter preferences to an XML file that you can share with your co-workers. They just need to use the “Import” button to import the formatter styles file, and in a matter of seconds, they can start formatting their SQL code using exactly the same rules you use.

This makes it very easy to format your SQL code consistently across all your systems.

Closing Words

Formatting your SQL code consistently (either manually or with the help of a tool) is considered a good practice, and the adoption of good practices is one of the things that differentiate professionals from amateurs.

My recommendation for you today is to get to know the auto-formatter options of the tool you use to write your code and then, go ahead and actually define, or at least, start a discussion with your team to define, formatting rules that you will later use across the board in all your SQL or PL/SQL code.

SQL Best Practices

The Standout-Dev Academy is launching an Oracle SQL Best Practices video course.

If that sounds like something you would be interested in, click the link below to learn more.

4.8 6 votes
Article Rating

Carlos

I've been working with Oracle databases on a daily basis for the last 18 years.

3 Comments
oldest
newest most voted
Inline Feedbacks
View all comments