I decided to write about the Pivot and Unpivot clauses, because I find them very useful and powerful.
In this article I will try to explain how they are used and show you some examples of the things you can do with them.
These are features that were added in version 11g of the database, and are used to transpose or rotate rows into columns and columns into rows very easily.
Converting Rows Into Columns (The Pivot Clause)
Let’s start by looking at the Pivot clause.
To show you why you might need to do this type of operation, I’m going to show you a very simple example.
This is how the employee table I’m going to use for the example looks like:
|7||CHARLES DEAN||08-JUN-1973 00:00:00||3||01-MAR-2002 00:00:00|
|8||EDDIE FARREL||28-JUL-1980 00:00:00||1||20-APR-2009 00:00:00|
|4||STUART GARCIA||12-APR-1965 00:00:00||4||15-FEB-2010 00:00:00|
|11||LUKE MINT||19-OCT-1985 00:00:00||4||08-JAN-2011 00:00:00|
|9||GEORGE HAYES||03-AUG-1982 00:00:00||2||22-SEP-2012 00:00:00|
|3||LUISA JACKSON||08-MAR-1970 00:00:00||3||29-AUG-2013 00:00:00|
|2||JAMES BOSH||15-FEB-1992 00:00:00||2||01-JUL-2014 00:00:00|
|12||NIGEL OAKS||05-NOV-1997 00:00:00||4||01-OCT-2014 00:00:00|
|10||IGOR KEYS OSBOURNE||11-SEP-1987 00:00:00||3||14-NOV-2014 00:00:00|
|1||JOHN SMITH||01-JAN-1995 00:00:00||1||28-MAR-2015 00:00:00|
|5||JUSTIN BLACK||16-MAY-1990 00:00:00||1||02-MAY-2015 00:00:00|
|6||ANGIE CROOD||22-JUN-1998 00:00:00||1||01-JUL-2015 00:00:00|
|13||LUKE GREEN JR||05-FEB-1995 00:00:00||4||01-SEP-2015 00:00:00|
Now suppose I am asked to create a query that tells me the number of employees that were hired in the years of 2014 and 2015 for each department. They want to see: The department id, and then a column for the number of employees hired in 2014 and another column for the number of employees hired in 2015.
How can I do that?
How would you do it?
I can’t do a simple count grouped by department, because it would count the number of employees of each department, regardless of the date in which they were hired.
So, how would you do it? Think about for a moment.
Somehow, I need to count employees, but as we discussed earlier, a simple count won’t work. So, in one column, I need to count employees only if they were hired in 2014, and in another column I need to count those who were hired in 2015.
Since there is a condition involved here, it sounds like a job for a CASE expression.
So, my query using COUNT and CASE expressions would be something like this:
SELECT department_id, COUNT( CASE TO_CHAR(hire_date, 'YYYY') WHEN '2014' THEN 1 END) AS "2014", COUNT( CASE TO_CHAR(hire_date, 'YYYY') WHEN '2015' THEN 1 END) AS "2015" FROM employee GROUP BY department_id ORDER BY department_id;
In this example, the CASE expressions will return NULL when the condition is not met, and since COUNT only counts rows where the column or expression passed as argument is not null, it will only count the rows where the CASE expression’s condition evaluated to true.
Now, there is an easier way to do it, that is available on version 11g or above, and it is by using the PIVOT clause.
Here is the equivalent query (we are going to review it together).
SELECT * FROM ( SELECT department_id, TO_CHAR(hire_date, 'YYYY') AS YEAR FROM employee ) PIVOT (COUNT(*) FOR YEAR IN (2014, 2015)) ORDER BY department_id;
This produces the same results than our previous query, but this query looks simpler, don’t you think?
Okay, let’s review it!
The code in line # 7 is what does the magic, and it is called the PIVOT clause, and some people also call it “the pivot operator”.
Here is a short explanation:
The PIVOT keyword is mandatory, and then, enclosed in parentheses, you have to provide an aggregate function, which in this case was COUNT(*). After the aggregate function it comes the FOR keyword and then it comes a column whose rows you want to rotate into separate columns. In this case the column I used is “YEAR”, so each row in this column will become a separate column in the final result set. Then it comes the IN keyword, and then between parentheses, the list of values for which I want separate columns. In this case, I want a column for 2014 and one for 2015.
There is an important limitation here, regarding the list of values I put inside the parentheses: This list is static, or in other words, I have to specify each one of the values for which I want a separate column. I cannot use a subquery here. Please keep that in mind.
So, in natural language, this is what I’m telling the database to do: I want you to look at the YEAR column and count how many rows have 2014 in that column, and how many rows have 2015, and return the counts in separate columns.
So, I needed an inline view because where I have the “YEAR” in the pivot clause, it is mandatory that you use a column. I cannot use the hire_date column because I don’t want to pivot on the complete hire_date. I only want to pivot on the year, so I need a function to extract the year from the date (I used TO_CHAR but I could have used EXTRACT as well), but functions are not allowed here, so I use the function in the inline view and provide an alias for the new column, and then in the main query I can reference it as a column called YEAR.
Another thing that is important to notice is that in my inline view I have the department_id column, and in my results I have also the department id column, so, every column that is not used in the PIVOT clause works as if I were grouping by that column. In this case, I have the count of employees grouped by department id.
What else can I do here?
Well, I can provide an alias for the aggregate function. I’m going to call it “EMPLOYEES_HIRED” to show you how this alias is used.
SELECT * FROM ( SELECT department_id, TO_CHAR(hire_date,'YYYY') AS YEAR FROM employee ) PIVOT (COUNT(*) AS EMPLOYEES_HIRED FOR YEAR IN (2014, 2015)) ORDER BY department_id;
As you can see, it added the alias to the name of each of the columns.
And I can also add an alias for the values inside the parentheses, for example:
SELECT * FROM ( SELECT department_id, TO_CHAR(hire_date,'YYYY') AS YEAR FROM employee ) PIVOT (COUNT(*) AS EMPLOYEES_HIRED FOR YEAR IN (2014 AS year2014 , 2015 AS year2015)) ORDER BY department_id;
And as you see, those aliases are also used to construct the final column names.
So, you can add aliases for the aggregate function and for the values in the IN sub-clause, but they are optional. If you don’t provide aliases, the values in the IN sub-clause become the names of the final columns.
Okay, what else?
Well, I can have more than one aggregate function, for example, if besides the count of employees for each year I want the SUM of the salary of the employees that were hired in that year, I can add a SUM on the SALARY too:
SELECT * FROM ( SELECT department_id, TO_CHAR(hire_date, 'YYYY') AS YEAR, salary FROM employee ) PIVOT ( COUNT( * ) AS employees_hired, SUM(salary) AS salaries FOR YEAR IN (2014 AS year2014, 2015 AS year2015)) ORDER BY department_id;
Pretty cool, right?
Okay, what else?
You can have combinations of values in the IN clause!
For example, if I want to count the employees that were born in 1995 and hired in 2015 and the count of those that were born in 1997 and hired in 2014, I can do something like this:
I’m going to make some changes and remove all the aliases to simplify it.
SELECT * FROM ( SELECT department_id, TO_CHAR(hire_date, 'YYYY') AS hire_year, TO_CHAR(birthdate, 'YYYY') as birth_year FROM employee ) PIVOT ( COUNT(*) FOR (birth_year,hire_year) IN ((1995,2015) , (1997,2014))) ORDER BY department_id;
As you can see, I have birth_year and hire_year in the FOR clause in line 10, and I have the combination of values I’m interested in, in the IN clause in line 11.
This is awesome…
Now, let’s talk about Unpivot.
Converting Columns Into Rows (The Unpivot Clause)
As you can imagine, Unpivot help us do the opposite operation of Pivot. With UNPIVOT we can rotate columns into rows.
I’m going to use a very simple table called DEPARTMENT to build an unpivot example.
Here is my DEPARTMENT table. It has an id and name for every department. It has also a column for the monthly budget and one for the ID of the last employee hired in that department:
Now, since the monthly budget and last employee id have both the same data type, which in this case is numeric data, I could use UNPIVOT to do something like this:
SELECT * FROM department UNPIVOT ( data_value FOR category IN (monthly_budget,last_employee_id) );
In my results I have a “category” column, and in that column I have “monthly budget” and “last employee id”, which are the titles of two separate columns in the department table, but now they are in the same column but in different rows, so I actually transposed columns into rows.
As you can see, now I have 2 rows for each department. In one row I have the monthly budget, and in the other row I have the last employee id, and in the column called DATA_VALUE I have the value that corresponds to the category I’m looking at, so, in the case of ACCOUNTING, in my original data I have 20,000 in the MONTHLY BUDGET column, and 8 in the LAST EMPLOYEE ID column, but now I have them in the same column, but in different rows.
The name of the category column was defined in line 5, in the FOR clause, and the name of the data_value column was defined in line 4.
There’s another thing to notice here: I have 2 rows for ACCOUNTING, 2 for MARKETING and 2 for IT and HUMAN RESOURCES, but I have only one for REGULATORY AFFAIRS and CUSTOMER SERVICE. Why is that?
If you look at the data in the department table again, you can see that for those departments I have NULL in the LAST_EMPLOYEE_ID column, and that is why I don’t have that category in my UNPIVOTed results. This is the default behavior, but if you want to get all of the rows, even if they have NULL in the original data, you just need to add “INCLUDE NULLS” in the UNPIVOT clause:
SELECT * FROM department UNPIVOT INCLUDE NULLS ( data_value FOR category IN (monthly_budget,last_employee_id) );
I modified the Unpivot clause in line 3, and as you can see, with this change now I have both categories for those departments.
Okay, to show you a slightly more complex example of the Unpivot clause, I’m going to create a table from some pivoted data from my employee table:
CREATE TABLE pivoted_employee AS SELECT department_id, COUNT( * ) as total_employees, SUM(salary) as total_salaries, COUNT( CASE WHEN salary < 3000 THEN 1 END) as employees_3000, SUM( CASE WHEN salary < 3000 THEN salary END) AS salaries_3000 FROM employee GROUP BY department_id;
As you can infer from the query, the total_employees column has the count of the employees for each department, and the total_salaries has the sum of the salaries of the employees in each department, and the other 2 columns have the count and sum of salaries only for employees whose salary is less than 3000.
Now, I want to unpivot this data, to get only one column for the counts and one for the sum of salaries, and another column that tells me the type of data that each row represents.
How can I do that?
Here is one way to do it:
SELECT * FROM pivoted_employee UNPIVOT ( (employees, salaries) FOR type IN ( (total_employees,total_salaries) as 'ALL EMPLOYEES', (employees_3000,salaries_3000) as 'EMPLOYEES WHO EARN LESS THAN 3000' ) ) ORDER BY department_id,type;
|1||EMPLOYEES WHO EARN LESS THAN 3000||2||4050|
|2||EMPLOYEES WHO EARN LESS THAN 3000||1||2500|
|3||EMPLOYEES WHO EARN LESS THAN 3000||1||2250|
|4||EMPLOYEES WHO EARN LESS THAN 3000||1||2000|
So, the "employees" and "salaries" that I have in line 4 of the query, are the columns that will have measure values in the final results. They will have the actual sums and counts from the pivoted_employee table. I called them employees and salaries, but they could be named however I like.
The "type" that I have in line 5 is the name of the column where the descriptors, or categories, or types will be returned, and it could also have any other name if I wanted to.
And the pairs of columns that I have in lines 7 and 8, define which data I want to include for each type, for example, in the row that will have the data for all of the employees, I want to include TOTAL_EMPLOYEES for the EMPLOYEES column in the result, and TOTAL_SALARIES for the SALARIES column in the result, and the string that I have after the "AS" keyword, is the actual text that will be displayed in the TYPE column for the row that includes data for all employees.
In a similar way, for the row that includes data only for employees who earn less than 3000, the type column will say 'EMPLOYEES WHO EARN LESS THAN 3000', and the EMPLOYEES column in the result will have the data from the EMPLOYEES_3000 column from my pivoted_employee table, and the salaries column will have data from the SALARIES_3000 column.
I Know! This is not very easy to memorize. I mean, remembering what the role of each part of the UNPIVOT clause is, is not something easy to do, but I guarantee you that it will become easier if you practice.
And to conclude the article, I just want to show you that you can combine PIVOT and UNPIVOT in the same query.
Take a look at this:
SELECT * FROM pivoted_employee UNPIVOT ( (employees, salaries) FOR type IN ( (total_employees,total_salaries) as 'ALL', (employees_3000,salaries_3000) as 'EARN LESS THAN 3000' ) ) PIVOT ( MAX(salaries) as sal, max(employees) as emp FOR department_id IN (1 as d1,2 as d2,3 as d3,4 as d4) );
|EARN LESS THAN 3000||4050||2||2500||1||2250||1||2000||1|
So, I used UNPIVOT to rotate some columns into different rows (line 3), and then I used PIVOT to rotate the different rows I had for each DEPARTMENT ID into columns (line 11), as you see in the final result. Now I have 2 columns for each department: One for the salaries and one for the employees. And I did all of this in the same query, which is pretty awesome!
Okay, that's it for this article. I hope it helped you at least a little bit to understand how to use these awesome features.
Now, go ahead and have some fun practicing them.
See you later!