5th Jan 2023 7 minutes read UNION and UNION ALL in SQL Explained Nicole Darnley sql union Table of Contents UNION Example #1 - UNION Renaming Columns in UNION Example #2 - Using the Same Number of Columns in UNION Example #3 - UNION ALL UNION vs. UNION ALL – Choosing Which to Use In this article, we’ll explore the SQL clauses UNION and UNION ALL. Learn what they do and when to use each one. I had been an analyst for probably four years before I knew about the SQL clause UNION ALL. I used UNION on a regular basis, but had never heard of its counterpart. Then one day when I heard a coworker talking about it. I thought to myself, “What is he talking about? Is this something new?” So I did what any good analyst would do and googled it. My heart sank. I had probably, on more than one occasion, used the incorrect clause in my analyses. After college, I taught myself SQL. The result was an incomplete understanding of the syntax I was using. As you are learning, I highly recommend taking formal courses so you can be confident you have a full understanding of the queries you are writing. A great way to do this is by taking the SQL Basics course. Don’t make the same mistakes I did early on in my career. Take classes from professionals and gain a solid foundational knowledge of SQL. I’m getting a little ahead of myself. Let’s take a step back and talk about what UNION and UNION ALL do in SQL. UNION There will be times when an analyst needs to combine the results of multiple queries. This might be because the data lives in different tables or maybe you have existing queries that you now need to make into one. Whatever the reason, the most straightforward solution is to use UNION. Let’s illustrate with an example. Example #1 - UNION In this example, we’ll imagine that we are working on analyzing some website data. One table we will be working with is called button_clicks. This table contains all the click events that we are tracking on our website. It looks like this: user_idtimestamplabel 1364782022-12-01 09:15:34continue 1364782022-12-01 09:15:34continue 1827362022-12-01 09:17:03submit 2736472022-12-01 09:18:00back 3475892022-12-01 09:21:48continue We will also be working with a table called: navigation_clicks. This table contains all click events that occur on the website’s navigation menu. It looks like this: user_idtimestampnavigation_link 1364782022-12-01 09:10:15home 1364782022-12-01 09:12:12about_us 1827362022-12-01 09:17:59store 2736472022-12-01 09:21:30home 3475892022-12-01 09:18:17blog Let’s say our manager comes to us and says they need a list of all the button clicks on the website. That is easy enough. We’ll simply run: SELECT * FROM button_clicks However, they come back to us a few minutes later and ask us to include the navigation clicks in our report. Seems simple enough, but how do we do this? By using the UNION clause. To combine the results of these two queries together, we’ll run: SELECT * FROM button_clicks UNION SELECT * FROM navigation_clicks As you can see, the UNION clause goes between the two SELECT queries. The query on either side of the UNION clause should run as a standalone query. This query will return the following: user_idtimestamplabel 1364782022-12-01 09:15:34continue 1827362022-12-01 09:17:03submit 2736472022-12-01 09:18:00back 3475892022-12-01 09:21:48continue 1364782022-12-01 09:10:15home 1364782022-12-01 09:12:12about_us 1827362022-12-01 09:17:59store 2736472022-12-01 09:21:30home 3475892022-12-01 09:18:17blog What we’ve done is taken the results of two separate queries and placed them one on top of the other. We see that the first 4 rows contain the results of the first query and the next 5 rows contain the results of the second query. But wait…our tables combined contain 10 rows, so why did we only return 9? This is because UNION will remove duplicates from the results. The first two rows in the table button_clicks are the exact same, so our UNION has only returned one of them. Renaming Columns in UNION In the result set above, notice the column names. In the table button_clicks, the third column is label, but in the table navigation_clicks the third column is navigation_label. When we use UNION, the column names will come from the column names of the first query, which is what we see above. If you would like, you can always add aliases to the columns to rename them to your liking. For example, if we want to change the third column name to button_or_navigation_label, we should run: SELECT user_id, timestamp, label AS button_or_navigation_label FROM button_clicks UNION SELECT user_id, timestamp, navigation_label AS button_or_navigation_label FROM navigation_clicks Example #2 - Using the Same Number of Columns in UNION Let’s walk through another example using UNION. First, we’ll make a slight modification to our navigation_clicks table. We are going to add one more column so that our table now looks like this: user_idtimestampnavigation_linknavigation_level 1364782022-12-01 09:10:15homemain 1364782022-12-01 09:12:12about_ussub 1827362022-12-01 09:17:59storesub 2736472022-12-01 09:21:30homemain 3475892022-12-01 09:18:17blogsub If we try to run our original UNION query again, we will get an error that reads Each UNION query must have the same number of columns. This is because we are selecting all table columns from each table, but the number of columns in the tables is now different. To fix this, we will need to explicitly name the columns we wish to return. In addition, the number of columns returned by each query must be the same and the type should match. If the third column in the first query is a string, the third column in the other query must also be a string. To avoid any errors, we’ll run: SELECT user_id, timestamp, label FROM button_clicks UNION SELECT user_id, timestamp, navigation_label FROM navigation_clicks This will return the same results as our first UNION example. Example #3 - UNION ALL Now that we understand UNION, let’s walk through an example using UNION ALL. As mentioned earlier, UNION removes duplicates from our query results. UNION ALL will not do this. Let’s continue to build on our previous example. Our manager has come to us asking for a report that contains all button and navigation clicks on the website. We then confirmed that the report should include duplicates. Since you already know how to use UNION, can you guess how we would do this? That’s right – we’ll just replace UNION with UNION ALL: SELECT user_id, timestamp, label FROM button_clicks UNION ALL SELECT user_id, timestamp, navigation_label FROM navigation_clicks user_idtimestamplabel 1364782022-12-01 09:15:34continue 1364782022-12-01 09:15:34continue 1827362022-12-01 09:17:03submit 2736472022-12-01 09:18:00back 3475892022-12-01 09:21:48continue 1364782022-12-01 09:10:15home 1364782022-12-01 09:12:12about_us 1827362022-12-01 09:17:59store 2736472022-12-01 09:21:30home 3475892022-12-01 09:18:17blog There are all 10 rows. As you can see, UNION ALL has preserved all the results, even if there are duplicates. Our examples have only included one UNION or UNION ALL statement, but there is no limit to the number of queries you can union together. Just for one last example, let’s say we also wanted to include a table called form_clicks. We’re not worried about duplicates, so we’ll continue using UNION ALL: SELECT user_id, timestamp, label FROM button_clicks UNION ALL SELECT user_id, timestamp, navigation_label FROM navigation_clicks UNION ALL SELECT user_id, timestamp, form_label FROM form_clicks As you can see, it’s the exact same syntax; we’re just continuing to add to our existing query. If you’re interested in learning more about UNION ALL, be sure to give our SQL Union All article a read! UNION vs. UNION ALL – Choosing Which to Use It’s important to understand that a query using UNION ALL is going to run faster than if you were to use UNION. This is because the UNION clause is sorting through the data and removing any duplicates. Depending on how much data you are querying, this could add a decent amount of run time to your query. When you’re deciding which clause to use, it’s as simple as asking yourself the following question: Do I need to remove duplicate values? If yes, use UNION. If no (or if it doesn’t matter), use UNION ALL. That was a lot, so let’s take some time to review! Both UNION and UNION ALL are clauses used to combine multiple queries into one result set. UNION will remove duplicates, while UNION ALL will not. UNION ALL runs faster because of this. Here is another great article on SQL Set Operations to help reinforce and expand on what you’ve just learned. Nice job! You’ve now got a firm understanding of both UNION and UNION ALL and can make an informed decision on which to use in your analyses. You’re already several steps ahead of where I was when I was learning! Be sure to check out our SQL Basics course and continue your journey to becoming an expert SQL writer. Tags: sql union