18th Aug 2022 12 minutes read An Overview of SQL Text Functions in Google BigQuery Tihomir Babic sql SQL text functions Google BigQuery Table of Contents Do You Know What a Function Is? Data for the Examples That Use BigQuery Text Functions Google BigQuery Text Functions CONCAT() CHAR_LENGTH(), CHARACTER_LENGTH(), and BYTE_LENGTH() UPPER(), LOWER(), and INITCAP() SUBSTR() and SUBSTRING() REPLACE() TRIM() REVERSE() STRPOS() SPLIT() Aren't These SQL Text Functions Useful? There Are More! Data analysts and Google BigQuery go hand in hand. Google's cloud data warehouse presents you with plenty of opportunities for using standard SQL text functions you can't avoid in your day-to-day work as a data analyst. BigQuery uses Google Standard SQL, an ANSI-compliant SQL dialect. This means you can use standard SQL text functions in BigQuery without needing to learn a variant of a given function. The Standard SQL Functions course is an excellent resource for learning those functions. As a prerequisite, you need to know SELECT, WHERE, JOINs, GROUP BY, and HAVING. If you do, the course teaches you how to work with text, numeric, and date and time functions. It also covers specific situations, such as dealing with NULL values, using aggregate functions, and writing CASE statements. To make sure you have enough examples to practice code writing, the course offers 211 interactive exercises. Do You Know What a Function Is? You probably have an idea of what functions are in SQL. As a reminder, an SQL function is a named predefined piece of code you can use in your SQL query. If you use a function in a SELECT statement, this is what the syntax looks like: SELECT BIGQUERY_FUNCTION(argument1, argument 2, …) FROM table In this example, the function is called BIGQUERY_FUNCTION. Every function is accompanied by parentheses that hold the arguments. Arguments tell a function what to do and how to do it. Data analysts deal with text, not just numeric values. This is why you need text functions in SQL to write effective queries. So, which popular Google BigQuery text functions do you need to know? Find out below where I explain each of them. The examples are all based on the same sample data. Data for the Examples That Use BigQuery Text Functions One table is enough for you to see how these text functions work. All the data is stored in the table orders. The columns are: id – the ID of the record. order_id – the ID of the order. customer – the customer ID. shipping_address – the customer's address for delivering the order. email – the customer's email address. Here's the data, so that everything is out in the open. idorder_idcustomershipping_addressemail 1&&&4592022070101&&&Jones & Son, Inc.10-2 Parkson St, Boston, MA 02136, USAs.jones@jones.com 2&&&4592022070102&&&Jones & Son, Inc.10-2 Parkson St, Boston, MA 02136, USAs.jones@jones.com 3&&&4922022070101&&&Stavros GmbHLandsberger Alee 49, Berlin, 10249, Germanyjasmin.s@stavros.com 4&&&4922022071201ć&&Hendricks, Hendricks, HoekstraAnna Paulownastraat 23, Rotterdam, 3014 JA, Netherlandscindy@hhh.com 5&&&4822022071501&&&Mckenna & Partners2-8 Ingram St, Glasgow, G1 1HA, UKsean.m@mckenna.com Let's not waste any time – let's go directly to the examples. Google BigQuery Text Functions CONCAT() This function concatenates multiple strings into one. In other words, it "glues" together two or more text values. There are two ways to do this: with the CONCAT() function or with the || operator. Let's see how each of these works in the following example. SELECT customer, shipping_address, CONCAT(customer, ', ', shipping_address) AS full_address_concat, customer|| ', ' || shipping_address AS full_address_operator FROM `orders.orders`; First, I select the customer and his/her address. I want to show the name and the address in one column, which is the full address where the order will be sent. To achieve this, I use the CONCAT() function, and the columns customer and shipping_address are the arguments for the function. Also, I want the name and the address to be separated with a comma and a blank space after it, so this is also a function argument written in single quotes. The || operator works similarly. The logic is that you put a || between every pair of strings you want to "glue" together. Compared to CONCAT(), this method does not call for a specific function, and the "arguments" are separated by a || instead of a comma. Here's the code output. customershipping_addressfull_address_concatfull_address_operator Jones & Son, Inc.10-2 Parkson St, Boston, MA 02136, USAJones & Son, Inc. , 10-2 Parkson St, Boston, MA 02136, USAJones & Son, Inc. , 10-2 Parkson St, Boston, MA 02136, USA Jones & Son, Inc.10-2 Parkson St, Boston, MA 02136, USAJones & Son, Inc. , 10-2 Parkson St, Boston, MA 02136, USAJones & Son, Inc. , 10-2 Parkson St, Boston, MA 02136, USA Stavros GmbHLandsberger Alee 49, 10249, Berlin, GermanyStavros GmbH, Landsberger Alee 49, 10249, Berlin, GermanyStavros GmbH, Landsberger Alee 49, 10249, Berlin, Germany Hendricks, Hendricks, HoekstraAnna Paulownastraat, 3014 JA, Rotterdam, NetherlandsHendricks, Hendricks, Hoekstra, Anna Paulownastraat, 3014 JA, Rotterdam, NetherlandsHendricks, Hendricks, Hoekstra, Anna Paulownastraat, 3014 JA, Rotterdam, Netherlands Mckenna & Partners2-8 Ingram St, Glasgow G1 1HA, UKMckenna & Partners, 2-8 Ingram St, Glasgow G1 1HA, UKMckenna & Partners, 2-8 Ingram St, Glasgow G1 1HA, UK As you see, both ways of concatenating strings give the same result. CHAR_LENGTH(), CHARACTER_LENGTH(), and BYTE_LENGTH() The CHAR_LENGTH and CHARACTER_LENGTH() functions are the same: they both return the length of a string in numbers of characters. BYTE_LENGTH(), as you may have imagined, does the same but in numbers of bytes rather than characters. Let's apply all three functions to the order ID. SELECT order_id, CHAR_LENGTH(order_id) AS order_id_CHAR, CHARACTER_LENGTH(order_id) AS order_id_CHARACTER, BYTE_LENGTH(order_id) AS order_id_BYTE FROM `orders.orders`; The syntax here is simple: call the function and write the column name. Here are the text lengths. order_idorder_id_CHARorder_id_CHARACTERorder_id_BYTE &&&4592022070101&&&191919 &&&4592022070102&&&191919 &&&4922022070101&&&191919 &&&4922022071201ć&&191920 &&&4822022071501&&&191919 The functions CHAR_LENGTH() and CHARACTER_LENGTH() both return the same result. As I have mentioned, BYTE_LENGTH() measures the length in bytes. The length in bytes is different from the character length if there are Unicode characters. That's why it shows a length of 20 bytes in the fourth row. The character "ć", a Unicode character, somehow got into the order ID. UPPER(), LOWER(), and INITCAP() The UPPER() function turns all letters of a string into uppercase. LOWER() does the opposite; all the letters become lowercase. INITCAP() is something in between. The first letter of each word becomes uppercase, and all other letters become lowercase. Also, you can specify a delimiter. If you do so, then the first letter of each word after the delimiter becomes uppercase, and all others become lowercase. Here's an example of these functions. SELECT customer, UPPER(customer) AS customer_upper, LOWER(customer) AS customer_lower, INITCAP(customer, ',') AS customer_initial_upper FROM `orders.orders`; Again, the syntax is very simple: specify the column in the parentheses after the function, and you get the desired result. For INITCAP(), the result is different only if you specify the delimiter, which I do here. My delimiter is the comma. The code returns the following result. customercustomer_uppercustomer_lowercustomer_initial_upper Jones & Son, Inc.JONES & SON, INC.jones & son, inc.Jones & son, inc. Jones & Son, Inc.JONES & SON, INC.jones & son, inc.Jones & son, inc. Stavros GmbHSTAVROS GMBHstavros gmbhStavros gmbh Hendricks, Hendricks, HoekstraHENDRICKS, HENDRICKS, HOEKSTRAhendricks, hendricks, hoekstraHendricks, hendricks, hoekstra Mckenna & PartnersMCKENNA & PARTNERSmckenna & partnersMckenna & partners All three functions work as described earlier. However, for the result of INITCAP(), you see the first letters after commas are still lowercase. Shouldn't it have been uppercased since we specified the comma as the delimiter? Well, no, not in this case. This is because the first text values after the comma are spaces. For this reason, you do not see the effect of INITCAP(). SUBSTR() and SUBSTRING() Again, two functions named slightly differently do exactly the same thing: return a part (a substring) of a text value. Both SUBSTR() and SUBSTRING() allow you to define the string, the starting position of the substring, and its length. SELECT order_id, SUBSTR(order_id, 7, 8) AS order_date_substr, SUBSTRING(order_id, 7, 8) AS order_date_substring FROM `orders.orders`; When extracting a substring, the first argument in the function is the string itself. In this example, it is the column order_id. The starting position of the substring is the second argument. If it's a positive integer, it counts from the left; if it's negative, it counts from the right. In the above example, the substring starts from the seventh character of the string, counting from the left. Finally, the length of the substring is an optional argument. Here, the substring consists of eight characters. What do I get by doing this? Knowing the logic of the order IDs, I know that the order ID contains the date of the order. Yes, it consists of eight characters (year, month, date) and begins with the seventh character of the order ID. order_idorder_date_substrorder_date_substring &&&4592022070101&&&2022070120220701 &&&4592022070102&&&2022070120220701 &&&4922022070101&&&2022070120220701 &&&4922022071201ć&&2022071220220712 &&&4822022071501&&&2022071520220715 The result shows that the first order, for instance, was placed on 1 July 2022. REPLACE() As you may have guessed from its name, the REPLACE() function replaces one string of characters with another. For example, the client Jones & Son, Inc. was recently renamed to Jones & Jones & Son, Inc. Here's how I rename their occurrences in the table. SELECT id, customer, REPLACE(customer, 'Son', 'Jones & Son') AS customer_renamed FROM `orders.orders`; In the function, I first specify the column I want to change. Then I want to change "Son" to "Jones & Son". And there it is; the client name is changed in the table below. idcustomercustomer_renamed 1Jones & Son, Inc.Jones & Jones & Son, Inc. 2Jones & Son, Inc.Jones & Jones & Son, Inc. 3Stavros GmbHStavros GmbH 4Hendricks, Hendricks, HoekstraHendricks, Hendricks, Hoekstra 5Mckenna & PartnersMckenna & Partners TRIM() The TRIM() function cleans your data. How does it do that? By specifying a string and a Unicode character you want to get rid of. It removes all the leading and trailing occurrences of that character. Here's an example that cleans the order ID. SELECT id, order_id, TRIM(order_id, '&') AS order_id_trimmed FROM `orders.orders`; The arguments in the function are the column order_id and the character "&", the latter of which appears before and after the order ID for some reason. Here's how the cleaned data looks. idorder_idorder_id_trimmed 1&&&4592022070101&&&4592022070101 2&&&4592022070102&&&4592022070102 3&&&4922022070101&&&4922022070101 4&&&4922022071201ć&&4922022071201ć 5&&&4822022071501&&&4822022071501 REVERSE() This is another popular SQL text function that often comes in handy. It takes the string and reverses it so that the last character becomes the first, the second from the last becomes the second, and so on. For instance, this may be useful if the order IDs have been entered completely backward. The structure of the order IDs has been such that the first three digits are the customer ID, then the order date is the next eight digits, and the last two digits are the control number. But due to changes in the system, now the new order IDs have to be reversed. It is easy to do so with the REVERSE() function. SELECT id, order_id, REVERSE(order_id) AS order_id_reversed FROM `orders.orders`; The only thing I specify in the function is the column I want to reverse. The result is given below. idorder_idorder_id_reversed 1&&&4592022070101&&&&&&1010702202954&&& 2&&&4592022070102&&&&&&2010702202954&&& 3&&&4922022070101&&&&&&1010702202294&&& 4&&&4922022071201ć&&&&ć1021702202294&&& 5&&&4822022071501&&&&&&1051702202284&&& STRPOS() This is a function for when you want to find out the character position of a specific substring inside a given text. This is useful if, for example, you need to find the position of "@" in the email address of every customer. SELECT id, customer, email, STRPOS(email, '@') AS at_position FROM `orders.orders`; The first argument in the function takes, as usual, the column or the string value you want to search. Then you state the substring you want to search for. The function returns an integer value, i.e., the position of the substring from the left. idcustomeremailat_position 1Jones & Son, Inc.s.jones@jones.com8 2Jones & Son, Inc.s.jones@jones.com8 3Stavros GmbHjasmin.s@stavros.com9 4Hendricks, Hendricks, Hoekstracindy@hhh.com6 5Mckenna & Partnersseckenna.com7 Check the result, using the first email, "s.jones@jones.com", as an example. There are seven characters before the "@", so it is at the eighth position, which is precisely what the output shows. SPLIT() The SPLIT() function takes a string value and splits it into its components based on a delimiter. The function takes the string value as an argument. Also, the optional delimiter argument takes a character you want to use for splitting. In the data, the column shipping_address is perfect for showcasing how the function works. Let's say I want to split the address into the following components: street name, city, postal code, and country. For this, I have the following code. SELECT customer, shipping_address, SPLIT(shipping_address) AS address_split FROM orders.orders; The only argument in the function is the shipping address. I do not need to specify a delimiter because the comma is the default delimiter for string values, and that's exactly what I need here. Here's the output. customer shipping_address address_split Jones & Son, Inc. 10-2 Parkson St, Boston, MA 02136, USA 10-2 Parkson St Boston MA 02136 USA Jones & Son, Inc. 10-2 Parkson St, Boston, MA 02136, USA 10-2 Parkson St Boston MA 02136 USA Stavros GmbH Landsberger Alee 49, Berlin, 10249, Germany Landsberger Alee 49 Berlin 10249 Germany Hendricks, Hendricks, Hoekstra Anna Paulownastraat 23, Rotterdam, 3014 JA, Netherlands Anna Paulownastraat 23 Rotterdam 3014 JA Netherlands Mckenna & Partners 2-8 Ingram St, Glasgow, G1 1HA, UK 2-8 Ingram St Glasgow G1 1HA UK As you see, the shipping address is split into the desired components. However, I admit the output format is not too table-friendly. It would be ideal if there were a way to show each address component in a separate column. Fortunately, this is possible in BigQuery. It offers two functions: ORDINAL() and SAFE_ORDINAL(). They return the same result except when the index is out of range: ORDINAL() returns an error while SAFE_ORDINAL() returns NULL. Take a look at the code below. Don't worry; it's not difficult. SELECT customer, shipping_address, SPLIT(shipping_address, ', ') [SAFE_ORDINAL (1)] AS street, SPLIT(shipping_address, ', ') [SAFE_ORDINAL (2)] AS city, SPLIT(shipping_address, ', ') [SAFE_ORDINAL (3)] AS postal_code, SPLIT(shipping_address, ', ') [SAFE_ORDINAL (4)] AS country FROM orders.orders; As you see, I use the SPLIT() function four times, once for each address component. This is because I want each component in a dedicated column. To do this, I just use SAFE_ORDINAL() after each SPLIT(). The integer in the SAFE_ORDINAL() function specifies the index, that is, the position of the value. In this example, index 1 returns the first value in the string, which is the street address, index 2 is the city, and so on. This returns a much nicer output. It's perfect! customershipping_addressstreetcitypostal_codecountry Jones & Son, Inc.10-2 Parkson St, Boston, MA 02136, USA10-2 Parkson StBostonMA 02136USA Jones & Son, Inc.10-2 Parkson St, Boston, MA 02136, USA10-2 Parkson StBostonMA 02136USA Stavros GmbHLandsberger Alee 49, Berlin, 10249, GermanyLandsberger Alee 49Berlin10249Germany Hendricks, Hendricks, HoekstraAnna Paulownastraat 23, Rotterdam, 3014 JA, NetherlandsAnna Paulownastraat 23Rotterdam3014 JANetherlands Mckenna & Partners2-8 Ingram St, Glasgow, G1 1HA, UK2-8 Ingram StGlasgowG1 1HAUK If you're also an MS SQL Server user, take a look at the most popular text functions it offers. Aren't These SQL Text Functions Useful? There Are More! These examples show there are plenty of situations when text functions are helpful in your day-to-day data manipulation. I have covered only the most popular ones; there are more of them. If you're interested in what they are, consult the Google BigQuery documentation. It's a great source of reference. However, if you want these functions (or at least some of them) to become second nature for you, you need to use them in practice. Going through the numerous examples in the Standard SQL Functions course is a perfect way to do so! Tags: sql SQL text functions Google BigQuery