# SQL Project for Portfolio: Northwind Store

Tihomir Babic · 18th Jul 2024 · 20 min read

SQL portfolio projects are an important part of a data analyst's education. How do you start a project, and where do you go with the insights you discover? We'll use the Northwind store database to answer this.

Doing a solid SQL project is an essential element of any data analyst's learning process. An SQL project is a great learning tool because it makes you use SQL on a real-world dataset. This is especially important if you're a beginner with few opportunities to work with real-world data: working with multiple real-world datasets teaches you about the issues you're likely to encounter on the job. As a bonus, several SQL projects in your portfolio are always a nice thing to see on a resume.

The question is, how do you prepare an SQL project portfolio? Once you find a free online dataset you want to analyze, what do you do next? We'll answer these questions using the Northwind store database.

You can find this dataset in the SQL Databases for Practice course. It's one of six datasets in the course; the others cover a university, blog traffic, athletic scores, a music store, and the Museum of Modern Art (MoMA). The course is part of the SQL Practice track, where you can practice aggregation, JOINs, subqueries, CTEs, CASE WHEN, and other important SQL topics. If you need to refresh your knowledge of the areas important for reporting, try our Creating Basic SQL Reports course.

Let's now take the Northwind store dataset and use it for an SQL project for a data analyst portfolio. I'll do this project in PostgreSQL, but everything I do is transferable to other databases with minor syntax tweaks. As we go, feel free to consult our free SQL for Data Analysis Cheat Sheet.

## The Northwind Store Dataset

The only info I'll give you about this database is that it consists of six tables:

- categories – A list of product categories.
- channels – A list of sources through which the store acquires customers.
- customers – A list of the store's customers.
- order_items – A list of the products included in each order.
- orders – A list of orders placed by customers.
- products – A list of the products the store offers.

We'll gather the rest of the info by doing the project. As a matter of fact, this database inspection should be every project's initial stage, the one before you start doing analysis.

## 1. Database Inspection

This stage of preparing an SQL project involves getting to know your data: table and column names, primary and foreign keys, the relationships between the tables, and the data types in each table.

### Inspecting Table Names

A primitive way of learning about the tables in the dataset is to find them in the browser of the RDBMS where you imported the dataset, e.g., PostgreSQL, SQL Server, or MySQL.
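Some engines also offer shortcut commands for this. In MySQL, for instance, a single statement lists all the tables in the current database (shown here only as a convenience; it's not portable SQL):

```sql
-- MySQL convenience command: lists the tables in the current database
SHOW TABLES;
```

A more portable approach is to query the database's own metadata.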
In PostgreSQL, you can write this query to get a list of all tables in a database:

```sql
SELECT table_name
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
  AND table_type = 'BASE TABLE'
ORDER BY table_schema, table_name;
```

It looks for data in information_schema, where you can find database metadata. Since I want the name of every table, I query the schema's tables view and put table_name in the SELECT. The first condition in WHERE filters out system schemas and leaves only user-defined tables. The second condition ensures that only base tables are listed, excluding views and other objects.

Here's the list of tables in the Northwind database:

| table_name |
| --- |
| categories |
| channels |
| customers |
| order_items |
| orders |
| products |

### Inspecting Column Information

Now we want to understand the details of each table better. Knowing their columns is a good start. We can again query information_schema to get important info about the columns:

```sql
SELECT table_name,
       column_name,
       data_type,
       is_nullable,
       column_default
FROM information_schema.columns
WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
ORDER BY table_name, ordinal_position;
```

We can find table names, column names, and the data type of each column by listing table_name, column_name, and data_type in the SELECT. The is_nullable column tells us whether a column accepts NULL values.

| table_name | column_name | data_type | is_nullable |
| --- | --- | --- | --- |
| categories | category_id | integer | NO |
| categories | category_name | character varying | NO |
| categories | description | text | YES |
| channels | id | integer | NO |
| channels | channel_name | character varying | NO |
| customers | customer_id | integer | NO |
| customers | email | character varying | NO |
| customers | full_name | character varying | NO |
| customers | address | character varying | YES |
| customers | city | character varying | YES |
| customers | region | character varying | YES |
| customers | postal_code | character varying | YES |
| customers | country | character varying | YES |
| customers | phone | character varying | YES |
| customers | registration_date | timestamp without time zone | NO |
| customers | channel_id | integer | NO |
| customers | first_order_id | integer | YES |
| customers | first_order_date | timestamp without time zone | YES |
| customers | last_order_id | integer | YES |
| customers | last_order_date | timestamp without time zone | YES |
| order_items | order_id | integer | NO |
| order_items | product_id | integer | NO |
| order_items | unit_price | numeric | NO |
| order_items | quantity | smallint | NO |
| order_items | discount | numeric | NO |
| orders | order_id | integer | NO |
| orders | customer_id | integer | NO |
| orders | order_date | timestamp without time zone | YES |
| orders | total_amount | numeric | NO |
| orders | ship_name | character varying | YES |
| orders | ship_address | character varying | YES |
| orders | ship_city | character varying | YES |
| orders | ship_region | character varying | YES |
| orders | ship_postalcode | character varying | YES |
| orders | ship_country | character varying | YES |
| orders | shipped_date | timestamp without time zone | YES |
| products | product_id | integer | NO |
| products | product_name | character varying | NO |
| products | category_id | integer | NO |
| products | unit_price | numeric | YES |
| products | discontinued | boolean | NO |

This way, we have all the info in one place, which makes it easier to investigate. First, the column names give us a sense of what data each table holds. The data types all seem logical; for example, there are no IDs defined as character varying, so there will be no need to convert data into a suitable format at this stage.

Regarding NULLs, we get some valuable information about the tables:

- The table categories allows the category description to be NULL.
- The table channels allows no NULLs at all.
- The table customers allows NULLs in many columns, including country. If we want to create a geographical segmentation of customers, these NULLs could cause problems for us (a quick way to size the problem is sketched below).
- No columns in the table order_items are nullable.
- For the products table, it's interesting to note that the unit price can be NULL.
- It seems a bit strange that the table orders has so many nullable columns. It basically makes it possible to have an order ID and no other information about the order. We have to keep that in mind when analyzing this table.
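Before building any geographic segmentation, it would be wise to quantify how many country values are actually missing. Here's a minimal sanity check you could run (the alias is my own, not from the original analysis):

```sql
-- How many customers have no country recorded?
SELECT COUNT(*) AS customers_missing_country
FROM customers
WHERE country IS NULL;
```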
### Finding Primary and Foreign Keys

The next step of data inspection is understanding how the tables work together. To do that, we first need to find their primary keys (PKs) and foreign keys (FKs).

PKs will show us which column(s) a table uses to uniquely identify data; finding them helps you understand the structure of each table. FKs will show us which columns are linked to a primary key in another table. This is the basis for understanding the relationships between the tables.

You can list both PKs and FKs using the query below:

```sql
SELECT kcu.table_name,
       kcu.column_name,
       tc.constraint_type
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON kcu.constraint_name = tc.constraint_name
 AND kcu.constraint_schema = tc.constraint_schema
WHERE tc.constraint_type IN ('PRIMARY KEY', 'FOREIGN KEY')
ORDER BY kcu.table_name ASC, tc.constraint_type DESC;
```

This time, we need two views from information_schema: table_constraints and key_column_usage. The table_constraints view contains info about the constraints on each table. The key_column_usage view provides info about the columns that carry key constraints. The two views are joined on two common columns: constraint_name (the name of the constraint) and constraint_schema (the name of the schema containing the constraint).

I use the WHERE clause to output only the primary key and foreign key constraints. As a final step, I order the output alphabetically by table name and then reverse-alphabetically by constraint type.

The code outputs this table:

| table_name | column_name | constraint_type |
| --- | --- | --- |
| categories | category_id | PRIMARY KEY |
| channels | id | PRIMARY KEY |
| customers | customer_id | PRIMARY KEY |
| customers | last_order_id | FOREIGN KEY |
| customers | first_order_id | FOREIGN KEY |
| customers | channel_id | FOREIGN KEY |
| order_items | order_id | PRIMARY KEY |
| order_items | product_id | PRIMARY KEY |
| order_items | order_id | FOREIGN KEY |
| order_items | product_id | FOREIGN KEY |
| orders | order_id | PRIMARY KEY |
| orders | customer_id | FOREIGN KEY |
| products | product_id | PRIMARY KEY |
| products | category_id | FOREIGN KEY |

The tables categories and channels are the only ones that have a PK but no FK. All other tables have one PK and at least one FK.

The table order_items stands out: its PK constraint spans two columns, order_id and product_id. In other words, the table has a composite primary key, which means the unique identifier of a row is the unique combination of the two columns — each product can appear only once per order. A closer look reveals that the same columns are also foreign keys for this table. This means the primary key columns are also foreign keys, as they come from other tables.
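If you want to convince yourself that (order_id, product_id) really is unique, a quick check could look like this (an illustrative query of my own; an empty result confirms the composite key holds):

```sql
-- An empty result means no (order_id, product_id) pair repeats
SELECT order_id,
       product_id,
       COUNT(*) AS occurrences
FROM order_items
GROUP BY order_id, product_id
HAVING COUNT(*) > 1;
```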
### Understanding the Relationships Between the Tables

Listing PKs and FKs already gives us some idea about the relationships between the tables. However, we can understand them better by showing each foreign key's parent and child tables and columns. Why would we do that? A foreign key is always another table's primary key.

Listing all the parent and child tables and columns is an upgrade on the previous query because it makes the shared columns between tables easy to see. This information tells you which columns you can use to join tables and whether they can be joined directly or only through another table.

Here's the code:

```sql
SELECT ccu.table_name AS parent_table,
       ccu.column_name AS parent_column,
       kcu.table_name AS child_table,
       kcu.column_name AS child_column
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
  ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage AS ccu
  ON ccu.constraint_name = tc.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
ORDER BY kcu.table_name;
```

In the query, we use info from the table_constraints, key_column_usage, and constraint_column_usage views. The constraint_column_usage view shows the columns used by a constraint; we use it to show each FK's parent table and column, i.e., the table and column where this FK is actually a PK. The data from key_column_usage shows the table and column of each FK.

| parent_table | parent_column | child_table | child_column |
| --- | --- | --- | --- |
| orders | order_id | customers | last_order_id |
| orders | order_id | customers | first_order_id |
| channels | id | customers | channel_id |
| orders | order_id | order_items | order_id |
| products | product_id | order_items | product_id |
| customers | customer_id | orders | customer_id |
| categories | category_id | products | category_id |

From the output above, we see that the table orders is directly connected to the table customers via the columns order_id (PK) and customer_id (FK). It's connected to the table order_items via order_id (PK).

The table channels is directly connected only to the table customers. We've already established that the primary keys of the table order_items are also foreign keys; from the table above, we can see they come from the tables orders and products. The table categories is connected only to the table products.

## 2. Data Exploration

When you start working with a new database, the first step should be exploring the data to gain some basic info about each table. It's good practice to look at the following:

- Counting rows.
- Finding minimum and maximum values (especially for dates).
- Listing distinct categories.
- Counting distinct values in each category.
- Counting rows per category value.
- Sums and averages for key numerical values.

This is not an exhaustive list, and it can change with the table you're exploring. However, it's a minimum and should work on most tables. In general, this step boils down to some fundamental data aggregation and grouping.

I'll show you how this works on one table from our database; you can then explore the other tables the same way.

### Exploring the orders Table

#### Count of Rows

As expected, we use the COUNT() aggregate function here:

```sql
SELECT COUNT(*) AS number_of_rows
FROM orders;
```

I use the COUNT() function with an asterisk; this counts all the rows, including NULLs.

| number_of_rows |
| --- |
| 11,618 |

The table orders has 11,618 rows. In other words, there are 11,618 orders.
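One nuance worth knowing: COUNT(*) counts rows, while COUNT(column) skips NULLs in that column. Since we saw earlier that shipped_date is nullable, the difference between the two would be the number of orders without a shipping date. A hedged illustration of my own, not part of the original walkthrough:

```sql
-- COUNT(*) counts every row; COUNT(shipped_date) ignores NULLs,
-- so the difference is the number of orders with no shipped date
SELECT COUNT(*) AS all_orders,
       COUNT(shipped_date) AS orders_with_shipped_date,
       COUNT(*) - COUNT(shipped_date) AS orders_without_shipped_date
FROM orders;
```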
#### Minimum and Maximum Values

Let's see which columns are suitable for the MIN() and MAX() functions. Perfect – we can find the oldest and the most recent dates when an order was placed and shipped:

```sql
SELECT MIN(order_date) AS oldest_order_date,
       MAX(order_date) AS latest_order_date,
       MIN(shipped_date) AS oldest_shipped_date,
       MAX(shipped_date) AS latest_shipped_date
FROM orders;
```

From the output, we can see the first order was placed on 2017-01-01 and the last on 2024-06-24. As for the shipping dates, the earliest is 2017-01-06 and the most recent is 2024-06-23.

| oldest_order_date | latest_order_date | oldest_shipped_date | latest_shipped_date |
| --- | --- | --- | --- |
| 2017-01-01 0:00:00 | 2024-06-24 0:00:00 | 2017-01-06 0:00:00 | 2024-06-23 0:00:00 |

#### Listing Categories

We don't need any aggregate functions for this:

```sql
SELECT DISTINCT ship_country
FROM orders
ORDER BY ship_country;
```

In the query, we use DISTINCT to show each country only once. Here's the list of countries in this table:

| ship_country |
| --- |
| Argentina |
| Austria |
| Belgium |
| Brazil |
| Canada |
| Denmark |
| Finland |
| France |
| Germany |
| Ireland |
| Italy |
| Mexico |
| Norway |
| Poland |
| Portugal |
| Spain |
| Sweden |
| Switzerland |
| UK |
| USA |
| Venezuela |

It will also be useful to see a list of cities:

```sql
SELECT DISTINCT ship_city
FROM orders
ORDER BY ship_city;
```

As there are many cities, here's a partial list:

| ship_city |
| --- |
| Aachen |
| Abilene |
| Achille |
| Adelphi |
| Adrian |
| Akron |
| Albany |
| Alberton |
| Albuquerque |
| … |
| Yucca Valley |

#### Counting Distinct Values in Each Category

Let's now see the number of countries to which Northwind has delivered shipments:

```sql
SELECT COUNT(DISTINCT ship_country) AS number_of_countries
FROM orders;
```

We again use COUNT(), but this time we add DISTINCT and reference the column we want to count. The output shows there are 21 different countries where the company delivers:

| number_of_countries |
| --- |
| 21 |

We can do the same for cities:

```sql
SELECT COUNT(DISTINCT ship_city) AS number_of_cities
FROM orders;
```

There are 494 unique cities:

| number_of_cities |
| --- |
| 494 |

Even though we didn't list customers as a separate category, knowing how many of them have placed orders will be useful:

```sql
SELECT COUNT(DISTINCT customer_id) AS number_of_customers_with_orders
FROM orders;
```

We use COUNT() to count customer IDs. DISTINCT is needed because some customers appear many times — once for every order they place. There are 761 unique customers who have ordered at least once from the company.

| number_of_customers_with_orders |
| --- |
| 761 |

#### Counting Rows per Category Value

For this, we want to list each category value (as we did earlier) and count the number of rows for each value. For example, this returns the number of orders per country:

```sql
SELECT ship_country,
       COUNT(*) AS number_of_orders
FROM orders
GROUP BY ship_country
ORDER BY number_of_orders DESC;
```

I select the ship_country column and use COUNT(*) to count all rows. For the count to be per country, I add the GROUP BY ship_country clause. The result is the number of orders by country, and the output shows the USA is the biggest market, with 9,789 orders:

| ship_country | number_of_orders |
| --- | --- |
| USA | 9,789 |
| France | 236 |
| Brazil | 233 |
| Germany | 230 |
| UK | 144 |
| Spain | 123 |
| Mexico | 118 |
| Venezuela | 103 |
| Argentina | 87 |
| Canada | 82 |
| Italy | 64 |
| Austria | 58 |
| Portugal | 58 |
| Belgium | 56 |
| Denmark | 56 |
| Finland | 47 |
| Norway | 30 |
| Sweden | 28 |
| Poland | 27 |
| Ireland | 26 |
| Switzerland | 23 |
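Absolute counts are easier to interpret alongside percentages. A hedged variation on the query above could compute each country's share of all orders (the division by SUM(COUNT(*)) OVER () is my addition, not from the original article):

```sql
-- Each country's share of all orders;
-- SUM(COUNT(*)) OVER () is the grand total across all countries
SELECT ship_country,
       COUNT(*) AS number_of_orders,
       ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_of_orders
FROM orders
GROUP BY ship_country
ORDER BY number_of_orders DESC;
```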
#### Sum and Average Values

If your data is suitable, you should find averages and totals of numerical values. We can do that for the column total_amount, which is the value of an order:

```sql
SELECT SUM(total_amount) AS orders_total_value,
       AVG(total_amount) AS average_order_value
FROM orders;
```

To get these two calculations, we use the SUM() and AVG() aggregate functions. We can see the total store revenue is a little over 19 million, and the average order value is 1,636.15:

| orders_total_value | average_order_value |
| --- | --- |
| 19,008,819.69 | 1,636.15 |

### Bonus Data Exploration

As I mentioned, the previous aggregations are the minimum you should do for each table in the database. The exploration below goes a bit further (but not too far) beyond simple aggregation and GROUP BY. While those remain the basis, you can also employ other concepts, such as filtering data (using WHERE and/or HAVING), extracting date parts or the time, and using CASE WHEN to label data.

Let's examine several examples.

#### Number of Products

We can employ the COUNT() function to find the total number of products the store sells:

```sql
SELECT COUNT(*) AS number_of_active_products
FROM products
WHERE discontinued IS FALSE;
```

The condition in the WHERE clause keeps only products that are not discontinued, i.e., the products the store currently sells. From the earlier section, we know that the column discontinued is of the Boolean type, so we use the IS FALSE operator to include only non-discontinued products in the count. The count is 69:

| number_of_active_products |
| --- |
| 69 |

#### Revenue by Year

A simple report showing revenue by year can be created using SUM():

```sql
SELECT EXTRACT(YEAR FROM order_date) AS revenue_year,
       SUM(total_amount) AS revenue
FROM orders
GROUP BY revenue_year;
```

We use the EXTRACT() function to get only the year from each order date. Then we sum the total amounts of all orders and group by year to show each year's value separately.

The results show that the company's peak year was 2018. The year with the lowest revenue is 2024, but this might be because the year wasn't over at the time of the analysis. Another interesting thing: there's no revenue data for the years 2019–2022. This should be checked to see whether the data is missing for a reason or by mistake.

| revenue_year | revenue |
| --- | --- |
| 2017 | 3,088,759.84 |
| 2018 | 9,368,330.91 |
| 2023 | 4,646,048.11 |
| 2024 | 1,905,680.83 |

#### Customer Segmentation by Country

We have data about customers' countries, so an overview of the number of customers in each country would be informative. We're not interested in customers who haven't placed an order, as they would artificially inflate the numbers; we want only customers who order from us. Here's the query:

```sql
SELECT country,
       COUNT(*) AS number_of_customers
FROM customers
WHERE first_order_id IS NOT NULL
GROUP BY country
ORDER BY number_of_customers DESC;
```

We select and group by country and use the COUNT() aggregate function to find the number of customers. If the column first_order_id is not NULL, the customer has placed at least one order; this is the condition we use in WHERE. The data is ordered from the highest to the lowest number of customers.

The output shows that Northwind's largest market in terms of customers is the USA. It's also safe to conclude that it's the biggest market by revenue.

| country | number_of_customers |
| --- | --- |
| USA | 697 |
| Germany | 8 |
| France | 8 |
| Brazil | 8 |
| UK | 5 |
| Venezuela | 4 |
| Spain | 4 |
| Mexico | 4 |
| Argentina | 3 |
| Canada | 3 |
| Belgium | 2 |
| Denmark | 2 |
| Portugal | 2 |
| Finland | 2 |
| Italy | 2 |
| Austria | 2 |
| Sweden | 1 |
| Poland | 1 |
| Ireland | 1 |
| Switzerland | 1 |
| Norway | 1 |
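As a small data-quality check, you could also flip the WHERE condition to count registered customers who have never ordered — an illustrative query of my own:

```sql
-- Customers with no recorded first order have never ordered
SELECT COUNT(*) AS customers_without_orders
FROM customers
WHERE first_order_id IS NULL;
```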
## 3. Advanced Data Analysis and Actionable Insights

What we've done so far is a good start. However, data analysis in an SQL portfolio project shouldn't stop at simple aggregation and exploration of each table. We'll now go beyond that and write more complex queries that allow us to take action and improve Northwind's business. For example, we may want to see how sales change depending on the channel. Let's see what we get and then decide what our next steps will be.

### Sales by Channel

For each channel, we want to show the total sales, average order value, total number of orders, and number of unique customers. We also want to rank channels by total sales. Here's the query:

```sql
SELECT ch.channel_name,
       SUM(o.total_amount) AS total_sales,
       AVG(o.total_amount) AS average_order_value,
       COUNT(o.order_id) AS total_orders,
       COUNT(DISTINCT o.customer_id) AS unique_customers,
       RANK() OVER (ORDER BY SUM(o.total_amount) DESC) AS sales_rank
FROM channels ch
LEFT JOIN customers c ON ch.id = c.channel_id
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY ch.channel_name
ORDER BY sales_rank;
```

This query uses several aggregations: SUM() to calculate total sales, AVG() for the average order value, and COUNT() for the total number of orders and (with DISTINCT) for unique customers.

Then, we use the RANK() window function to rank channels by total sales. Since an alias defined in the SELECT list can't be referenced by other expressions in the same SELECT, we can't simply use the total_sales column in the window function. Instead, we need to repeat the entire calculation — a simple task completed by copying and pasting.

We use data from the tables channels and orders. However, we can't join these two tables directly, as they don't share a column. Instead, we have to join them via the table customers.

Here's the output:

| channel_name | total_sales | average_order_value | total_orders | unique_customers | sales_rank |
| --- | --- | --- | --- | --- | --- |
| Organic Search | 14,003,046.95 | 1,603.10 | 8,735 | 565 | 1 |
| Direct | 2,436,649.06 | 1,799.59 | 1,354 | 91 | 2 |
| Referral | 1,034,734.45 | 1,645.05 | 629 | 46 | 3 |
| Social | 837,378.77 | 1,824.35 | 459 | 32 | 4 |
| Paid Search | 483,824.24 | 1,645.66 | 294 | 19 | 5 |
| Email | 213,186.22 | 1,450.25 | 147 | 8 | 6 |

We see that most of Northwind's sales come through organic search. The data is consistent: the rank by total sales also reflects the number of orders and unique customers. It's interesting to note that the Organic Search channel doesn't have the highest average order value. This calls for some improvement!

Let's envisage a revenue-increase strategy that focuses on raising the average order value of all channels rather than on new acquisitions. We probably won't be able to take a uniform approach based only on the marketing channel: within each segment, there might be customers with polar-opposite spending habits. We can assume this especially applies to Organic Search, which is such a huge channel. So, we need to learn more about the order value distribution for each channel.
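Before computing quartiles, a quick first look at the spread could simply compare the smallest and largest single order per channel. This illustrative query is my own addition, not part of the original analysis:

```sql
-- The cheapest and most expensive single order per channel
SELECT ch.channel_name,
       MIN(o.total_amount) AS min_order_value,
       MAX(o.total_amount) AS max_order_value
FROM channels ch
LEFT JOIN customers c ON ch.id = c.channel_id
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY ch.channel_name
ORDER BY ch.channel_name;
```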
### Order Value Distribution

Let's calculate the average order value, median, upper quartile, lower quartile, and standard deviation for each channel:

```sql
SELECT ch.channel_name,
       AVG(total_amount) AS average_order_value,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_amount) AS median_order_value,
       PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY total_amount) AS upper_quartile_order_value,
       PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY total_amount) AS lower_quartile_order_value,
       STDDEV(total_amount) AS order_value_stddev
FROM channels ch
LEFT JOIN customers c ON ch.id = c.channel_id
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY ch.channel_name;
```

The average order value is calculated by AVG(). The median (i.e., the 50th percentile), upper quartile, and lower quartile are calculated using the PERCENTILE_CONT() ordered-set aggregate function, with the percentile specified in parentheses. For the percentile calculation, the data has to be sorted in ascending order; we do that in the WITHIN GROUP clause. Finally, STDDEV() calculates the standard deviation.

Here's the output:

| channel_name | average_order_value | median_order_value | upper_quartile_order_value | lower_quartile_order_value | order_value_stddev |
| --- | --- | --- | --- | --- | --- |
| Direct | 1,799.59 | 1,005.90 | 2,166.80 | 517.40 | 2,323.63 |
| Email | 1,450.25 | 960.00 | 1,970.41 | 492.50 | 1,655.97 |
| Organic Search | 1,603.10 | 1,007.70 | 2,018.20 | 480.00 | 1,916.39 |
| Paid Search | 1,645.66 | 1,083.00 | 2,104.40 | 486.55 | 1,813.22 |
| Referral | 1,645.05 | 1,064.00 | 2,034.50 | 482.90 | 1,969.01 |
| Social | 1,824.35 | 1,122.80 | 2,123.20 | 559.90 | 2,319.10 |

Let's analyze the metrics for the first channel. The Direct channel's standard deviation is 2,323.63, which is higher than the mean. This indicates high variability, i.e., there are probably outliers or a wide range of order values. The lower and upper quartiles show that 50% of orders fall between 517.40 and 2,166.80; however, a high standard deviation means many orders fall outside this range. The median is significantly lower than the mean, which indicates a right-skewed distribution, i.e., a small number of high-value orders pulls the mean up.

We can analyze each channel the same way, which gives us another idea. To create customized strategies for increasing revenue, we can further segment the customers by the upper and lower quartiles. We'll consider all the customers above the upper quartile high spenders; low spenders will be the customers below the lower quartile. That way, we can adapt our strategy to each channel and each group's spending habits.

### High- and Low-Value Customers by Channel

I'll do this calculation for only one marketing channel. You can do it for all the rest, as the code stays the same; you only change the channel in WHERE. (A variant that covers all channels at once is sketched below.)

The customers in the upper quartile are high-value customers, and we want them on a separate list:

```sql
WITH customer_order_totals AS (
  SELECT c.customer_id,
         c.full_name,
         SUM(o.total_amount) AS total_order_value,
         ch.channel_name
  FROM customers c
  JOIN orders o ON c.customer_id = o.customer_id
  JOIN channels ch ON c.channel_id = ch.id
  WHERE ch.channel_name = 'Direct'
  GROUP BY c.customer_id, c.full_name, ch.channel_name
),
upper_quartile_value AS (
  SELECT PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY total_order_value) AS upper_quartile
  FROM customer_order_totals
)
SELECT cot.customer_id,
       cot.full_name,
       cot.total_order_value,
       cot.channel_name
FROM customer_order_totals cot
JOIN upper_quartile_value uqv
  ON cot.total_order_value >= uqv.upper_quartile
ORDER BY cot.total_order_value DESC;
```

The first CTE calculates each customer's total order value using SUM() and joining the tables customers and orders. In WHERE, we filter out all the channels except Direct. The second CTE calculates the upper quartile in a familiar manner. The final SELECT then joins the two CTEs on an inequality condition, keeping only the customers whose total order value is at or above the upper quartile.

Here are the first ten rows of the output:

| customer_id | full_name | total_order_value | channel_name |
| --- | --- | --- | --- |
| 134 | Barry Michael | 79,371.50 | Direct |
| 152 | Carolann Williams | 64,365.21 | Direct |
| 7 | Frédérique Citeaux | 61,865.74 | Direct |
| 17 | Sven Ottlieb | 57,251.14 | Direct |
| 64 | Sergio Gutiérrez | 55,140.75 | Direct |
| 490 | Alice Blevins | 54,736.24 | Direct |
| 8 | Martín Sommer | 54,499.55 | Direct |
| 303 | Gregory Mack | 52,554.20 | Direct |
| 316 | Jeff Heard | 51,976.31 | Direct |
| 129 | Stephan Bufford | 50,868.70 | Direct |

In total, there are 23 customers in the output. We can use this information to target these high-value customers — for example, with loyalty programs in which they get special discounts, personalized service, redeemable points for purchases, VIP memberships, and so on.
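As promised, here's a hedged sketch of how you could extend the high-value list to every channel in one go, computing a separate upper quartile per channel (the CTE names and structure are mine, not from the original article):

```sql
-- Illustrative extension: high-value customers for every channel at once
WITH customer_order_totals AS (
  SELECT c.customer_id,
         c.full_name,
         ch.channel_name,
         SUM(o.total_amount) AS total_order_value
  FROM customers c
  JOIN orders o ON c.customer_id = o.customer_id
  JOIN channels ch ON c.channel_id = ch.id
  GROUP BY c.customer_id, c.full_name, ch.channel_name
),
channel_quartiles AS (
  -- One upper quartile per channel
  SELECT channel_name,
         PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY total_order_value) AS upper_quartile
  FROM customer_order_totals
  GROUP BY channel_name
)
SELECT cot.channel_name,
       cot.customer_id,
       cot.full_name,
       cot.total_order_value
FROM customer_order_totals cot
JOIN channel_quartiles cq
  ON cot.channel_name = cq.channel_name
 AND cot.total_order_value >= cq.upper_quartile
ORDER BY cot.channel_name, cot.total_order_value DESC;
```

The same pattern with PERCENTILE_CONT(0.25) and <= would produce the low-spender list for every channel.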
Similarly, we can list the customers whose orders are below the lower quartile:

```sql
WITH customer_order_totals AS (
  SELECT c.customer_id,
         c.full_name,
         SUM(o.total_amount) AS total_order_value,
         ch.channel_name
  FROM customers c
  JOIN orders o ON c.customer_id = o.customer_id
  JOIN channels ch ON c.channel_id = ch.id
  WHERE ch.channel_name = 'Direct'
  GROUP BY c.customer_id, c.full_name, ch.channel_name
),
lower_quartile_value AS (
  SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY total_order_value) AS lower_quartile
  FROM customer_order_totals
)
SELECT cot.customer_id,
       cot.full_name,
       cot.total_order_value,
       cot.channel_name
FROM customer_order_totals cot, lower_quartile_value lqv
WHERE cot.total_order_value <= lqv.lower_quartile
ORDER BY cot.total_order_value ASC;
```

This query is almost the same as the previous one, except that it calculates the lower quartile and searches for customers whose total order value is at or below it. This time, the query returns 191 customers. Here are the first ten rows:

| customer_id | full_name | total_order_value | channel_name |
| --- | --- | --- | --- |
| 939 | Shannon Aguilar | 98.40 | Direct |
| 997 | Barbra Armstrong | 251.50 | Direct |
| 687 | Andrew Scott | 452.90 | Direct |
| 787 | Dennis Myer | 912.00 | Direct |
| 917 | Les Allen | 991.40 | Direct |
| 921 | Shelby Turgeon | 1,162.25 | Direct |
| 560 | Nancy Wiggins | 1,425.80 | Direct |
| 678 | Tracey Thomas | 2,555.20 | Direct |
| 756 | Dora Rowlands | 2,713.50 | Direct |
| 715 | George Scott | 2,906.50 | Direct |

These customers need a different approach: they need to be motivated to spend more. Strategies to increase revenue from them may include personalized product bundles, free shipping above a certain amount, higher discounts at higher order thresholds, or a free gift above a certain order amount.

Now, do the same analysis for all the other channels and think about what tactics you could use to increase the order values from each one.

## Ready for Your Own SQL Portfolio Project?

In this article, we showed how to prepare an SQL portfolio project using the Northwind store database. Of course, if you feel creative enough, you can also create your own dataset.

Completing an SQL project for your portfolio is an important part of preparing for the hiring process. Portfolios are a great showcase of your practical SQL data analysis skills. The project above is just an example; there are many more analysis ideas you can try out in our SQL Databases for Practice course, which we highly recommend!