15th Aug 2024 10 minutes read How to Find SQL Databases for Practice Jakub Romanowski sql queries data analysis Table of Contents Our SQL Practice Databases SQL Databases for Practice: University Finding Students Who Started in 2020 SQL Databases for Practice: Northwind Store Finding Product Details for Each Order Finding Top Selling Products SQL Database for Practice: Blog Traffic Finding the Most Viewed Articles by Category Analyzing Article Performance Over Time SQL Databases for Practice: Athletic Finals Finding Athletes with the Most Wins Using CTE to Find the Fastest Race Times by Discipline SQL Database for Practice: Chinook Music Store Finding the Most Popular Music Genres SQL Databases for Practice: MoMA Analyzing the Most Prolific Artists Now Practice on These Real-World SQL Databases! Learning SQL is important for anyone working with data, but finding a SQL database for practice can be a challenge. In this guide, you'll find easy SQL queries (and some rather more complex) to help you get started using the databases from our new SQL Databases for Practice course. SQL can be fun and practical when you get to try out real queries on a real database. If you are a beginner this guide will help you get started with some simple SQL queries using the databases from our SQL Databases for Practice course. These can be run in our online console and you will see them in action! But first let's start from the basics: what is this course? Our SQL Practice Databases Our new course is designed to have you learn SQL by exploring real-world databases. Instead of guided exercises, you'll get to: Explore the databases at your own pace. Ask your own questions. Find your own answers using SQL. This hands-on approach is perfect for developing data analysis skills. This course is a hassle-free way to practice SQL. You won't have to worry about technical setup, so you can start writing and testing queries right away. The databases used in this course are fictional records from a university, an online store, a blog, and a music store; we also have actual records from the Museum of Modern Art and some sporting events. As you work with these databases, you will begin to feel comfortable using big datasets. You’ll also build your confidence in writing SQL queries from scratch. If you are a data analyst, a marketing professional, or someone who’s just curious about data, this course is your best self-paced option to apply and hone your SQL skills. Curious to learn more about these databases? Let’s do some sample exercises you can do with our SQL Practice Databases course! SQL Databases for Practice: University The university database stores information about courses, lecturers, students, semesters, course editions, and enrollments. Let’s start with the most basic action in SQL. To perform any analysis, we first need to retrieve the data from the database. We will use the SELECT statement to do this: SELECT title, lecture_hours, ects_points FROM course; This query lists all the courses available in the university database, along with their lecture hours and European Credit Transfer and Accumulation System (ECTS) points. It retrieves the course title, the number of hours dedicated to lectures, and the ECTS points for each course. This information helps students understand the time commitment and credit value of each course, making it easier to plan their studies. Easy, right? Finding Students Who Started in 2020 Here’s another basic query, this time using the BETWEEN clause: SELECT first_name, last_name, start_date FROM student WHERE start_date BETWEEN '2020-01-01' AND '2020-12-31'; SQL Databases for Practice: Northwind Store The Northwind Store database is a detailed dataset for an online store; it holds information on customers, products, orders, and sales. It allows for the analysis of customer behavior, product performance, and sales trends, providing valuable insights for improving business operations and marketing strategies. Finding Product Details for Each Order To retrieve detailed information about the products included in each order, we’ll combine data from the orders and order_items tables with a JOIN: SELECT o.order_id, o.order_date, p.product_name, oi.quantity, oi.unit_price FROM orders o JOIN order_items oi ON o.order_id = oi.order_id JOIN products p ON oi.product_id = p.product_id; This query lists the order ID, order date, product name, quantity ordered, and unit price for each product included in every order. This will give you comprehensive insights into the products sold and their corresponding order details. Here are some results: Finding Top Selling Products To identify top-selling products based on the quantity sold, we can use JOIN to combine data from the order_items and products tables. Then we can order the results by the total quantity sold: SELECT p.product_name, SUM(oi.quantity) AS total_quantity FROM order_items oi JOIN products p ON oi.product_id = p.product_id GROUP BY p.product_name ORDER BY total_quantity DESC; This query lists the product name and the total quantity sold for each product. Notice that we’ve aggregated the total quantities. The results are then ordered by the total quantity in descending order (i.e. 10–1), highlighting the top-selling products. "Camembert Pierrot" is the top seller with 22,549 units sold, followed by "Raclette Courdavault" with 21,147 units and "Gorgonzola Telino" with 18,654 units. This information helps us understand which products are the most popular and drive the most sales. SQL Database for Practice: Blog Traffic The blog traffic database provides detailed information about the performance of a pet store's blog. It includes data on article views, author contributions, and the impact of promotional campaigns. This dataset is ideal for practicing blog traffic trend analysis, which helps us create better reader engagement and evaluate the effectiveness of marketing strategies. Finding the Most Viewed Articles by Category To analyze the performance of articles by category, we can use JOIN to combine data from the articles, article_traffic, and article_category tables. This query will provide the total views for each article within each category: SELECT ac.name AS category_name, a.title AS article_title, SUM(at.views) AS total_views FROM article a JOIN article_traffic at ON a.url = at.url JOIN article_category ac ON a.article_category_id = ac.id GROUP BY ac.name, a.title ORDER BY ac.name, total_views DESC; This query lists the category name, article title, and total views for each article. The results are grouped by category and ordered descendingly by total views within each category. Analyzing Article Performance Over Time Our next query is longer and a little more advanced. To gain insights into how the performance of articles changes over time, we’ll analyze the monthly views for each article. The following query combines data from the article, article_traffic, and article_category tables and groups the results by month and article: SELECT a.title AS article_title, ac.name AS category_name, DATE_TRUNC('month', at.day) AS month, SUM(at.views) AS total_views FROM article a JOIN article_traffic at ON a.url = at.url JOIN article_category ac ON a.article_category_id = ac.id GROUP BY a.title, ac.name, DATE_TRUNC('month', at.day) ORDER BY a.title, month; This query selects the article title and category name for clear identification. It uses the DATE_TRUNC('month', at.day) function to group views by month and the SUM(at.views) function to calculate total monthly views. SQL Databases for Practice: Athletic Finals This database contains data from the finals of different track running disciplines across three athletics championships: the Rio de Janeiro Olympic Games in 2016, the London IAAF World Championships in Athletics in 2017, and the Doha IAAF World Championships in Athletics in 2019. Finding Athletes with the Most Wins To identify athletes who have won the most races, we’ll JOIN the athlete, result, race, and round tables. This query will count the number of first-place finishes for each athlete: SELECT a.first_name, a.last_name, COUNT(r.place) AS wins FROM athlete a JOIN result r ON a.id = r.athlete_id JOIN race ra ON r.race_id = ra.id JOIN round ro ON ra.round_id = ro.id WHERE r.place = 1 GROUP BY a.first_name, a.last_name ORDER BY wins DESC; This query lists athletes' first and last names along with the count of their first-place finishes; the results are ordered descendingly by the number of wins. The top two athletes are Marie-Josée Ta Lou and Shaunae Miller-Uibo, both with 8 wins. Using CTE to Find the Fastest Race Times by Discipline Here’s another longer and more advanced query. To find the fastest race times by discipline, we’ll use a Common Table Expression (CTE) to first select the best times for each discipline. We’ll then use another query to join the CTE to two tables to retrieve more details. WITH FastestTimes AS ( SELECT e.discipline_id, r.result, r.athlete_id, RANK() OVER (PARTITION BY e.discipline_id ORDER BY r.result) AS rank FROM result r JOIN race ra ON r.race_id = ra.id JOIN round ro ON ra.round_id = ro.id JOIN event e ON ro.event_id = e.id WHERE r.result IS NOT NULL AND ro.is_final = TRUE ) SELECT d.name AS discipline_name, ft.result AS fastest_time, a.first_name, a.last_name FROM FastestTimes ft JOIN discipline d ON ft.discipline_id = d.id JOIN athlete a ON ft.athlete_id = a.id WHERE ft.rank = 1 ORDER BY d.name; This query uses a CTE named FastestTimes to rank race results for each discipline. It then selects the fastest time (i.e. where rank = 1) along with the athlete's details. The final results are: SQL Database for Practice: Chinook Music Store The Chinook database is a sample database for a digital music store created by Luis Rocha. You can find the original database here. Finding the Most Popular Music Genres To understand which music genres are the most popular in the Chinook Music Store, we’ll analyze the sales data by genre. This query will help us identify the top-selling genres by counting the number of tracks sold for each genre: SELECT g.name AS genre_name, COUNT(il.track_id) AS tracks_sold FROM genre g JOIN track t ON g.genre_id = t.genre_id JOIN invoice_line il ON t.track_id = il.track_id GROUP BY g.name ORDER BY tracks_sold DESC; The query joins the genre, track, and invoice_line tables to link genres with the tracks sold. By connecting these tables, we can accurately count the number of tracks sold for each genre. The results are grouped by genre name to aggregate the number of tracks sold within each genre. This grouping allows us to see the total sales for each genre. Finally, the results are ordered by the number of tracks sold in descending order. Cool, right? SQL Databases for Practice: MoMA The Museum of Modern Art (MoMA) in New York City is one of the most influential modern art museums in the world. It has made its comprehensive collection data publicly available. The MoMA dataset is frequently updated. In our database, we've compiled information about the museum's esteemed Paintings & Sculptures department, which features MoMA’s most celebrated works. Analyzing the Most Prolific Artists To gain insights into the most prolific artists in the Museum of Modern Art database, we’ll use SQL window functions. This query will count the number of artworks each artist has in the collection and rank the artists based on their contributions. This approach provides a comprehensive overview of the artist's presence in the museum. Here’s the query: WITH ArtworkCounts AS (SELECT ar.name AS artist_name, ar.id AS artist_id, COUNT(aa.artwork_id) OVER (PARTITION BY ar.id) AS artwork_count FROM artist ar JOIN artwork_author aa ON ar.id = aa.artist_id) SELECT artist_name, artwork_count, RANK() OVER (ORDER BY artwork_count DESC) AS artist_rank FROM ArtworkCounts GROUP BY artist_name, artist_id, artwork_count ORDER BY artist_rank; The query selects the artist's name and uses the COUNT() function with a window function (OVER (PARTITION BY ar.id)) to count the number of artworks for each artist. This provides the total count of artworks per artist, giving a clear view of each artist's contribution to the collection. To rank the artists based on the number of artworks, the query uses the RANK() function with a window function (OVER (ORDER BY artwork_count DESC)). This ranking helps identify the most prolific artists by their contributions to the collection, making it easy to see which artists have the largest presence in the museum. The ArtworkCounts CTE is used to simplify the main query. It calculates the artwork count for each artist and then the main query performs the ranking. This approach makes the query easier to read and maintain. The results are grouped by the artist's name, ID, and artwork count to aggregate the artwork. The top ten from the final results are: Using window functions in this query gives us a powerful way to analyze and rank artists based on their contributions to the MoMA collection. They make it easier to understand and appreciate the scope of each artist's work in the museum. Now Practice on These Real-World SQL Databases! You have seen how to run queries on SQL practice databases. From university records and online store sales to blog traffic, music sales, and art collections, these examples show how versatile and powerful SQL is when used with different data sets. Try these databases out to improve your SQL skills and feel more comfortable with large datasets. These SQL databases for practice are perfect for data analysts, marketing analysts, marketers, or anyone who’s just curious about data. Try out these databases and many more in our SQL Databases for Practice course and learn to ask your own questions and get answers with SQL. No technical setup is required – just create a free account with LearnSQL.com and get started. Happy learning! Tags: sql queries data analysis