Introduction
SQL is Fun!
Sidebar: Why the Taylor Swift picture? I'm a HUGE fan of Taylor Swift! (I've been to 40+ of her concerts...no kidding!)
She has a lyric "Hey kids, Spelling is fun!"; just changed it to "SQL is Fun!" ... why not have some fun learning SQL!
SQL is like magic, and knowing how to write SQL is like a superpower. I get to look at millions of rows of data across hundreds of database tables and uncover interesting insights. Here are some examples of how I've used SQL in the past:
- Unified customer profiles
- The holy grail of customer data is merging data between multiple systems to create a unified customer profile. You've got customers on your website, placing orders, calling into customer service, interacting with your mobile app, etc.
- Business benefit: Having one record for each customer regardless of their history with your company; able to measure lifetime value, orders, subscriptions, easily.
- The holy grail of customer data is merging data between multiple systems to create a unified customer profile. You've got customers on your website, placing orders, calling into customer service, interacting with your mobile app, etc.
- Insights
- At Hulu, we used millions of hours of viewing habits and discovered users who started watching television shows instead of movies had a higher propensity to make it past their "free trial" and subscribe to the service.
- Business benefit: Increasing customer activation and engagement with personalized digital messaging.
- At Hulu, we used millions of hours of viewing habits and discovered users who started watching television shows instead of movies had a higher propensity to make it past their "free trial" and subscribe to the service.
SQL is one of the most popular programming languages and is one of the top skills companies are looking for. People who can connect the dots in their data to highlight insights to drive their business are in huge demand.
Companies and teams are drowning in data -- data collected on e-commerce sites, social media platforms, mobile apps, and analytics platforms. Anytime you purchase from Amazon, view a video on TikTok, listen to a song on Spotify, all of this is raw data used to record events taken by users.
All of this data needs to be organized and made sense of...and that's where SQL comes in.
What is SQL?
SQL stands for "Structured Query Language" and is pronounced, "Sequel". It was developed in the early 1970s to manage and retrieve data from IBM databases.
Data is stored in a relational database management system (RDBMS). Think of this like a hyped up version of an Excel or Google Sheets workbook. Each workbook can contain multiple sheets, and within each sheet, you have rows and columns of data. In a database, data is stored in tables. Each table also has columns and rows of data.
Below is an example of 10 customer records stored in a spreadsheet and stored in a database table.
Using the data above, if you wanted to find Amazon customers who have a @gmail.com email addresses, you'd write the SQL statement below.
SELECT *
FROM Amazon_Customers
WHERE EMAIL LIKE '%@gmail.com'
So, why use Databases and SQL versus Excel or Google Sheets?
Excel and Google Sheets are limited by the number of rows and columns it can handle. Both applications slow to a crawl when you're copying and pasting 100,000 to 250,000 rows of data. (Yes, I manage data in the hundreds of thousands and millions). It simply can't handle large amounts of data.
Databases are built to manage and store large amounts of data; I'm talking about millions and billions of rows of records. SQL is the programming language used to retrieve and analyze this data.
Databases also have the advantage of maintaining data consistency and integrity. Sure, multiple people can collaborate on a Google sheet, but anyone can update or delete data; databases have features to prevent accidental changes to maintain data governance.
Another key advantage of databases are the relationships between tables. Database systems can have hundreds of tables with data that connects to one another, creating complex data relationships. Imagine trying to connect customers to orders and products to subscription services in a spreadsheet...no thank you.
Database Tables
Below is a diagram of database tables that I've made up and will use for my tutorials. It's what I imagine a database looks like made up of Amazon customers and orders. The lines between each table shows the relationship between customers, products and their orders.
You'll notice that there's a PK and FK on the left column of each table. PK means Primary Key and FK means Foreign Key. These are data modeling concepts we'll learn about in the future. Primary and Foreign keys are ways to maintain uniqueness in data and create relationships between tables.
- AMAZON_CUSTOMERS
- table with Customer information (Names, billing, demographic info)
- AMAZON_PRODUCTS
- table with Product catalog information (ASIN, product name, description, price)
- AMAZON_CATEGORIES
- table with the list of categories products would fall into, such as Kitchen & Bath, Electronics, Books, etc.
- AMAZON_ORDERS
- table with a record of all Orders (Order ID's, Order Date, Product Condition, Order Status)
Which Database?
Writing SQL is mostly the same across databases. SQL follows standards from ANSI SQL, which is the standard language for relational database management systems, which we learned about earlier.
There are many databases, some that you can host locally on your computer, and others are hosted in the Cloud. There's MySQL, PostgreSQL, Microsoft SQL Server, Oracle, Google Big Query, and Snowflake among others. (I've used all of these databases). Writing SQL across these databases can vary and have different "dialects", while achieving the same result.
Here are some examples of the different dialects, if we wanted to look for users Amazon customers who registered an account within the last 7 days using the REGISTRATION_DATE field in the AMAZON_CUSTOMERS table. You'll see that you can write it different ways depending on the database, the end result is the same!
Microsoft SQL ServerSELECT *
FROM Amazon_Customers
WHERE Registration_Date >= DATEADD(D, -7, GETDATE());
OracleSELECT *
FROM Amazon_Customers
WHERE Registration_Date >= SYSDATE - 7;
Google Big QuerySELECT *
FROM Amazon_Customers
WHERE Registration_Date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY);
SnowflakeSELECT *
FROM Amazon_Customers
WHERE Registration_Date >= DATEADD(DAY, -7, CURRENT_DATE);
Final Thoughts
We journeyed into the benefits of SQL, why you would use a SQL and database over Excel and/or Google Sheets and how to pull data using the SELECT statement. We also learned the basics of how data might be structured a database at Amazon with Customer, Products and Order data. Lastly, we learned that while SQL is mostly the same, there are different dialects depending on the database you're using.
In the next lesson, let's learn more about the SELECT statement.