Study Notes #16
- Entry-Relationship Diagrams (ERD)
- a common way to view data in a database. It is also a key element to understanding how we can pull data from multiple tables.
Tables & Columns
In the Parch & Posey database there are 5 tables:
Primary and Foreign Keys
- Primary Key
- A primary key exists in every table, and it is a column that has a unique value for every row.
- For this database, it is always called
id, but that is not true of all databases.
- Foreign Key
A foreign key is a column in one table that is a primary key in a different table. We can see in the Parch & Posey ERD that the foreign keys are:
Primary – Foreign Key Link
In the above image you can see that:
- The region_id is the foreign key.
- The region_id is linked to id – this is the primary-foreign key link that connects these two tables.
- The crow’s foot shows that the FK can actually appear in many rows in the sales_reps table.
- While the single line is telling us that the PK shows that id appears only once per row in this table.
If you look through the rest of the database, you will notice this is always the case for a primary-foreign key relationship.
- The primary key is a single column that must exist in each table of a database. Again, these rules are true for most major databases, but some databases may not enforce these rules.
- There are one and only one of these columns on every table.
- They are a column in a table.
- Foreign keys are always associated with a primary key, and they are associated with the crow-foot notation above to show they can appear multiple times in a particular table.
SELECT orders.* FROM orders JOIN accounts ON orders.account_id = accounts.id;
Here is the ERD for these two tables:
ERD for Accounts & Orders tables
Notice our SQL query has the two tables we would like to join – one in the FROM and the other in the JOIN. Then in the ON, we will ALWAYs have the PK equal to the FK:
The way we join any two tables is in this way: linking the PK and FK (generally in an ON statement).
ERD for Sales Reps & Region tables
JOIN More than Two Tables
If we wanted to join all three of these tables, we could use the same logic. The code below pulls all of the data from all of the joined tables.
SELECT * FROM web_events JOIN accounts ON web_events.account_id = accounts.id JOIN orders ON accounts.id = orders.account_id
Alternatively, we can create a SELECT statement that could pull specific columns from any of the three tables. Again, our JOIN holds a table, and ON is a link for our PK to equal the FK.
To pull specific columns, the SELECT statement will need to specify the table that you are wishing to pull the column from, as well as the column name. We could pull only three columns in the above by changing the select statement to the below, but maintaining the rest of the JOIN information:
SELECT web_events.channel, accounts.name, orders.total
When we JOIN tables together, it is nice to give each table an alias. Frequently an alias is just the first letter of the table name. You actually saw something similar for column names in the Arithmetic Operators concept.
SELECT o.*, a.* FROM orders o JOIN accounts a ON o.account_id = a.id
If you have two or more columns in your SELECT that have the same name after the table name such as accounts.name and sales_reps.name you will need to alias them. Otherwise it will only show one of the columns. You can alias them like accounts.name AS AcountName, sales_rep.name AS SalesRepName
- Provide a table for all web_events associated with the account name of
Walmart. There should be three columns. Be sure to include the
primary_poc, time of the event, and the
channelfor each event. Additionally, you might choose to add a fourth column to assure only
Walmartevents were chosen.
SELECT w.channel, w.occurred_at, a.primary_poc, a.name FROM web_events w JOIN accounts a ON a.id = w.account_id WHERE a.name = 'Walmart';
- Provide a table that provides the region for each sales_rep along with their associated accounts. Your final table should include three columns: the region name, the sales rep name, and the account name. Sort the accounts alphabetically (A-Z) according to the account name.
SELECT sales_reps.name, region.name, accounts.name FROM region JOIN sales_reps ON region.id = sales_reps.region_id JOIN accounts ON sales_reps.id = accounts.sales_rep_id ORDER BY accounts.sales_rep_id ASC; This is the correct one below: SELECT s.name AS SalesRepName, r.name AS RegionName, a.name AS AccountsName FROM region r JOIN sales_reps s ON r.id = s.region_id JOIN accounts a ON s.id = a.sales_rep_id ORDER BY a.name ASC; Answer Key SELECT r.name region, s.name rep, a.name account FROM sales_reps s JOIN region r ON s.region_id = r.id JOIN accounts a ON a.sales_rep_id = s.id ORDER BY a.name;
I had to use AS to change the column headers, since joining three tables with the same column labels “Name”.
- Provide the name for each region for every order, as well as the account name and the unit price they paid (total_amt_usd/total) for the order. Your final table should have 3 columns: region name, account name, and unit price. A few accounts have 0 for total, so I divided by (total + 0.01) to assure not dividing by zero.
SELECT r.name AS RegionName, a.name AS AccountName, o.total_amt_usd FROM region r JOIN sales_reps s ON r.id = s.region_id JOIN accounts a ON s.id = a.sales_rep_id JOIN orders o ON a.id = o.account_id Answer Key SELECT r.name region, a.name account, o.total_amt_usd/(o.total + 0.01) unit_price FROM region r JOIN sales_reps s ON s.region_id = r.id JOIN accounts a ON a.sales_rep_id = s.id JOIN orders o ON o.account_id = a.id;
This one is tricky – I realized, there’s a map connecting every tables in the database and cannot do a shortcut. I had to include tables like
sales_rep_id in order to map
accounts, I cannot connect
region directly to
- The ON statement should always occur with the foreign key being equal to the primary key.
- JOIN statements allow us to pull data from multiple tables in a SQL database.
- You can use all of the commands we saw in the first lesson along with JOIN statements.
- We can simply write our alias directly after the column name (in the SELECT) or table name (in the FROM or JOIN) by writing the alias directly following the column or table we would like to alias. This will allow you to create clear column names even if calculations are used to create the column, and you can be more efficient with your code by aliasing table names.
You have had a bit of an introduction to these one-to-one and one-to-many relationships when we introduced PKs and FKs. Notice, traditional databases do not allow for many-to-many relationships, as these break the schema down pretty quickly. A very good answer is provided here.
The types of relationships that exist in a database matter less to analysts, but you do need to understand why you would perform different types of JOINs, and what data you are pulling from the database. These ideas will be expanded upon in the next concepts.
Leave a Reply