Study Notes #16

ERD Reminder

  • Entry-Relationship Diagrams (ERD)
  • a common way to view data in a database. It is also a key element to understanding how we can pull data from multiple tables.
Entity Relationship Diagram (ERD) for Parch & Posey Database

Tables & Columns

In the Parch & Posey database there are 5 tables:

  1. web_events
  2. accounts
  3. orders
  4. sales_reps
  5. region

Primary and Foreign Keys

PK

  • Primary Key
  • A primary key exists in every table, and it is a column that has a unique value for every row.
  • For this database, it is always called id, but that is not true of all databases.

FK

  • Foreign Key

foreign key is a column in one table that is a primary key in a different table. We can see in the Parch & Posey ERD that the foreign keys are:

  1. region_id
  2. account_id
  3. sales_rep_id

In the above image you can see that:

  1. The region_id is the foreign key.
  2. The region_id is linked to id – this is the primary-foreign key link that connects these two tables.
  3. The crow’s foot shows that the FK can actually appear in many rows in the sales_reps table.
  4. While the single line is telling us that the PK shows that id appears only once per row in this table.

If you look through the rest of the database, you will notice this is always the case for a primary-foreign key relationship. 

Important:

  1. The primary key is a single column that must exist in each table of a database. Again, these rules are true for most major databases, but some databases may not enforce these rules.
  2. There are one and only one of these columns on every table.
  3. They are a column in a table.
  4. Foreign keys are always associated with a primary key, and they are associated with the crow-foot notation above to show they can appear multiple times in a particular table.
SELECT orders.*
FROM orders
JOIN accounts
ON orders.account_id = accounts.id;

Here is the ERD for these two tables:

ERD for accounts and orders tables

ERD for Accounts & Orders tables

Notice

Notice our SQL query has the two tables we would like to join – one in the FROM and the other in the JOIN. Then in the ON, we will ALWAYs have the PK equal to the FK:

The way we join any two tables is in this way: linking the PK and FK (generally in an ON statement).

ERD for sales reps and region tables

ERD for Sales Reps & Region tables

JOIN More than Two Tables

ERD for Web Events, Accounts & Orders Tables

The Code

If we wanted to join all three of these tables, we could use the same logic. The code below pulls all of the data from all of the joined tables.

SELECT *
FROM web_events
JOIN accounts
ON web_events.account_id = accounts.id
JOIN orders
ON accounts.id = orders.account_id

Alternatively, we can create a SELECT statement that could pull specific columns from any of the three tables. Again, our JOIN holds a table, and ON is a link for our PK to equal the FK.

To pull specific columns, the SELECT statement will need to specify the table that you are wishing to pull the column from, as well as the column name. We could pull only three columns in the above by changing the select statement to the below, but maintaining the rest of the JOIN information:

SELECT web_events.channel, accounts.name, orders.total

Alias

When we JOIN tables together, it is nice to give each table an alias. Frequently an alias is just the first letter of the table name. You actually saw something similar for column names in the Arithmetic Operators concept.

SELECT o.*, a.*
FROM orders o
JOIN accounts a
ON o.account_id = a.id

If you have two or more columns in your SELECT that have the same name after the table name such as accounts.name and sales_reps.name you will need to alias them. Otherwise it will only show one of the columns. You can alias them like accounts.name AS AcountName, sales_rep.name AS SalesRepName

Quiz

  1. Provide a table for all web_events associated with the account name of Walmart. There should be three columns. Be sure to include the primary_poc, time of the event, and the channel for each event. Additionally, you might choose to add a fourth column to assure only Walmart events were chosen.
SELECT w.channel, w.occurred_at, a.primary_poc, a.name
    FROM web_events w
    JOIN accounts a
    ON a.id = w.account_id
    WHERE a.name = 'Walmart';
  1. Provide a table that provides the region for each sales_rep along with their associated accounts. Your final table should include three columns: the region name, the sales rep name, and the account name. Sort the accounts alphabetically (A-Z) according to the account name.
SELECT sales_reps.name, region.name, accounts.name
    FROM region
    JOIN sales_reps
    ON region.id = sales_reps.region_id
    JOIN accounts
    ON sales_reps.id = accounts.sales_rep_id
    ORDER BY accounts.sales_rep_id ASC;

This is the correct one below:

SELECT s.name AS SalesRepName, r.name AS RegionName, a.name AS AccountsName
    FROM region r
    JOIN sales_reps s
    ON r.id = s.region_id
    JOIN accounts a
    ON s.id = a.sales_rep_id
    ORDER BY a.name ASC;

Answer Key

SELECT r.name region, s.name rep, a.name account
FROM sales_reps s
JOIN region r
ON s.region_id = r.id
JOIN accounts a
ON a.sales_rep_id = s.id
ORDER BY a.name;

I had to use AS to change the column headers, since joining three tables with the same column labels “Name”.

  1. Provide the name for each region for every order, as well as the account name and the unit price they paid (total_amt_usd/total) for the order. Your final table should have 3 columns: region nameaccount name, and unit price. A few accounts have 0 for total, so I divided by (total + 0.01) to assure not dividing by zero.
SELECT r.name AS RegionName, a.name AS AccountName, o.total_amt_usd
    FROM region r
    JOIN sales_reps s
    ON r.id = s.region_id
    JOIN accounts a
    ON s.id = a.sales_rep_id
    JOIN orders o
    ON a.id = o.account_id

Answer Key

SELECT r.name region, a.name account, 
 o.total_amt_usd/(o.total + 0.01) unit_price
FROM region r
JOIN sales_reps s
ON s.region_id = r.id
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id;

This one is tricky – I realized, there’s a map connecting every tables in the database and cannot do a shortcut. I had to include tables like sales_rep_id in order to map region to accounts, I cannot connect region directly to accounts.

  • The ON statement should always occur with the foreign key being equal to the primary key.
  • JOIN statements allow us to pull data from multiple tables in a SQL database.
  • You can use all of the commands we saw in the first lesson along with JOIN statements.
  • We can simply write our alias directly after the column name (in the SELECT) or table name (in the FROM or JOIN) by writing the alias directly following the column or table we would like to alias. This will allow you to create clear column names even if calculations are used to create the column, and you can be more efficient with your code by aliasing table names.

Expert Tip

You have had a bit of an introduction to these one-to-one and one-to-many relationships when we introduced PKs and FKs. Notice, traditional databases do not allow for many-to-many relationships, as these break the schema down pretty quickly. A very good answer is provided here.

The types of relationships that exist in a database matter less to analysts, but you do need to understand why you would perform different types of JOINs, and what data you are pulling from the database. These ideas will be expanded upon in the next concepts.

This entry was posted in Study Notes and tagged . Bookmark the permalink.

Leave a Reply