Study Notes #15

Relational Database

  • tables within a database relate to one another
  • they contain identifiers to make the content easily combined
SELECT *
FROM orders

Why Would We Want to Split Data Into Separate Tables?

  1. Different contents of the tables, though they relate to another
  2. Speed in updating data

Database Normalization

When creating a database, it is really important to think about how data will be stored. This is known as normalization, and it is a huge part of most SQL classes. If you are in charge of setting up a new database, it is important to have a thorough understanding of database normalization.

There are essentially three ideas that are aimed at database normalization:

  1. Are the tables storing logical groupings of the data?
  2. Can I make changes in a single location, rather than in many tables for the same information?
  3. Can I access and manipulate data quickly and efficiently?

This is discussed in detail here.

However, most analysts are working with a database that was already set up with the necessary properties in place. As analysts of data, you don’t really need to think too much about data normalization. You just need to be able to pull the data from the database, so you can start making insights. This will be our focus in this lesson.

SQL Joins

JOINS

  • tells a query an additional table from which you would like to pull data.
  • useful for allowing us to pull data from multiple tables. This is both simple and powerful all at the same time.
JOIN demo.accounts (table name)

ON

  • specifies a logical statement to combine the table in FROM and JOIN statements
ON orders.account_id * accounts.id

The SELECT clause indicates which column(s) of data you’d like to see in the output (For Example, orders.* gives us all the columns in the orders table in the output). The FROM clause indicates the first table from which we’re pulling data, and the JOIN indicates the second table. The ON clause specifies the column on which you’d like to merge the two tables together.

SELECT orders.*
FROM orders
JOIN accounts
ON orders.account_id = accounts.id;

We are able to pull data from two tables:

  1. orders
  2. accounts

Above, we are only pulling data from the orders table since in the SELECT statement we only reference columns from the orders table.

The ON statement holds the two columns that get linked across the two tables. This will be the focus of the next concepts.

If we wanted to only pull individual elements from either the orders or accounts table, we can do this by using the exact same information in the FROM and ON statements. However, in your SELECT statement, you will need to know how to specify tables and columns in the SELECT statement:

  1. The table name is always before the period.
  2. The column you want from that table is always after the period.
If we want to pull only the account name and the dates in which that account placed an order, but none of the other columns, we can do this with the following query:

SELECT accounts.name, orders.occurred_at
FROM orders
JOIN accounts
ON orders.account_id = accounts.id;

This is how I understand this:

SELECT – I select the table by specifying the name. If I want to display only a specific column, I will add a . after the table name. Therefore, accounts.website means displaying the website column from the accounts table. Since this is a JOIN command, I entered two sets for SELECT SELECT accounts.website, orders.total_amt_usd it means, display website column from the accounts table with their matching total_amt_usd rows from the orders table. This is displayed by order on how I entered the code. In my example, I entered website first, so that is the first column displayed.

FROM and JOIN – the two tables to look at or get the information from, this can be interchanged.

ON – the column that exists on both tables, identifier.

orders.account_id = accounts.id;

This code means that the identifier is the ID, it’s found as account_id on orders table, and id column on the accounts table.

Sample Quiz:

  1. Try pulling all the data from the accounts table, and all the data from the orders table.
  2. Try pulling standard_qtygloss_qty, and poster_qty from the orders table, and the website and the primary_poc from the accounts table.
SELECT orders.*, accounts.*
FROM accounts
JOIN orders
ON accounts.id = orders.account_id;
SELECT orders.standard_qty, orders.gloss_qty, orders.poster_qty, accounts.website,  accounts.primary_poc 
FROM orders
JOIN accounts
ON orders.account_id = accounts.id;
This entry was posted in Study Notes and tagged . Bookmark the permalink.

Leave a Reply