Study Notes #19

I am not sure how I missed this module. 😞

Basic SQL

primary_poc – a column name on the diagram above

web_events – a table name on the diagram above

Database – a collection of tables that share connected data stored in a computer

ERD – a diagram that shows how data is structured in a database

SQL – a language that allows to access data stored in a database

SQL

  • Structured Query Language
  • a language
  • most popular for its interaction with databases. 

There are some major advantages to using traditional relational databases, which we interact with using SQL. The five most apparent are:

  • SQL is easy to understand.
  • Traditional databases allow us to access data directly.
  • Traditional databases allow us to audit and replicate our data.
  • SQL is a great tool for analyzing multiple tables at once.
  • SQL allows you to analyze more complex questions than dashboard tools like Google Analytics.

Why Businesses Like Databases

  1. Data integrity is ensured – only the data you want to be entered is entered, and only certain users are able to enter data into the database.
  2. Data can be accessed quickly – SQL allows you to obtain results very quickly from the data stored in a database. Code can be optimized to quickly pull results.
  3. Data is easily shared – multiple individuals can access data stored in a database, and the data is the same for all users allowing for consistent results for anyone with access to your database.

How Databases Store Data

A few key points about data stored in SQL databases:

  1. Data in databases is stored in tables that can be thought of just like Excel spreadsheets. For the most part, you can think of a database as a bunch of Excel spreadsheets. Each spreadsheet has rows and columns. Where each row holds data on a transaction, a person, a company, etc., while each column holds data pertaining to a particular aspect of one of the rows you care about like a name, location, a unique id, etc.
  2. All the data in the same column must match in terms of data type. An entire column is considered quantitative, discrete, or as some sort of string. This means if you have one row with a string in a particular column, the entire column might change to a text data type. This can be very bad if you want to do math with this column!
  3. Consistent column types are one of the main reasons working with databases is fast. Often databases hold a LOT of data. So, knowing that the columns are all of the same types of data means that obtaining data from a database can still be fast.

Types of Databases

SQL Databases

There are many different types of SQL databases designed for different purposes. In this course, we will use Postgres within the classroom, which is a popular open-source database with a very complete library of analytical functions. (Note: You do not need to install PostgreSQL on your computer unless you really want to. We provide SQL environments in the classroom for you to work in.)

Some of the most popular databases include:

  1. MySQL
  2. Access
  3. Oracle
  4. Microsoft SQL Server
  5. Postgres

You can also write SQL within other programming frameworks like Python, Scala, and Hadoop.

Small Differences

Each of these SQL databases may have subtle differences in syntax and available functions — for example, MySQL doesn’t have some of the functions for modifying dates as Postgres. Most of what you see with Postgres will be directly applicable to using SQL in other frameworks and database environments. For the differences that do exist, you should check the documentation. Most SQL environments have great documentation online that you can easily access with a quick Google search.

The article here compares three of the most common types of SQL: SQLite, PostgreSQL, and MySQL. Again, once you have learned how to write SQL in one environment, the skills are mostly transferable.

Types of Statements

Statements

  • tell the database of what you’d like to do with the data
  • SQL statements are code that can read and manipulate data.
  • SQL isn’t case sensitive – meaning you can write upper and lower case anywhere in the code.
  • you can end SQL statements with a semicolon, but some SQL environments don’t require a semicolon at the end.
  1. CREATE TABLE is a statement that creates a new table in a database.
  2. DROP TABLE is a statement that removes a table in a database.
  3. SELECT allows you to read data and display it. This is called a query.

SQL – SELECT & FROM

  1. SELECT indicates which column(s) you want to be given the data for.
  2. FROM specifies from which table(s) you want to select the columns. Notice the columns need to exist in this table.

If you want to be provided with the data from all columns in the table, you use “*”, like so:

  • SELECT * FROM orders

Note that using SELECT does not create a new table with these columns in the database, it just provides the data to you as the results, or output, of this command.

You will use this SQL SELECT statement in every query in this course, but you will be learning a few additional statements and operators that can be used along with them to ask more advanced questions of your data.

SQL LIMIT

  • useful when you want to see just the first few rows of a table. This can be much faster for loading than if we load the entire dataset.
  • is always the very last part of a query.
/* Limit the number of data shown from the database. */

SELECT occurred_at, account_id, channel
FROM   web_events
LIMIT  15;

SQL ORDER BY

  • sort results using the data in any column.
  • must be added after the FROM query
  • can be ASC(default, no need to enter) or DESC
  • using ORDER BY in a SQL query only has temporary effects, for the results of that query, unlike sorting a sheet by column in Excel or Sheets. (your output will be sorted that way, but then the next query you run will encounter the unsorted data again.)
SELECT *
FROM orders
ORDER BY occurred_at
LIMIT 1000;

Quiz

  1. Write a query to return the 10 earliest orders in the orders table. Include the idoccurred_at, and total_amt_usd.
SELECT id, occurred_at, total_amt_usd
FROM orders
ORDER BY occurred_at DESC
LIMIT 10;

Answer Key:
SELECT id, occurred_at, total_amt_usd
FROM orders
ORDER BY occurred_at
LIMIT 10;

I am correct. 😊 
  1. Write a query to return the top 5 orders in terms of the largest total_amt_usd. Include the idaccount_id, and total_amt_usd.
SELECT id, account_id, total_amt_usd
FROM orders
ORDER BY total_amt_usd DESC
LIMIT 5;

Answer Key:
SELECT id, account_id, total_amt_usd
FROM orders
ORDER BY total_amt_usd DESC 
LIMIT 5;

I am correct. 😊 
  1. Write a query to return the lowest 20 orders in terms of the smallest total_amt_usd. Include the idaccount_id, and total_amt_usd.
SELECT id, account_id, total_amt_usd
FROM orders
ORDER BY total_amt_usd ASC
LIMIT 20;

Answer Key:
SELECT id, account_id, total_amt_usd
FROM orders
ORDER BY total_amt_usd
LIMIT 20;

I am correct, though I forgot that I don't need to put ASC

When you provide a list of columns in an ORDER BY command, the sorting occurs using the leftmost column in your list first, then the next column from the left, and so on. We still have the ability to flip the way we order using DESC.

ORDER By account_id, total_amt_usd DESC

This sorts account_id from smallest to largest first, and from there sorts the total_amt_usd from largest to smallest, per account_id since there are multiple orders per account_id

ORDER By total_amt_usd DESC, account_id

This sorts the total_amt_usd from largest to smallest first, that is regardless of the account_id now, since the total_amt_usd are all unique values.

Quiz

  1. Write a query that displays the order ID, account ID, and total dollar amount for all the orders, sorted first by the account ID (in ascending order), and then by the total dollar amount (in descending order).
SELECT id, account_id, total_amt_usd
FROM orders
ORDER BY account_id, total_amt_usd DESC

Answer Key: 
SELECT id, account_id, total_amt_usd
FROM orders
ORDER BY account_id, total_amt_usd DESC;

I am correct. 😊 
  1. Now write a query that again displays order ID, account ID, and total dollar amount for each order, but this time sorted first by total dollar amount (in descending order), and then by account ID (in ascending order).
SELECT id, account_id, total_amt_usd
FROM orders
ORDER BY total_amt_usd DESC, account_id;

Answer Key: 
SELECT id, account_id, total_amt_usd
FROM orders
ORDER BY total_amt_usd DESC, account_id;

I am correct. 😊 
  1. Compare the results of these two queries above. How are the results different when you switch the column you sort on first?

The first results return a table sorted by Account ID, then per Account ID sorted by Total Dollar Amount.

The second results return a table sorted by the Total Dollar Amount.

From the module:

In query #1, all of the orders for each account ID are grouped together, and then within each of those groupings, the orders appear from the greatest order amount to the least. In query #2, since you sorted by the total dollar amount first, the orders appear from greatest to least regardless of which account ID they were from. Then they are sorted by account ID next. (The secondary sorting by account ID is difficult to see here since only if there were two orders with equal total dollar amounts would there need to be any sorting by account ID.)

SQL WHERE

  • filter a set of results based on specific criteria /  filtering the data.
  • goes after FROM but before ORDER BY or LIMIT
  • display subsets of tables based on conditions that must be met. 

Common symbols used in WHERE statements include:

  1. > (greater than)
  2. < (less than)
  3. >= (greater than or equal to)
  4. <= (less than or equal to)
  5. = (equal to)
  6. != (not equal to)
/* This query would display 1000 most oldest orders with account ID 4251, sroted by occurred_at */

SELECT *
FROM orders
WHERE account_id = 4251
ORDER BY occurred_at
LIMIT 1000;

Quiz

  1. Pulls the first 5 rows and all columns from the orders table that have a dollar amount of gloss_amt_usd greater than or equal to 1000.
SELECT *
FROM orders
WHERE gloss_amt_usd >= 1000
LIMIT 5;

Answer Key:
SELECT *
FROM orders
WHERE gloss_amt_usd >= 1000
LIMIT 5;

I am correct. 😊 
  1. Pulls the first 10 rows and all columns from the orders table that have a total_amt_usd less than 500.
SELECT *
FROM orders
WHERE total_amt_usd < 500
LIMIT 10;

Answer Key: 
SELECT *
FROM orders
WHERE total_amt_usd < 500
LIMIT 10;

I am correct. 😊 

Quick Note:

we do not need to ORDER BY unless we want to actually order our data. Our condition will work without having to do any sorting of the data.

WHERE with non Non-Numeric Data

  • We can use the = and != operators here. You need to be sure to use single quotes (just be careful if you have quotes in the original text) with the text data, not double quotes.
  • we use the LIKENOT, or IN operators.
/* Filter the accounts table to include the company name, website, and the primary point of contact (primary_poc) just for the Exxon Mobil company in the accounts table */

SELECT name, website, primary_poc
FROM accounts 
WHERE name = 'Exxon Mobil';

Arithmetic Operators

Play Video

Derived Columns

Creating a new column that is a combination of existing columns is known as a derived column (or “calculated” or “computed” column). Usually, you want to give a name, or “alias,” to your new column using the AS keyword.

This derived column, and its alias, are generally only temporary, existing just for the duration of your query. The next time you run a query and access this table, the new column will not be there.

If you are deriving the new column from existing columns using a mathematical expression, then these familiar mathematical operators will be useful:

  1. * (Multiplication)
  2. + (Addition)
  3. - (Subtraction)
  4. / (Division)

Consider this example:

SELECT id, (standard_amt_usd/total_amt_usd)*100 AS std_percent, total_amt_usd
FROM orders
LIMIT 10;

Here we divide the standard paper dollar amount by the total order amount to find the standard paper percent for the order, and use the AS keyword to name this new column “std_percent.” You can run this query on the next page if you’d like, to see the output.

Order of Operations

Remember PEMDAS from math class to help remember the order of operations? If not, check out this link as a reminder. The same order of operations applies when using arithmetic operators in SQL.

The following two statements have very different end results:

  1. Standard_qty / standard_qty + gloss_qty + poster_qty
  2. standard_qty / (standard_qty + gloss_qty + poster_qty)

It is likely that you mean to do the calculation as written in statement number 2!

Code from the Video

SELECT account_id,
       occurred_at,
       standard_qty,
       gloss_qty + poster_qty AS nonstandard_qty
FROM orders

Arithmetic Operators

Derived Columns

  • a new column that is a manipulation of the existing columns in the database
  • can include simple arithmetic or any number of advanced calculations
  • use alias AS to set a column name

Creating a new column that is a combination of existing columns is known as a derived column (or “calculated” or “computed” column). Usually, you want to give a name, or “alias,” to your new column using the AS keyword.

This derived column, and its alias, are generally only temporary, existing just for the duration of your query. The next time you run a query and access this table, the new column will not be there.

If you are deriving the new column from existing columns using a mathematical expression, then these familiar mathematical operators will be useful:

  1. * (Multiplication)
  2. + (Addition)
  3. - (Subtraction)
  4. / (Division)
/* divide the standard paper dollar amount by the total order amount to find the standard paper percent for the order, and use the AS keyword to name this new column "std_percent."  */

SELECT id, (standard_amt_usd/total_amt_usd)*100 AS std_percent, total_amt_usd
FROM orders
LIMIT 10;

Order of Operations

PEMDAS – Parentheses, Exponents, Multiplication and Division, and Addition and Subtraction

1. Standard_qty / standard_qty + gloss_qty + poster_qty
2. standard_qty / (standard_qty + gloss_qty + poster_qty)

(the second will return the correct results)

Quiz

  1. Create a column that divides the standard_amt_usd by the standard_qty to find the unit price for standard paper for each order. Limit the results to the first 10 orders, and include the id and account_id fields.
SELECT standard_amt_usd/standard_qty AS standard_unit_price, id, account_id
FROM orders
LIMIT 10;

Answer Key:
SELECT id, account_id, standard_amt_usd/standard_qty AS unit_price
FROM orders
LIMIT 10;

I am correct. 😊 
  1. Write a query that finds the percentage of revenue that comes from poster paper for each order. You will need to use only the columns that end with _usd. (Try to do this without using the total column.) Display the id and account_id fields also. NOTE – you will receive an error with the correct solution to this question. This occurs because at least one of the values in the data creates a division by zero in your formula. There are ways to better handle this. For now, you can just limit your calculations to the first 10 orders, as we did in question #1, and you’ll avoid that set of data that causes the problem.
SELECT (poster_amt_usd/total_amt_usd)*100 AS poster_percentage, id, account_id
FROM orders
LIMIT 10;

Answer Key:
SELECT id, account_id, 
poster_amt_usd/(standard_amt_usd + gloss_amt_usd + poster_amt_usd) AS post_per
FROM orders
LIMIT 10;

OR

SELECT id, account_id, 
poster_amt_usd/(standard_amt_usd + gloss_amt_usd + poster_amt_usd) * 100 AS post_per
LIMIT 10;

I missed the formula of the percentage of poster here, I took it from the total instead of getting the sum of all paper types. 

Logical Operators

  1. LIKE This allows you to perform operations similar to using WHERE and =, but for cases when you might not know exactly what you are looking for.
  2. IN This allows you to perform operations similar to using WHERE and =, but for more than one condition.
  3. NOT This is used with IN and LIKE to select all of the rows NOT LIKE or NOT IN a certain condition.
  4. AND & BETWEEN These allow you to combine operations where all combined conditions must be true.
  5. OR This allows you to combine operations where at least one of the combined conditions must be true.

SQL LIKE

  • allows you to perform operations similar to using WHERE and =, but for cases when you might not know exactly what you are looking for.
  • requires the use of wildcards (% i.e. %google%)
/* Returns data for website with "google" in the URL */
SELECT *
FROM accounts
WHERE website LIKE '%google%';

The LIKE operator is extremely useful for working with text. You will use LIKE within a WHERE clause. The LIKE operator is frequently used with %. The % tells us that we might want any number of characters leading up to a particular set of characters or following a certain set of characters, as we saw with the google syntax above. Remember you will need to use single quotes for the text you pass to the LIKE operator because these lower and uppercase letters are not the same within the string. Searching for ‘T’ is not the same as searching for ‘t’. In other SQL environments (outside the classroom), you can use either single or double-quotes.

Quiz

  1. All the companies whose names start with ‘C’.
SELECT name
FROM accounts
WHERE name LIKE 'C%';

Answer Key:
SELECT name
FROM accounts
WHERE name LIKE 'C%';
  1. All companies whose names contain the string ‘one’ somewhere in the name.
SELECT name
FROM accounts
WHERE name LIKE '%one%';

Answer Key:
SELECT name
FROM accounts
WHERE name LIKE '%one%';
  1. All companies whose names end with ‘s’.
SELECT name
FROM accounts
WHERE name LIKE '%s';

Answer Key:
SELECT name
FROM accounts
WHERE name LIKE '%s';

SQL IN

  • allows to filter data based on several possible values
  • works both numeric and text columns.
  • allows to use an =, but for more than one item of that particular column. 

Expert Tip

In most SQL environments, although not in our Udacity’s classroom, you can use single or double quotation marks – and you may NEED to use double quotation marks if you have an apostrophe within the text you are attempting to pull.

In our Udacity SQL workspaces, note you can include an apostrophe by putting two single quotes together. For example, Macy’s in our workspace would be ‘Macy”s’.

/* Returns orders for account IDs 1001 and 1021 */
SELECT *
FROM orders
WHERE account_id IN (1001,1021);

Quiz

  1. Use the accounts table to find the account nameprimary_poc, and sales_rep_id for Walmart, Target, and Nordstrom.
SELECT name, primary_poc, sales_rep_id
FROM accounts
WHERE name IN ('Walmart', 'Target', 'Nordstrom');

Answer Key:
SELECT name, primary_poc, sales_rep_id
FROM accounts
WHERE name IN ('Walmart', 'Target', 'Nordstrom');

I am correct. 
  1. Use the web_events table to find all information regarding individuals who were contacted via the channel of organic or adwords.
SELECT *
FROM web_events
WHERE channel IN ('organic', 'adwords');

Answer Key: 
SELECT *
FROM web_events
WHERE channel IN ('organic', 'adwords');

I am correct. 

SQL NOT

  • provides the inverse results for IN, LIKE and SIMILAR operators.

The NOT operator is an extremely useful operator for working with the previous two operators we introduced: IN and LIKE. By specifying NOT LIKE or NOT IN, we can grab all of the rows that do not meet particular criteria.

/* Returns sales_rep_id EXCEPT for 321500,321570 */
SELECT sales_rep_id, 
       name
FROM accounts
WHERE sales_rep_id NOT IN (321500,321570)
ORDER BY sales_rep_id

Quiz

  1. Use the accounts table to find the account name, primary poc, and sales rep id for all stores except Walmart, Target, and Nordstrom.
SELECT name, primary_poc, sales_rep_id
FROM accounts
WHERE name NOT IN('Walmart', 'Target', 'Nordstrom');
  1. Use the web_events table to find all information regarding individuals who were contacted via any method except using organic or adwords methods.
SELECT *
FROM web_events
WHERE channel NOT IN('organic', 'adwords');
  1. All the companies whose names do not start with ‘C’.
SELECT name
FROM accounts
WHERE name NOT LIKE 'C%';
  1. All companies whose names do not contain the string ‘one’ somewhere in the name.
SELECT name
FROM accounts
WHERE name NOT LIKE '%one%';
  1. All companies whose names do not end with ‘s’.
SELECT name
FROM accounts
WHERE name NOT LIKE '%s';

SQL AND and BETWEEN

  • filter based on multiple criteria using AND.

The AND operator is used within a WHERE statement to consider more than one logical clause at a time. Each time you link a new statement with an AND, you will need to specify the column you are interested in looking at. You may link as many statements as you would like to consider at the same time. This operator works with all of the operations we have seen so far including arithmetic operators (+*-/). LIKEIN, and NOT logic can also be linked together using the AND operator.

SQL BETWEEN

Sometimes we can make a cleaner statement using BETWEEN than we can use AND. Particularly this is true when we are using the same column for different parts of our AND statement.

Instead of writing :

WHERE column >= 6 AND column <= 10

we can instead write, equivalently:

WHERE column BETWEEN 6 AND 10
SELECT *
FROM orders
WHERE occurred_at >= '2016-04-01' AND occurred_at <= '2016-10-01'
ORDER BY occurred_at
SELECT *
FROM orders
WHERE occurred_at BETWEEN '2016-04-01' AND '2016-10-01'
ORDER BY occurred_at
This entry was posted in Study Notes and tagged . Bookmark the permalink.

Leave a Reply