Study Notes #22

SQL Subqueries & Temporary Tables

Subquery

a query within a query
a query has both SELECT and FROM clauses to signify what you want to extract from a table and what table you’d like to pull data from. A query that includes subquery, as a result, has multiple SELECT and FROM clauses.
The subquery that sits nested inside a larger query is called an INNER QUERY. This inner query can be fully executed on its own and often is run independently before when trying to troubleshoot bugs in your code.

SELECT product_id,
       name,
       price
FROM db.product
Where price > (SELECT AVG(price)
              FROM db.product)

Typically, you need to use a subquery when you have the need to manipulate an existing table to “pseudo-create” a table that is then used as a part of a larger query.

When do you need to use a subquery?

You need to use a subquery when you have the need to manipulate an existing table to “pseudo-create” a table that is then used as a part of a larger query. In the examples below, existing tables cannot be joined together to solve the problem at hand. Instead, an existing table needs to be manipulated, massaged, or aggregated in some way to then join to another table in the dataset to answer the posed question.

Set of Problems:

Identify the top-selling Amazon products in months where sales have exceeded $1m
- Existing Table: Amazon daily sales
- Subquery Aggregation: Daily to Monthly
Examine the average price of a brand’s products for the highest-grossing brands
- Existing Table: Product pricing data across all retailers
- Subquery Aggregation: Individual to Average
Order the annual salary of employees that are working less than 150 hours a month
- Existing Table: Daily time-table of employees
- Subquery Aggregation: Daily to Monthly

SELECT product_id,
       name,
       price
FROM db.product
Where price > (SELECT AVG(price)
              FROM db.product)

Subqueries vs JOINs

Often times, a problem can be solved using either a subquery or a join. The user considers the tradeoffs and determines the path forward.

Differences between Subqueries and Joins

Use Cases:

Subquery: When an existing table needs to be manipulated or aggregated to then be joined to a larger table.

Joins: A fully flexible and discretionary use case where a user wants to bring two or more tables together and select and filter as needed.

Syntax:

Subquery: A subquery is a query within a query. The syntax, as a result, has multiple SELECT and FROM clauses.

Joins: A join is simple stitching together multiple tables with a common key or column. A join clause cannot stand and be run independently.

Dependencies:

Subquery: A subquery clause can be run completely independently. When trying to debug code, subqueries are often run independently to pressure test results before running the larger query.

Joins: A join clause cannot stand and be run independently.

Similarities between Subqueries and Joins

Output:

Both subqueries and joins are essentially bringing multiple tables together (whether an existing table is first manipulated or not) to generate a single output.

Deep-dive topics:

What happens under the hood: Query plans are similar for both subqueries and joins. You can read more about how query plans are here.

Subquery vs Joins Overview

Components	Subquery	JOINS
Combine data from multiple tables into a single result	X	X
Create a flexible view of tables stitched together using a “key”		X
Build an output to use in a later part of the query	X
Subquery Plan: What happens under the hood	X	X

Query 1

SELECT product_id,
       name,
       price
FROM db.product
Where price > (SELECT AVG(price)
              FROM db.product)

Query 2

SELECT a.brand_id,
       a.total_brand_sales
       AVG(b.product_price)
FROM brand_table a
JOIN brand_table b
ON b.brand_id = a.brand_id
GROUP BY a.brand_id, a.total_brand_sales
ORDER BY a.total_brand_sales desc;

Subqueries and Joins Deep-dives

Subqueries:

Output: Either a scalar (a single value) or rows that have met a condition. Use Case: Calculate a scalar value to use in a later part of the query (e.g., average price as a filter). Dependencies: Stand independently and be run as complete queries themselves.

Joins:

Output: A joint view of multiple tables stitched together using a common “key”. Use Case: Fully stitch tables together and have full flexibility on what to “select” and “filter from”. Dependencies: Cannot stand independently.

Subquery Basics

must be fully placed inside parenthesis
must be fully independent and can be executed on its own
have two components to consider:
- where its placed
- dependencies

A caveat with subqueries being independent:

In almost all cases, subqueries are fully independent. They are “interim”/temp tables that can be fully executed on their own. However, there is an exception. When a subquery, typically in the form of a nested or inline subquery, is correlated to its outer query, it cannot run independently. This is most certainly an edge case since correlated subqueries are rarely implemented compared to standalone, simple subqueries.

Placement:

There are four places where subqueries can be inserted within a larger query:

With
Nested
Inline
Scalar

Dependencies:

A subquery can be dependent on the outer query or independent of the outer query.

Resources:

One of my favorite resources on subqueries that covers use cases, syntax, and examples is from Microsoft and can be found here.

Subqueries: Placement

The problem you are trying to solve and the readability of the query determine where the subquery is placed.

The key concept of placement is where exactly the subquery is placed within the context of the larger query. There are four different places where a subquery can be inserted. From my experience, the decision of which placement to leverage stems from (1) the problem at hand and (2) the readability of the query.

Subquery Placement:

With: This subquery is used when you’d like to “pseudo-create” a table from an existing table and visually scope the temporary table at the top of the larger query.

Nested: This subquery is used when you’d like the temporary table to act as a filter within the larger query, which implies that it often sits within the where clause.

Inline: This subquery is used in the same fashion as the WITH use case above. However, instead of the temporary table sitting on top of the larger query, it’s embedded within the from clause.

Scalar: This subquery is used when you’d like to generate a scalar value to be used as a benchmark of some sort.

For example, when you’d like to calculate the average salary across an entire organization to compare to individual employee salaries. Because it’s often a single value that is generated and used as a benchmark, the scalar subquery often sits within the select clause.

Advantages:

Readability: With and Nested subqueries are most advantageous for readability.

Performance: Scalar subqueries are advantageous for performance and are often used on smaller datasets.

SELECT channel,
       AVG(event_count) AS avg_event_count
FROM
(SELECT DATE_TRUNC('day',occurred_at) AS day,
        channel,
        count(*) as event_count
   FROM web_events
   GROUP BY 1,2
   ) sub
   GROUP BY 1
   ORDER BY 2 DESC

Build the Subquery: The aggregation of an existing table that you’d like to leverage as a part of the larger query.
Run the Subquery: Because a subquery can stand independently, it’s important to run its content first to get a sense of whether this aggregation is the interim output you are expecting.
Encapsulate and Name: Close this subquery off with parentheses and call it something. In this case, we called the subquery table ‘sub.’
Test Again: Run a SELECT * within the larger query to determine if all syntax of the subquery is good to go.
Build Outer Query: Develop the SELECT * clause as you see fit to solve the problem at hand, leveraging the subquery appropriately.

Quiz

Find the number of events that occur for each day for each channel.

SELECT COUNT(*) AS event_count, 
       DATE_TRUNC('day', occurred_at),
       channel
  FROM web_events
GROUP BY DATE_TRUNC('day', occurred_at),
	   channel
ORDER BY COUNT(*) DESC;

Answer Key

SELECT DATE_TRUNC('day',occurred_at) AS day,
       channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC;

I need to learn how to use 1, 2, and so on, this represents the number of columns as set on SELECT query.

On which day-channel pair did the most events occur?

event_count	date_trunc	channel
21	2017-01-01T00:00:00.000Z	direct
21	2016-12-21T00:00:00.000Z	direct

Now create a subquery that simply provides all of the data from your first query.

SELECT channel, Day, event_count
FROM   (SELECT COUNT(*) AS event_count, 
       DATE_TRUNC('day', occurred_at) AS Day,
       channel
  FROM web_events
GROUP BY DATE_TRUNC('day', occurred_at),
	   channel) sub
ORDER BY 3 DESC;

Answer Key:

SELECT *
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
                channel, COUNT(*) as events
          FROM web_events 
          GROUP BY 1,2
          ORDER BY 3 DESC) sub;

I need to learn how to simplify my code.

The original query goes in the FROM statement. An * is used in the SELECT statement to pull all of the data from the original query. You must use an alias for the table you nest within the outer query.
Now find the average number of events for each channel. Since you broke out by a day earlier, this is giving you an average per day.

SELECT channel, AVG(event_count) AS avg_event_count
FROM   (SELECT COUNT(*) AS event_count, 
       DATE_TRUNC('day', occurred_at) AS Day,
       channel
  FROM web_events
GROUP BY DATE_TRUNC('day', occurred_at),
	   channel) sub
GROUP BY 1;

channel	avg_event_count
adwords	1.5701906412478336
organic	1.6672504378283713
twitter	1.3166666666666667
direct	4.8964879852125693
facebook	1.5983471074380165
banner	1.2899728997289973

Answer Key:

SELECT channel, AVG(events) AS average_events
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
                channel, COUNT(*) as events
         FROM web_events 
         GROUP BY 1,2) sub
GROUP BY channel
ORDER BY 2 DESC;

I get the same results, but what happened here?

SELECT – we got similar queries here.

FROM – we’ve got similar queries here but in different order.

I need to learn/practice simplifying my GROUP BY query with column numbers.

Placement WITH

statement “scoped” technique to improve the structure of the subquery
When a user wants to create a version of an existing table to be used in a larger query (e.g., aggregate daily prices to an average price table).
It is advantageous for readability purposes.

CTE – stands for Common Table Expression. A Common Table Expression in SQL allows you to define a temporary result, such as a table, to then be referenced in a later part of the query.

QUESTION: You need to find the average number of events for each channel per day.

SOLUTION:

SELECT channel, AVG(events) AS average_events
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
             channel, COUNT(*) as events
      FROM web_events 
      GROUP BY 1,2) sub
GROUP BY channel
ORDER BY 2 DESC;

Let’s try this again using a WITH statement.

Notice, you can pull the inner query:

SELECT DATE_TRUNC('day',occurred_at) AS day, 
       channel, COUNT(*) as events
FROM web_events 
GROUP BY 1,2

This is the part we put in the WITH statement. Notice, we are aliasing the table as events below:

WITH events AS (
          SELECT DATE_TRUNC('day',occurred_at) AS day, 
                        channel, COUNT(*) as events
          FROM web_events 
          GROUP BY 1,2)

Now, we can use this newly created events table as if it is any other table in our database:

WITH events AS (
          SELECT DATE_TRUNC('day',occurred_at) AS day, 
                        channel, COUNT(*) as events
          FROM web_events 
          GROUP BY 1,2)

SELECT channel, AVG(events) AS average_events
FROM events
GROUP BY channel
ORDER BY 2 DESC;

For the above example, we don’t need anymore than the one additional table, but imagine we needed to create a second table to pull from. We can create an additional table to pull from in the following way:

WITH table1 AS (
          SELECT *
          FROM web_events),

     table2 AS (
          SELECT *
          FROM accounts)


SELECT *
FROM table1
JOIN table2
ON table1.account_id = table2.id;

You can add more and more tables using the WITH statement in the same way.

Quiz

Provide the name of the sales_rep in each region with the largest amount of total_amt_usd sales.
- a table of sales rep per region with total_amt_usd each
- a table that calculates the MAX total_amt_usd per region
- a final table that filters the sales rep, the region and the highest total amt usd

I am correct on my approach here, I only messed up with the code.

Attempt 1*

WITH Region AS (SELECT r.name Region, s.name SalesRep
FROM region r
JOIN sales_reps s
ON r.id = s.region_id),
AverageSales AS (SELECT o.total_amt_usd, s.name SalesRep
FROM sales_reps s
JOIN accounts a
ON s.id = a.sales_rep_id
JOIN orders o
ON a.id = o.account_id)
SELECT Region, s.name, AverageSales
FROM Region, AverageSales, sales_reps s
GROUP BY Region, s.name, averagesales;

Attempt 2

a table of sales rep per region with total_amt_usd each

SELECT r.name Region, s.name SalesRep, SUM(o.total_amt_usd)
FROM region r
JOIN sales_reps s
ON r.id = s.region_id
JOIN accounts a
ON s.id = a.sales_rep_id
JOIN orders o
ON a.id = o.account_id
GROUP BY r.name, s.name

a table that calculates the MAX total_amt_usd per region

I AM LOST!!!

Answer Key:

WITH t1 AS (
  SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC), 
t2 AS (
   SELECT region_name, MAX(total_amt) total_amt
   FROM t1
   GROUP BY 1)
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM t1
JOIN t2
ON t1.region_name = t2.region_name AND t1.total_amt = t2.total_amt;

What happened here???!!!

Watched tutorials online:

I will try to explain what happened there, based on my understanding.

Step 1 – It takes the total of sales per sales rep name per region. No multiple Sales Rep name, but with multiple region.

SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC

STEP 2 – Then it adds to a temporary table with the query WITH, receiving a name t1.

WITH t1 AS (
  SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC)

Step 3 – It calculates the MAX total sales from t1 and grouped it by region name.

WITH t1 AS (
  SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC)
 SELECT region_name, MAX(total_amt) total_amt
   FROM t1
   GROUP BY 1

region_name	total_amt
Midwest	675637.19
Southeast	1098137.72
Northeast	1010690.60
West	886244.12

Step 4 – Created a second temporary table for the MAX value of total sales, with a name t2.

WITH t1 AS (
  SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC), 
t2 AS (
   SELECT region_name, MAX(total_amt) total_amt
   FROM t1
   GROUP BY 1)

Step 5 – Combined t1 and t2 in the final table.

See how if only t1

WITH t1 AS (
  SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC), 
t2 AS (
   SELECT region_name, MAX(total_amt) total_amt
   FROM t1
   GROUP BY 1)
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM t1

rep_name	region_name	total_amt
Earlie Schleusner	Southeast	1098137.72
Tia Amato	Northeast	1010690.60
Vernita Plump	Southeast	934212.93
Georgianna Chisholm	West	886244.12
Arica Stoltzfus	West	810353.34
Dorotha Seawell	Southeast	766935.04
Nelle Meaux	Southeast	749076.16

then joined t2, assigning t1.region_name, t1.total_amt column based on data from t2

WITH t1 AS (
  SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC), 
t2 AS (
   SELECT region_name, MAX(total_amt) total_amt
   FROM t1
   GROUP BY 1)
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM t1
JOIN t2
ON t1.region_name = t2.region_name AND t1.total_amt = t2.total_amt;

rep_name	region_name	total_amt
Earlie Schleusner	Southeast	1098137.72
Tia Amato	Northeast	1010690.60
Georgianna Chisholm	West	886244.12
Charles Bidwell	Midwest	675637.19

But I think I have understood this better, without the

t1.region_name = t2.region_name AND

This query doesn’t seem to be necessary, as we’re only filtering out the Max for each region based on the data from t2.

WITH t1 AS (
  SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC), 
t2 AS (
   SELECT region_name, MAX(total_amt) total_amt
   FROM t1
   GROUP BY 1)
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM t1
JOIN t2
ON t1.total_amt = t2.total_amt;

rep_name	region_name	total_amt
Earlie Schleusner	Southeast	1098137.72
Tia Amato	Northeast	1010690.60
Georgianna Chisholm	West	886244.12
Charles Bidwell	Midwest	675637.19

I think I was confused with the JOIN query used here, as I have learned it with the keys. Keys are not existing for temporary tables in CTE, so it was assigned freely with CTEs?

Let me try the next question…

For the region with the largest sales total_amt_usd, how many total orders were placed?
- I am looking for the MAX amount of sale per region.
- And for that region with MAX amount of sale in USD all across, how many orders were placed?

I finally get it!!! 😊 

WITH t1 AS (SELECT r.name Region, SUM(o.total_amt_usd) total_usd, COUNT(o.total) total_order
FROM region r
JOIN sales_reps s
ON r.id = s.region_id
JOIN accounts a
ON s.id = a.sales_rep_id
JOIN orders o
ON a.id = o.account_id
GROUP BY 1),
t2 AS (SELECT MAX(t1.total_usd) total_usd
FROM t1)
SELECT t1.Region, t1.total_order, t1.total_usd
FROM t1
JOIN t2
ON t1.total_usd = t2.total_usd;

region	total_order	total_usd
Northeast	2357	7744405.36

There seems to be something wrong with the Solutions – reported.

How many accounts had more total purchases than the account name which has bought the most standard_qty paper throughout their lifetime as a customer?

Step 1 – get all the data for the account name, count of total orders, and count of standard_qty

SELECT a.name AcctName, COUNT(o.total) Total, COUNT(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name;

acctname	total	standard_total
Boeing	15	15
Western Digital	65	65
Sysco	68	68
Southern	20	20
Altria Group	51	51
Energy Transfer Equity	6	6
Aramark	8	8

Step 2 Get MAX of total standard qty orders

WITH t1 AS (SELECT a.name AcctName, COUNT(o.total) Total, COUNT(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name)
SELECT MAX(t1.Standard_Total) StandardTotal
FROM t1

standardtotal
71

This is my final answer:

WITH t1 AS (SELECT a.name AcctName, COUNT(o.total) Total, COUNT(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name),
t2 AS (SELECT MAX(t1.Standard_Total) StandardTotal
FROM t1)
SELECT t1.AcctName
FROM t1, t2
WHERE t1.Total > t2.StandardTotal
GROUP BY t1.AcctName

This returns 0.

So I checked the data:

SELECT a.name AcctName, COUNT(o.total) Total, COUNT(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name
ORDER BY 2 DESC;

acctname	total	standard_total
Leucadia National	71	71
Sysco	68	68
Supervalu	68	68
Arrow Electronics	67	67
General Dynamics	66	66
Mosaic	66	66
Archer Daniels Midland	66	66
Western Digital	65	65
United States Steel	65	65
Philip Morris International	65	65
Fluor	65	65
United Continental Holdings	63	63
Marathon Petroleum	62	62

Back to the question – How many accounts had more total purchases than the account name which has bought the most standard_qty paper throughout their lifetime as a customer?

The max standard_qty paper purchased is 71 by Leucadia National, while the max number of order placed by an account is also 71 and with the same account.

Did I do something wrong?

Let me break this down.

I checked the total order placed by the accounts, and I get the same result – 71.

SELECT a.name AcctName, COUNT(o.total) Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name
ORDER BY 2 DESC;

acctname	total
Leucadia National	71
Sysco	68
Supervalu	68
Arrow Electronics	67
General Dynamics	66
Mosaic	66
Archer Daniels Midland	66

I also checked the total standard qty per account, and the max is still 71 by the same company.

SELECT a.name AcctName, COUNT(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name
ORDER BY 2 DESC;

acctname	standard_total
Leucadia National	71
Sysco	68
Supervalu	68
Arrow Electronics	67
General Dynamics	66

I am checking the Solutions –

WITH t1 AS (
  SELECT a.name account_name, SUM(o.standard_qty) total_std, SUM(o.total) total
  FROM accounts a
  JOIN orders o
  ON o.account_id = a.id
  GROUP BY 1
  ORDER BY 2 DESC
  LIMIT 1), 
t2 AS (
  SELECT a.name
  FROM orders o
  JOIN accounts a
  ON a.id = o.account_id
  GROUP BY 1
  HAVING SUM(o.total) > (SELECT total FROM t1))
SELECT COUNT(*)
FROM t2;

count
3

I am almost correct!! Except I should have used SUM instead of COUNT. 😞

BUT.. I checked again…

WITH t1 AS (SELECT a.name AcctName, SUM(o.total) Total, SUM(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name),
t2 AS (SELECT MAX(t1.Standard_Total) StandardTotal
FROM t1)
SELECT t1.AcctName, t2.StandardTotal Max_Standard, t1.Total Total_Orders
FROM t1, t2
WHERE t1.Total > t2.StandardTotal
GROUP BY t1.AcctName, t2.StandardTotal, t1.Total

acctname	standardtotal	total
Leucadia National	41617	42358
Core-Mark Holding	41617	44750
EOG Resources	41617	56410
IBM	41617	47506
General Dynamics	41617	43730
Mosaic	41617	49246

The MAX standard qty is 41617 and there are 6 companies that placed more orders in total than that.

For the customer that spent the most (in total over their lifetime as a customer) total_amt_usd, how many web_events did they have for each channel?

STEP 1 – Get all the data across all columns needed

SELECT a.name, w.channel, SUM(o.total_amt_usd), COUNT(w.id)
FROM accounts a
JOIN web_events w
ON a.id = w.account_id
JOIN orders o
on a.id = o.account_id
GROUP BY 1, 2
ORDER BY 3 DESC;

name	channel	sum	count
EOG Resources	direct	16846425.20	2728
Mosaic	direct	16244073.73	3102
Leucadia National	direct	15134457.00	3692
IBM	direct	13726418.16	2520
General Dynamics	direct	13230570.76	2904

From here, I know that EOG Resources is the biggest spender among all accounts.

Step 2 – Get the MAX spent and put in the second table

WITH t1 AS (SELECT a.name AcctName, w.channel Channel, SUM(o.total_amt_usd) Total_Spent, COUNT(w.id) Channel_Count
FROM accounts a
JOIN web_events w
ON a.id = w.account_id
JOIN orders o
on a.id = o.account_id
GROUP BY 1, 2), 
t2 AS (SELECT MAX(t1.total_spent) Total_Spent
FROM t1)
SELECT t1.AcctName, t1.Channel_Count, t1.Channel
FROM t1
JOIN t2
ON t1.Total_Spent = t2.Total_Spent

Again!!

Step 1 – Get the max spender

SELECT a.name AcctName, SUM(o.total_amt_usd) Total_Spent
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1
ORDER BY 2 DESC;

acctname	total_spent
EOG Resources	382873.30
Mosaic	345618.59
IBM	326819.48
General Dynamics	300694.79
Republic Services	293861.14
Leucadia National	291047.25
Arrow Electronics	281018.36

I get the same account, but different amount 382873.30 WHY???!!

Step 2 – Get the max amount

WITH t1 AS (SELECT a.name AcctName, SUM(o.total_amt_usd) Total_Spent
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1
ORDER BY 2 DESC)
SELECT MAX(t1.Total_Spent)
FROM t1

max
382873.30

Same information with the MAX query

STEP 3 – Get the channel counts for each account

WITH t1 AS (SELECT a.name AcctName, SUM(o.total_amt_usd) Total_Spent
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1),
t2 AS (SELECT MAX(t1.Total_Spent) Total_Spent
FROM t1)
SELECT t1.AcctName AcctName, w.channel Channel, COUNT(w.channel) Channel_Count
FROM t1, web_events w
GROUP BY t1.AcctName, w.channel

acctname	channel	channel_count
Oneok	adwords	906
Whole Foods Market	organic	952
TIAA	adwords	906
Whirlpool	adwords	906
Time Warner Cable	adwords	906
Gilead Sciences	twitter	474
Kohl’s	banner	476

I finally got it!!!!!

WITH t1 AS (SELECT a.name AcctName, SUM(o.total_amt_usd) Total_Spent
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1),
t2 AS (SELECT MAX(t1.Total_Spent) Total_Spent
FROM t1),
t3 AS (SELECT t1.AcctName AcctName, w.channel Channel, COUNT(w.channel) Channel_Count
FROM t1, web_events w
GROUP BY t1.AcctName, w.channel)
SELECT t1.AcctName, t3.Channel, t3.Channel_Count
FROM t1, t2, t3
WHERE t1.Total_Spent = t2.Total_Spent
GROUP BY 1, 2, 3

acctname	channel	channel_count
EOG Resources	facebook	967
EOG Resources	twitter	474
EOG Resources	banner	476
EOG Resources	organic	952
EOG Resources	adwords	906
EOG Resources	direct	5298

What is the lifetime average amount spent in terms of total_amt_usd for the top 10 total spending accounts?

STEP 1 – Get the top 10 spenders

SELECT a.name, SUM(o.total_amt_usd)
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;

name	sum
EOG Resources	382873.30
Mosaic	345618.59
IBM	326819.48
General Dynamics	300694.79
Republic Services	293861.14
Leucadia National	291047.25
Arrow Electronics	281018.36
Sysco	278575.64
Supervalu	275288.30
Archer Daniels Midland	272672.84

WITH t1 AS (SELECT a.name AcctName, SUM(o.total_amt_usd) Total_Spent
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10)
SELECT AVG(t1.Total_Spent) TotalSpent
FROM t1

avg_totalspent
304846.969000000000

I GOT THIS RIGHT!!!

What is the lifetime average amount spent in terms of total_amt_usd, including only the companies that spent more per order, on average, than the average of all orders.
- get average spent per company
- get average spent in total – 3348.0196513310185185
- compare a and b
- filter those who spends more in average than the average in total
- get their average

WITH t1 AS (SELECT AVG(o.total_amt_usd) Over_All_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id),
t2 AS (SELECT a.name, o.total_amt_usd Total_USD
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1, 2
HAVING AVG(o.total_amt_usd) > (SELECT Over_All_Total FROM t1)
ORDER BY 2 DESC)
SELECT AVG(t2.total_usd)
FROM t2

avg
7917.6384593190998269

Answer Key:

WITH t1 AS (
   SELECT AVG(o.total_amt_usd) avg_all
   FROM orders o
   JOIN accounts a
   ON a.id = o.account_id),
t2 AS (
   SELECT o.account_id, AVG(o.total_amt_usd) avg_amt
   FROM orders o
   GROUP BY 1
   HAVING AVG(o.total_amt_usd) > (SELECT * FROM t1))
SELECT AVG(avg_amt)
FROM t2;

This also doesn’t match. 😞 I will check this again later, but I will have to move on for now.

rose learns.to('code');

Study Notes #22

SQL Subqueries & Temporary Tables

Subquery

When do you need to use a subquery?

Subqueries vs JOINs

Differences between Subqueries and Joins

Use Cases:

Syntax:

Dependencies:

Similarities between Subqueries and Joins

Output:

Deep-dive topics:

Subquery vs Joins Overview

Subqueries and Joins Deep-dives

Subqueries:

Joins:

Subquery Basics

Placement:

Dependencies:

Resources:

Subqueries: Placement

Subquery Placement:

Advantages:

Quiz

Placement WITH

Quiz

Leave a ReplyCancel reply