Study Notes #22

SQL Subqueries & Temporary Tables

Subquery

  • a query within a query
  • a query has both SELECT and FROM clauses to signify what you want to extract from a table and what table you’d like to pull data from. A query that includes subquery, as a result, has multiple SELECT and FROM clauses.
  • The subquery that sits nested inside a larger query is called an INNER QUERY. This inner query can be fully executed on its own and often is run independently before when trying to troubleshoot bugs in your code.
SELECT product_id,
       name,
       price
FROM db.product
Where price > (SELECT AVG(price)
              FROM db.product)
  • Typically, you need to use a subquery when you have the need to manipulate an existing table to “pseudo-create” a table that is then used as a part of a larger query.

When do you need to use a subquery?

You need to use a subquery when you have the need to manipulate an existing table to “pseudo-create” a table that is then used as a part of a larger query. In the examples below, existing tables cannot be joined together to solve the problem at hand. Instead, an existing table needs to be manipulated, massaged, or aggregated in some way to then join to another table in the dataset to answer the posed question.

Set of Problems:

  1. Identify the top-selling Amazon products in months where sales have exceeded $1m
    • Existing Table: Amazon daily sales
    • Subquery Aggregation: Daily to Monthly
  2. Examine the average price of a brand’s products for the highest-grossing brands
    • Existing Table: Product pricing data across all retailers
    • Subquery Aggregation: Individual to Average
  3. Order the annual salary of employees that are working less than 150 hours a month
    • Existing Table: Daily time-table of employees
    • Subquery Aggregation: Daily to Monthly
SELECT product_id,
       name,
       price
FROM db.product
Where price > (SELECT AVG(price)
              FROM db.product)

Subqueries vs JOINs

Often times, a problem can be solved using either a subquery or a join. The user considers the tradeoffs and determines the path forward.

Differences between Subqueries and Joins

Use Cases:

Subquery: When an existing table needs to be manipulated or aggregated to then be joined to a larger table.

Joins: A fully flexible and discretionary use case where a user wants to bring two or more tables together and select and filter as needed.

Syntax:

Subquery: A subquery is a query within a query. The syntax, as a result, has multiple SELECT and FROM clauses.

Joins: A join is simple stitching together multiple tables with a common key or column. A join clause cannot stand and be run independently.

Dependencies:

Subquery: A subquery clause can be run completely independently. When trying to debug code, subqueries are often run independently to pressure test results before running the larger query.

Joins: A join clause cannot stand and be run independently.

Similarities between Subqueries and Joins

Output:

Both subqueries and joins are essentially bringing multiple tables together (whether an existing table is first manipulated or not) to generate a single output.

Deep-dive topics:

What happens under the hood: Query plans are similar for both subqueries and joins. You can read more about how query plans are here.

Subquery vs Joins Overview

ComponentsSubqueryJOINS
Combine data from multiple tables into a single resultXX
Create a flexible view of tables stitched together using a “key”X
Build an output to use in a later part of the queryX
Subquery Plan: What happens under the hoodXX

Query 1

SELECT product_id,
       name,
       price
FROM db.product
Where price > (SELECT AVG(price)
              FROM db.product)

Query 2

SELECT a.brand_id,
       a.total_brand_sales
       AVG(b.product_price)
FROM brand_table a
JOIN brand_table b
ON b.brand_id = a.brand_id
GROUP BY a.brand_id, a.total_brand_sales
ORDER BY a.total_brand_sales desc;

Subqueries and Joins Deep-dives

Subqueries:

Output: Either a scalar (a single value) or rows that have met a condition. Use Case: Calculate a scalar value to use in a later part of the query (e.g., average price as a filter). Dependencies: Stand independently and be run as complete queries themselves.

Joins:

Output: A joint view of multiple tables stitched together using a common “key”. Use Case: Fully stitch tables together and have full flexibility on what to “select” and “filter from”. Dependencies: Cannot stand independently.

Subquery Basics

  • must be fully placed inside parenthesis
  • must be fully independent and can be executed on its own
  • have two components to consider:
    • where its placed
    • dependencies

A caveat with subqueries being independent:

In almost all cases, subqueries are fully independent. They are “interim”/temp tables that can be fully executed on their own. However, there is an exception. When a subquery, typically in the form of a nested or inline subquery, is correlated to its outer query, it cannot run independently. This is most certainly an edge case since correlated subqueries are rarely implemented compared to standalone, simple subqueries.

Placement:

There are four places where subqueries can be inserted within a larger query:

  • With
  • Nested
  • Inline
  • Scalar

Dependencies:

A subquery can be dependent on the outer query or independent of the outer query.

Resources:

One of my favorite resources on subqueries that covers use cases, syntax, and examples is from Microsoft and can be found here.

Subqueries: Placement

The problem you are trying to solve and the readability of the query determine where the subquery is placed.

The key concept of placement is where exactly the subquery is placed within the context of the larger query. There are four different places where a subquery can be inserted. From my experience, the decision of which placement to leverage stems from (1) the problem at hand and (2) the readability of the query.

Subquery Placement:

With: This subquery is used when you’d like to “pseudo-create” a table from an existing table and visually scope the temporary table at the top of the larger query.

Nested: This subquery is used when you’d like the temporary table to act as a filter within the larger query, which implies that it often sits within the where clause.

Inline: This subquery is used in the same fashion as the WITH use case above. However, instead of the temporary table sitting on top of the larger query, it’s embedded within the from clause.

Scalar: This subquery is used when you’d like to generate a scalar value to be used as a benchmark of some sort.

For example, when you’d like to calculate the average salary across an entire organization to compare to individual employee salaries. Because it’s often a single value that is generated and used as a benchmark, the scalar subquery often sits within the select clause.

Advantages:

Readability: With and Nested subqueries are most advantageous for readability.

Performance: Scalar subqueries are advantageous for performance and are often used on smaller datasets.

SELECT channel,
       AVG(event_count) AS avg_event_count
FROM
(SELECT DATE_TRUNC('day',occurred_at) AS day,
        channel,
        count(*) as event_count
   FROM web_events
   GROUP BY 1,2
   ) sub
   GROUP BY 1
   ORDER BY 2 DESC
  1. Build the Subquery: The aggregation of an existing table that you’d like to leverage as a part of the larger query.
  2. Run the Subquery: Because a subquery can stand independently, it’s important to run its content first to get a sense of whether this aggregation is the interim output you are expecting.
  3. Encapsulate and Name: Close this subquery off with parentheses and call it something. In this case, we called the subquery table ‘sub.’
  4. Test Again: Run a SELECT * within the larger query to determine if all syntax of the subquery is good to go.
  5. Build Outer Query: Develop the SELECT * clause as you see fit to solve the problem at hand, leveraging the subquery appropriately.

Quiz

  1. Find the number of events that occur for each day for each channel.
SELECT COUNT(*) AS event_count, 
       DATE_TRUNC('day', occurred_at),
       channel
  FROM web_events
GROUP BY DATE_TRUNC('day', occurred_at),
	   channel
ORDER BY COUNT(*) DESC;

Answer Key

SELECT DATE_TRUNC('day',occurred_at) AS day,
       channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC;

I need to learn how to use 1, 2, and so on, this represents the number of columns as set on SELECT query.

  1. On which day-channel pair did the most events occur?
event_countdate_truncchannel
212017-01-01T00:00:00.000Zdirect
212016-12-21T00:00:00.000Zdirect
  1. Now create a subquery that simply provides all of the data from your first query.
SELECT channel, Day, event_count
FROM   (SELECT COUNT(*) AS event_count, 
       DATE_TRUNC('day', occurred_at) AS Day,
       channel
  FROM web_events
GROUP BY DATE_TRUNC('day', occurred_at),
	   channel) sub
ORDER BY 3 DESC;

Answer Key:

SELECT *
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
                channel, COUNT(*) as events
          FROM web_events 
          GROUP BY 1,2
          ORDER BY 3 DESC) sub;

I need to learn how to simplify my code.

  1. The original query goes in the FROM statement. An * is used in the SELECT statement to pull all of the data from the original query. You must use an alias for the table you nest within the outer query.
  2. Now find the average number of events for each channel. Since you broke out by a day earlier, this is giving you an average per day.
SELECT channel, AVG(event_count) AS avg_event_count
FROM   (SELECT COUNT(*) AS event_count, 
       DATE_TRUNC('day', occurred_at) AS Day,
       channel
  FROM web_events
GROUP BY DATE_TRUNC('day', occurred_at),
	   channel) sub
GROUP BY 1;
channelavg_event_count
adwords1.5701906412478336
organic1.6672504378283713
twitter1.3166666666666667
direct4.8964879852125693
facebook1.5983471074380165
banner1.2899728997289973

Answer Key:

SELECT channel, AVG(events) AS average_events
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
                channel, COUNT(*) as events
         FROM web_events 
         GROUP BY 1,2) sub
GROUP BY channel
ORDER BY 2 DESC;

I get the same results, but what happened here?

SELECT – we got similar queries here.

FROM – we’ve got similar queries here but in different order.

I need to learn/practice simplifying my GROUP BY query with column numbers.

Placement WITH

  • statement “scoped” technique to improve the structure of the subquery
  • When a user wants to create a version of an existing table to be used in a larger query (e.g., aggregate daily prices to an average price table).
  • It is advantageous for readability purposes.

CTE – stands for Common Table Expression. A Common Table Expression in SQL allows you to define a temporary result, such as a table, to then be referenced in a later part of the query.

QUESTION: You need to find the average number of events for each channel per day.

SOLUTION:

SELECT channel, AVG(events) AS average_events
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
             channel, COUNT(*) as events
      FROM web_events 
      GROUP BY 1,2) sub
GROUP BY channel
ORDER BY 2 DESC;

Let’s try this again using a WITH statement.

Notice, you can pull the inner query:

SELECT DATE_TRUNC('day',occurred_at) AS day, 
       channel, COUNT(*) as events
FROM web_events 
GROUP BY 1,2

This is the part we put in the WITH statement. Notice, we are aliasing the table as events below:

WITH events AS (
          SELECT DATE_TRUNC('day',occurred_at) AS day, 
                        channel, COUNT(*) as events
          FROM web_events 
          GROUP BY 1,2)

Now, we can use this newly created events table as if it is any other table in our database:

WITH events AS (
          SELECT DATE_TRUNC('day',occurred_at) AS day, 
                        channel, COUNT(*) as events
          FROM web_events 
          GROUP BY 1,2)

SELECT channel, AVG(events) AS average_events
FROM events
GROUP BY channel
ORDER BY 2 DESC;

For the above example, we don’t need anymore than the one additional table, but imagine we needed to create a second table to pull from. We can create an additional table to pull from in the following way:

WITH table1 AS (
          SELECT *
          FROM web_events),

     table2 AS (
          SELECT *
          FROM accounts)


SELECT *
FROM table1
JOIN table2
ON table1.account_id = table2.id;

You can add more and more tables using the WITH statement in the same way.

Quiz

  1. Provide the name of the sales_rep in each region with the largest amount of total_amt_usd sales.
    • a table of sales rep per region with total_amt_usd each
    • a table that calculates the MAX total_amt_usd per region
    • a final table that filters the sales rep, the region and the highest total amt usd

I am correct on my approach here, I only messed up with the code.

Attempt 1*

WITH Region AS (SELECT r.name Region, s.name SalesRep
FROM region r
JOIN sales_reps s
ON r.id = s.region_id),
AverageSales AS (SELECT o.total_amt_usd, s.name SalesRep
FROM sales_reps s
JOIN accounts a
ON s.id = a.sales_rep_id
JOIN orders o
ON a.id = o.account_id)
SELECT Region, s.name, AverageSales
FROM Region, AverageSales, sales_reps s
GROUP BY Region, s.name, averagesales;

Attempt 2

  • a table of sales rep per region with total_amt_usd each
SELECT r.name Region, s.name SalesRep, SUM(o.total_amt_usd)
FROM region r
JOIN sales_reps s
ON r.id = s.region_id
JOIN accounts a
ON s.id = a.sales_rep_id
JOIN orders o
ON a.id = o.account_id
GROUP BY r.name, s.name
  • a table that calculates the MAX total_amt_usd per region

I AM LOST!!!

Answer Key:

WITH t1 AS (
  SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC), 
t2 AS (
   SELECT region_name, MAX(total_amt) total_amt
   FROM t1
   GROUP BY 1)
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM t1
JOIN t2
ON t1.region_name = t2.region_name AND t1.total_amt = t2.total_amt;

What happened here???!!!

Watched tutorials online:

I will try to explain what happened there, based on my understanding.

Step 1 – It takes the total of sales per sales rep name per region. No multiple Sales Rep name, but with multiple region.

SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC

STEP 2 – Then it adds to a temporary table with the query WITH, receiving a name t1.

WITH t1 AS (
  SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC)

Step 3 – It calculates the MAX total sales from t1 and grouped it by region name.

WITH t1 AS (
  SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC)
 SELECT region_name, MAX(total_amt) total_amt
   FROM t1
   GROUP BY 1
region_nametotal_amt
Midwest675637.19
Southeast1098137.72
Northeast1010690.60
West886244.12

Step 4 – Created a second temporary table for the MAX value of total sales, with a name t2.

WITH t1 AS (
  SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC), 
t2 AS (
   SELECT region_name, MAX(total_amt) total_amt
   FROM t1
   GROUP BY 1)

Step 5 – Combined t1 and t2 in the final table.

See how if only t1

WITH t1 AS (
  SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC), 
t2 AS (
   SELECT region_name, MAX(total_amt) total_amt
   FROM t1
   GROUP BY 1)
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM t1
rep_nameregion_nametotal_amt
Earlie SchleusnerSoutheast1098137.72
Tia AmatoNortheast1010690.60
Vernita PlumpSoutheast934212.93
Georgianna ChisholmWest886244.12
Arica StoltzfusWest810353.34
Dorotha SeawellSoutheast766935.04
Nelle MeauxSoutheast749076.16

then joined t2, assigning t1.region_name, t1.total_amt column based on data from t2

WITH t1 AS (
  SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC), 
t2 AS (
   SELECT region_name, MAX(total_amt) total_amt
   FROM t1
   GROUP BY 1)
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM t1
JOIN t2
ON t1.region_name = t2.region_name AND t1.total_amt = t2.total_amt;
rep_nameregion_nametotal_amt
Earlie SchleusnerSoutheast1098137.72
Tia AmatoNortheast1010690.60
Georgianna ChisholmWest886244.12
Charles BidwellMidwest675637.19

But I think I have understood this better, without the

t1.region_name = t2.region_name AND 

This query doesn’t seem to be necessary, as we’re only filtering out the Max for each region based on the data from t2.

WITH t1 AS (
  SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
   FROM sales_reps s
   JOIN accounts a
   ON a.sales_rep_id = s.id
   JOIN orders o
   ON o.account_id = a.id
   JOIN region r
   ON r.id = s.region_id
   GROUP BY 1,2
   ORDER BY 3 DESC), 
t2 AS (
   SELECT region_name, MAX(total_amt) total_amt
   FROM t1
   GROUP BY 1)
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM t1
JOIN t2
ON t1.total_amt = t2.total_amt;
rep_nameregion_nametotal_amt
Earlie SchleusnerSoutheast1098137.72
Tia AmatoNortheast1010690.60
Georgianna ChisholmWest886244.12
Charles BidwellMidwest675637.19

I think I was confused with the JOIN query used here, as I have learned it with the keys. Keys are not existing for temporary tables in CTE, so it was assigned freely with CTEs?

Let me try the next question…

  1. For the region with the largest sales total_amt_usd, how many total orders were placed?
    • I am looking for the MAX amount of sale per region.
    • And for that region with MAX amount of sale in USD all across, how many orders were placed?
I finally get it!!! 😊 

WITH t1 AS (SELECT r.name Region, SUM(o.total_amt_usd) total_usd, COUNT(o.total) total_order
FROM region r
JOIN sales_reps s
ON r.id = s.region_id
JOIN accounts a
ON s.id = a.sales_rep_id
JOIN orders o
ON a.id = o.account_id
GROUP BY 1),
t2 AS (SELECT MAX(t1.total_usd) total_usd
FROM t1)
SELECT t1.Region, t1.total_order, t1.total_usd
FROM t1
JOIN t2
ON t1.total_usd = t2.total_usd;
regiontotal_ordertotal_usd
Northeast23577744405.36

There seems to be something wrong with the Solutions – reported.

  1. How many accounts had more total purchases than the account name which has bought the most standard_qty paper throughout their lifetime as a customer?

Step 1 – get all the data for the account name, count of total orders, and count of standard_qty

SELECT a.name AcctName, COUNT(o.total) Total, COUNT(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name;
acctnametotalstandard_total
Boeing1515
Western Digital6565
Sysco6868
Southern2020
Altria Group5151
Energy Transfer Equity66
Aramark88

Step 2 Get MAX of total standard qty orders

WITH t1 AS (SELECT a.name AcctName, COUNT(o.total) Total, COUNT(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name)
SELECT MAX(t1.Standard_Total) StandardTotal
FROM t1
standardtotal
71

This is my final answer:

WITH t1 AS (SELECT a.name AcctName, COUNT(o.total) Total, COUNT(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name),
t2 AS (SELECT MAX(t1.Standard_Total) StandardTotal
FROM t1)
SELECT t1.AcctName
FROM t1, t2
WHERE t1.Total > t2.StandardTotal
GROUP BY t1.AcctName

This returns 0.

So I checked the data:

SELECT a.name AcctName, COUNT(o.total) Total, COUNT(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name
ORDER BY 2 DESC;
acctnametotalstandard_total
Leucadia National7171
Sysco6868
Supervalu6868
Arrow Electronics6767
General Dynamics6666
Mosaic6666
Archer Daniels Midland6666
Western Digital6565
United States Steel6565
Philip Morris International6565
Fluor6565
United Continental Holdings6363
Marathon Petroleum6262

Back to the question – How many accounts had more total purchases than the account name which has bought the most standard_qty paper throughout their lifetime as a customer?

The max standard_qty paper purchased is 71 by Leucadia National, while the max number of order placed by an account is also 71 and with the same account.

Did I do something wrong?

Let me break this down.

I checked the total order placed by the accounts, and I get the same result – 71.

SELECT a.name AcctName, COUNT(o.total) Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name
ORDER BY 2 DESC;
acctnametotal
Leucadia National71
Sysco68
Supervalu68
Arrow Electronics67
General Dynamics66
Mosaic66
Archer Daniels Midland66

I also checked the total standard qty per account, and the max is still 71 by the same company.

SELECT a.name AcctName, COUNT(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name
ORDER BY 2 DESC;
acctnamestandard_total
Leucadia National71
Sysco68
Supervalu68
Arrow Electronics67
General Dynamics66

I am checking the Solutions –

WITH t1 AS (
  SELECT a.name account_name, SUM(o.standard_qty) total_std, SUM(o.total) total
  FROM accounts a
  JOIN orders o
  ON o.account_id = a.id
  GROUP BY 1
  ORDER BY 2 DESC
  LIMIT 1), 
t2 AS (
  SELECT a.name
  FROM orders o
  JOIN accounts a
  ON a.id = o.account_id
  GROUP BY 1
  HAVING SUM(o.total) > (SELECT total FROM t1))
SELECT COUNT(*)
FROM t2;
count
3

I am almost correct!! Except I should have used SUM instead of COUNT. 😞

BUT.. I checked again…

WITH t1 AS (SELECT a.name AcctName, SUM(o.total) Total, SUM(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name),
t2 AS (SELECT MAX(t1.Standard_Total) StandardTotal
FROM t1)
SELECT t1.AcctName, t2.StandardTotal Max_Standard, t1.Total Total_Orders
FROM t1, t2
WHERE t1.Total > t2.StandardTotal
GROUP BY t1.AcctName, t2.StandardTotal, t1.Total
acctnamestandardtotaltotal
Leucadia National4161742358
Core-Mark Holding4161744750
EOG Resources4161756410
IBM4161747506
General Dynamics4161743730
Mosaic4161749246

The MAX standard qty is 41617 and there are 6 companies that placed more orders in total than that.

  1. For the customer that spent the most (in total over their lifetime as a customer) total_amt_usd, how many web_events did they have for each channel?

STEP 1 – Get all the data across all columns needed

SELECT a.name, w.channel, SUM(o.total_amt_usd), COUNT(w.id)
FROM accounts a
JOIN web_events w
ON a.id = w.account_id
JOIN orders o
on a.id = o.account_id
GROUP BY 1, 2
ORDER BY 3 DESC;
namechannelsumcount
EOG Resourcesdirect16846425.202728
Mosaicdirect16244073.733102
Leucadia Nationaldirect15134457.003692
IBMdirect13726418.162520
General Dynamicsdirect13230570.762904

From here, I know that EOG Resources is the biggest spender among all accounts.

Step 2 – Get the MAX spent and put in the second table

WITH t1 AS (SELECT a.name AcctName, w.channel Channel, SUM(o.total_amt_usd) Total_Spent, COUNT(w.id) Channel_Count
FROM accounts a
JOIN web_events w
ON a.id = w.account_id
JOIN orders o
on a.id = o.account_id
GROUP BY 1, 2), 
t2 AS (SELECT MAX(t1.total_spent) Total_Spent
FROM t1)
SELECT t1.AcctName, t1.Channel_Count, t1.Channel
FROM t1
JOIN t2
ON t1.Total_Spent = t2.Total_Spent

Again!!

Step 1 – Get the max spender

SELECT a.name AcctName, SUM(o.total_amt_usd) Total_Spent
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1
ORDER BY 2 DESC;
acctnametotal_spent
EOG Resources382873.30
Mosaic345618.59
IBM326819.48
General Dynamics300694.79
Republic Services293861.14
Leucadia National291047.25
Arrow Electronics281018.36

I get the same account, but different amount 382873.30 WHY???!!

Step 2 – Get the max amount

WITH t1 AS (SELECT a.name AcctName, SUM(o.total_amt_usd) Total_Spent
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1
ORDER BY 2 DESC)
SELECT MAX(t1.Total_Spent)
FROM t1
max
382873.30

Same information with the MAX query

STEP 3 – Get the channel counts for each account

WITH t1 AS (SELECT a.name AcctName, SUM(o.total_amt_usd) Total_Spent
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1),
t2 AS (SELECT MAX(t1.Total_Spent) Total_Spent
FROM t1)
SELECT t1.AcctName AcctName, w.channel Channel, COUNT(w.channel) Channel_Count
FROM t1, web_events w
GROUP BY t1.AcctName, w.channel
acctnamechannelchannel_count
Oneokadwords906
Whole Foods Marketorganic952
TIAAadwords906
Whirlpooladwords906
Time Warner Cableadwords906
Gilead Sciencestwitter474
Kohl’sbanner476

I finally got it!!!!!

WITH t1 AS (SELECT a.name AcctName, SUM(o.total_amt_usd) Total_Spent
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1),
t2 AS (SELECT MAX(t1.Total_Spent) Total_Spent
FROM t1),
t3 AS (SELECT t1.AcctName AcctName, w.channel Channel, COUNT(w.channel) Channel_Count
FROM t1, web_events w
GROUP BY t1.AcctName, w.channel)
SELECT t1.AcctName, t3.Channel, t3.Channel_Count
FROM t1, t2, t3
WHERE t1.Total_Spent = t2.Total_Spent
GROUP BY 1, 2, 3
acctnamechannelchannel_count
EOG Resourcesfacebook967
EOG Resourcestwitter474
EOG Resourcesbanner476
EOG Resourcesorganic952
EOG Resourcesadwords906
EOG Resourcesdirect5298
  1. What is the lifetime average amount spent in terms of total_amt_usd for the top 10 total spending accounts?

STEP 1 – Get the top 10 spenders

SELECT a.name, SUM(o.total_amt_usd)
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;
namesum
EOG Resources382873.30
Mosaic345618.59
IBM326819.48
General Dynamics300694.79
Republic Services293861.14
Leucadia National291047.25
Arrow Electronics281018.36
Sysco278575.64
Supervalu275288.30
Archer Daniels Midland272672.84
WITH t1 AS (SELECT a.name AcctName, SUM(o.total_amt_usd) Total_Spent
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10)
SELECT AVG(t1.Total_Spent) TotalSpent
FROM t1
avg_totalspent
304846.969000000000

I GOT THIS RIGHT!!!

  1. What is the lifetime average amount spent in terms of total_amt_usd, including only the companies that spent more per order, on average, than the average of all orders.
    • get average spent per company
    • get average spent in total – 3348.0196513310185185
    • compare a and b
    • filter those who spends more in average than the average in total
    • get their average
WITH t1 AS (SELECT AVG(o.total_amt_usd) Over_All_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id),
t2 AS (SELECT a.name, o.total_amt_usd Total_USD
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1, 2
HAVING AVG(o.total_amt_usd) > (SELECT Over_All_Total FROM t1)
ORDER BY 2 DESC)
SELECT AVG(t2.total_usd)
FROM t2
avg
7917.6384593190998269

Answer Key:

WITH t1 AS (
   SELECT AVG(o.total_amt_usd) avg_all
   FROM orders o
   JOIN accounts a
   ON a.id = o.account_id),
t2 AS (
   SELECT o.account_id, AVG(o.total_amt_usd) avg_amt
   FROM orders o
   GROUP BY 1
   HAVING AVG(o.total_amt_usd) > (SELECT * FROM t1))
SELECT AVG(avg_amt)
FROM t2;

This also doesn’t match. 😞 I will check this again later, but I will have to move on for now.

This entry was posted in Study Notes and tagged . Bookmark the permalink.

Leave a Reply