SQL Subqueries & Temporary Tables
Subquery
- a query within a query
- a query has both SELECT and FROM clauses to signify what you want to extract from a table and what table you’d like to pull data from. A query that includes subquery, as a result, has multiple SELECT and FROM clauses.
- The subquery that sits nested inside a larger query is called an INNER QUERY. This inner query can be fully executed on its own and often is run independently before when trying to troubleshoot bugs in your code.
SELECT product_id,
name,
price
FROM db.product
Where price > (SELECT AVG(price)
FROM db.product)
- Typically, you need to use a subquery when you have the need to manipulate an existing table to “pseudo-create” a table that is then used as a part of a larger query.
When do you need to use a subquery?
You need to use a subquery when you have the need to manipulate an existing table to “pseudo-create” a table that is then used as a part of a larger query. In the examples below, existing tables cannot be joined together to solve the problem at hand. Instead, an existing table needs to be manipulated, massaged, or aggregated in some way to then join to another table in the dataset to answer the posed question.
Set of Problems:
- Identify the top-selling Amazon products in months where sales have exceeded $1m
- Existing Table: Amazon daily sales
- Subquery Aggregation: Daily to Monthly
- Examine the average price of a brand’s products for the highest-grossing brands
- Existing Table: Product pricing data across all retailers
- Subquery Aggregation: Individual to Average
- Order the annual salary of employees that are working less than 150 hours a month
- Existing Table: Daily time-table of employees
- Subquery Aggregation: Daily to Monthly
SELECT product_id,
name,
price
FROM db.product
Where price > (SELECT AVG(price)
FROM db.product)
Subqueries vs JOINs
Often times, a problem can be solved using either a subquery or a join. The user considers the tradeoffs and determines the path forward.
Differences between Subqueries and Joins
Use Cases:
Subquery: When an existing table needs to be manipulated or aggregated to then be joined to a larger table.
Joins: A fully flexible and discretionary use case where a user wants to bring two or more tables together and select and filter as needed.
Syntax:
Subquery: A subquery is a query within a query. The syntax, as a result, has multiple SELECT and FROM clauses.
Joins: A join is simple stitching together multiple tables with a common key or column. A join clause cannot stand and be run independently.
Dependencies:
Subquery: A subquery clause can be run completely independently. When trying to debug code, subqueries are often run independently to pressure test results before running the larger query.
Joins: A join clause cannot stand and be run independently.
Similarities between Subqueries and Joins
Output:
Both subqueries and joins are essentially bringing multiple tables together (whether an existing table is first manipulated or not) to generate a single output.
Deep-dive topics:
What happens under the hood: Query plans are similar for both subqueries and joins. You can read more about how query plans are here.
Subquery vs Joins Overview
Components | Subquery | JOINS |
---|---|---|
Combine data from multiple tables into a single result | X | X |
Create a flexible view of tables stitched together using a “key” | X | |
Build an output to use in a later part of the query | X | |
Subquery Plan: What happens under the hood | X | X |
Query 1
SELECT product_id,
name,
price
FROM db.product
Where price > (SELECT AVG(price)
FROM db.product)
Query 2
SELECT a.brand_id,
a.total_brand_sales
AVG(b.product_price)
FROM brand_table a
JOIN brand_table b
ON b.brand_id = a.brand_id
GROUP BY a.brand_id, a.total_brand_sales
ORDER BY a.total_brand_sales desc;
Subqueries and Joins Deep-dives
Subqueries:
Output: Either a scalar (a single value) or rows that have met a condition. Use Case: Calculate a scalar value to use in a later part of the query (e.g., average price as a filter). Dependencies: Stand independently and be run as complete queries themselves.
Joins:
Output: A joint view of multiple tables stitched together using a common “key”. Use Case: Fully stitch tables together and have full flexibility on what to “select” and “filter from”. Dependencies: Cannot stand independently.
Subquery Basics
- must be fully placed inside parenthesis
- must be fully independent and can be executed on its own
- have two components to consider:
- where its placed
- dependencies
A caveat with subqueries being independent:
In almost all cases, subqueries are fully independent. They are “interim”/temp tables that can be fully executed on their own. However, there is an exception. When a subquery, typically in the form of a nested or inline subquery, is correlated to its outer query, it cannot run independently. This is most certainly an edge case since correlated subqueries are rarely implemented compared to standalone, simple subqueries.
Placement:
There are four places where subqueries can be inserted within a larger query:
- With
- Nested
- Inline
- Scalar
Dependencies:
A subquery can be dependent on the outer query or independent of the outer query.
Resources:
One of my favorite resources on subqueries that covers use cases, syntax, and examples is from Microsoft and can be found here.
Subqueries: Placement
The problem you are trying to solve and the readability of the query determine where the subquery is placed.
The key concept of placement is where exactly the subquery is placed within the context of the larger query. There are four different places where a subquery can be inserted. From my experience, the decision of which placement to leverage stems from (1) the problem at hand and (2) the readability of the query.
Subquery Placement:
With: This subquery is used when you’d like to “pseudo-create” a table from an existing table and visually scope the temporary table at the top of the larger query.
Nested: This subquery is used when you’d like the temporary table to act as a filter within the larger query, which implies that it often sits within the where clause.
Inline: This subquery is used in the same fashion as the WITH use case above. However, instead of the temporary table sitting on top of the larger query, it’s embedded within the from clause.
Scalar: This subquery is used when you’d like to generate a scalar value to be used as a benchmark of some sort.
For example, when you’d like to calculate the average salary across an entire organization to compare to individual employee salaries. Because it’s often a single value that is generated and used as a benchmark, the scalar subquery often sits within the select clause.
Advantages:
Readability: With
and Nested
subqueries are most advantageous for readability.
Performance: Scalar
subqueries are advantageous for performance and are often used on smaller datasets.
SELECT channel,
AVG(event_count) AS avg_event_count
FROM
(SELECT DATE_TRUNC('day',occurred_at) AS day,
channel,
count(*) as event_count
FROM web_events
GROUP BY 1,2
) sub
GROUP BY 1
ORDER BY 2 DESC
- Build the Subquery: The aggregation of an existing table that you’d like to leverage as a part of the larger query.
- Run the Subquery: Because a subquery can stand independently, it’s important to run its content first to get a sense of whether this aggregation is the interim output you are expecting.
- Encapsulate and Name: Close this subquery off with parentheses and call it something. In this case, we called the subquery table ‘sub.’
- Test Again: Run a
SELECT *
within the larger query to determine if all syntax of the subquery is good to go. - Build Outer Query: Develop the
SELECT *
clause as you see fit to solve the problem at hand, leveraging the subquery appropriately.
Quiz
- Find the number of events that occur for each day for each channel.
SELECT COUNT(*) AS event_count,
DATE_TRUNC('day', occurred_at),
channel
FROM web_events
GROUP BY DATE_TRUNC('day', occurred_at),
channel
ORDER BY COUNT(*) DESC;
Answer Key
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC;
I need to learn how to use 1, 2, and so on, this represents the number of columns as set on SELECT query.
- On which day-channel pair did the most events occur?
event_count | date_trunc | channel |
---|---|---|
21 | 2017-01-01T00:00:00.000Z | direct |
21 | 2016-12-21T00:00:00.000Z | direct |
- Now create a subquery that simply provides all of the data from your first query.
SELECT channel, Day, event_count
FROM (SELECT COUNT(*) AS event_count,
DATE_TRUNC('day', occurred_at) AS Day,
channel
FROM web_events
GROUP BY DATE_TRUNC('day', occurred_at),
channel) sub
ORDER BY 3 DESC;
Answer Key:
SELECT *
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
ORDER BY 3 DESC) sub;
I need to learn how to simplify my code.
- The original query goes in the FROM statement. An * is used in the SELECT statement to pull all of the data from the original query. You must use an alias for the table you nest within the outer query.
- Now find the average number of events for each channel. Since you broke out by a day earlier, this is giving you an average per day.
SELECT channel, AVG(event_count) AS avg_event_count
FROM (SELECT COUNT(*) AS event_count,
DATE_TRUNC('day', occurred_at) AS Day,
channel
FROM web_events
GROUP BY DATE_TRUNC('day', occurred_at),
channel) sub
GROUP BY 1;
channel | avg_event_count |
---|---|
adwords | 1.5701906412478336 |
organic | 1.6672504378283713 |
1.3166666666666667 | |
direct | 4.8964879852125693 |
1.5983471074380165 | |
banner | 1.2899728997289973 |
Answer Key:
SELECT channel, AVG(events) AS average_events
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2) sub
GROUP BY channel
ORDER BY 2 DESC;
I get the same results, but what happened here?
SELECT – we got similar queries here.
FROM – we’ve got similar queries here but in different order.
I need to learn/practice simplifying my GROUP BY query with column numbers.
Placement WITH
- statement “scoped” technique to improve the structure of the subquery
- When a user wants to create a version of an existing table to be used in a larger query (e.g., aggregate daily prices to an average price table).
- It is advantageous for readability purposes.
CTE – stands for Common Table Expression. A Common Table Expression in SQL allows you to define a temporary result, such as a table, to then be referenced in a later part of the query.
QUESTION: You need to find the average number of events for each channel per day.
SOLUTION:
SELECT channel, AVG(events) AS average_events
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2) sub
GROUP BY channel
ORDER BY 2 DESC;
Let’s try this again using a WITH statement.
Notice, you can pull the inner query:
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2
This is the part we put in the WITH statement. Notice, we are aliasing the table as events
below:
WITH events AS (
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2)
Now, we can use this newly created events
table as if it is any other table in our database:
WITH events AS (
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2)
SELECT channel, AVG(events) AS average_events
FROM events
GROUP BY channel
ORDER BY 2 DESC;
For the above example, we don’t need anymore than the one additional table, but imagine we needed to create a second table to pull from. We can create an additional table to pull from in the following way:
WITH table1 AS (
SELECT *
FROM web_events),
table2 AS (
SELECT *
FROM accounts)
SELECT *
FROM table1
JOIN table2
ON table1.account_id = table2.id;
You can add more and more tables using the WITH statement in the same way.
Quiz
- Provide the name of the sales_rep in each region with the largest amount of total_amt_usd sales.
- a table of sales rep per region with total_amt_usd each
- a table that calculates the MAX total_amt_usd per region
- a final table that filters the sales rep, the region and the highest total amt usd
I am correct on my approach here, I only messed up with the code.
Attempt 1*
WITH Region AS (SELECT r.name Region, s.name SalesRep
FROM region r
JOIN sales_reps s
ON r.id = s.region_id),
AverageSales AS (SELECT o.total_amt_usd, s.name SalesRep
FROM sales_reps s
JOIN accounts a
ON s.id = a.sales_rep_id
JOIN orders o
ON a.id = o.account_id)
SELECT Region, s.name, AverageSales
FROM Region, AverageSales, sales_reps s
GROUP BY Region, s.name, averagesales;
Attempt 2
- a table of sales rep per region with total_amt_usd each
SELECT r.name Region, s.name SalesRep, SUM(o.total_amt_usd)
FROM region r
JOIN sales_reps s
ON r.id = s.region_id
JOIN accounts a
ON s.id = a.sales_rep_id
JOIN orders o
ON a.id = o.account_id
GROUP BY r.name, s.name
- a table that calculates the MAX total_amt_usd per region
I AM LOST!!!
Answer Key:
WITH t1 AS (
SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC),
t2 AS (
SELECT region_name, MAX(total_amt) total_amt
FROM t1
GROUP BY 1)
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM t1
JOIN t2
ON t1.region_name = t2.region_name AND t1.total_amt = t2.total_amt;
What happened here???!!!
Watched tutorials online:
I will try to explain what happened there, based on my understanding.
Step 1 – It takes the total of sales per sales rep name per region. No multiple Sales Rep name, but with multiple region.
SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC
STEP 2 – Then it adds to a temporary table with the query WITH, receiving a name t1.
WITH t1 AS (
SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC)
Step 3 – It calculates the MAX total sales from t1 and grouped it by region name.
WITH t1 AS (
SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC)
SELECT region_name, MAX(total_amt) total_amt
FROM t1
GROUP BY 1
region_name | total_amt |
---|---|
Midwest | 675637.19 |
Southeast | 1098137.72 |
Northeast | 1010690.60 |
West | 886244.12 |
Step 4 – Created a second temporary table for the MAX value of total sales, with a name t2.
WITH t1 AS (
SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC),
t2 AS (
SELECT region_name, MAX(total_amt) total_amt
FROM t1
GROUP BY 1)
Step 5 – Combined t1 and t2 in the final table.
See how if only t1
WITH t1 AS (
SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC),
t2 AS (
SELECT region_name, MAX(total_amt) total_amt
FROM t1
GROUP BY 1)
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM t1
rep_name | region_name | total_amt |
---|---|---|
Earlie Schleusner | Southeast | 1098137.72 |
Tia Amato | Northeast | 1010690.60 |
Vernita Plump | Southeast | 934212.93 |
Georgianna Chisholm | West | 886244.12 |
Arica Stoltzfus | West | 810353.34 |
Dorotha Seawell | Southeast | 766935.04 |
Nelle Meaux | Southeast | 749076.16 |
then joined t2, assigning t1.region_name, t1.total_amt column based on data from t2
WITH t1 AS (
SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC),
t2 AS (
SELECT region_name, MAX(total_amt) total_amt
FROM t1
GROUP BY 1)
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM t1
JOIN t2
ON t1.region_name = t2.region_name AND t1.total_amt = t2.total_amt;
rep_name | region_name | total_amt |
---|---|---|
Earlie Schleusner | Southeast | 1098137.72 |
Tia Amato | Northeast | 1010690.60 |
Georgianna Chisholm | West | 886244.12 |
Charles Bidwell | Midwest | 675637.19 |
But I think I have understood this better, without the
t1.region_name = t2.region_name AND
This query doesn’t seem to be necessary, as we’re only filtering out the Max for each region based on the data from t2.
WITH t1 AS (
SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
FROM sales_reps s
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
JOIN region r
ON r.id = s.region_id
GROUP BY 1,2
ORDER BY 3 DESC),
t2 AS (
SELECT region_name, MAX(total_amt) total_amt
FROM t1
GROUP BY 1)
SELECT t1.rep_name, t1.region_name, t1.total_amt
FROM t1
JOIN t2
ON t1.total_amt = t2.total_amt;
rep_name | region_name | total_amt |
---|---|---|
Earlie Schleusner | Southeast | 1098137.72 |
Tia Amato | Northeast | 1010690.60 |
Georgianna Chisholm | West | 886244.12 |
Charles Bidwell | Midwest | 675637.19 |
I think I was confused with the JOIN query used here, as I have learned it with the keys. Keys are not existing for temporary tables in CTE, so it was assigned freely with CTEs?
Let me try the next question…
- For the region with the largest sales total_amt_usd, how many total orders were placed?
- I am looking for the MAX amount of sale per region.
- And for that region with MAX amount of sale in USD all across, how many orders were placed?
I finally get it!!! 😊
WITH t1 AS (SELECT r.name Region, SUM(o.total_amt_usd) total_usd, COUNT(o.total) total_order
FROM region r
JOIN sales_reps s
ON r.id = s.region_id
JOIN accounts a
ON s.id = a.sales_rep_id
JOIN orders o
ON a.id = o.account_id
GROUP BY 1),
t2 AS (SELECT MAX(t1.total_usd) total_usd
FROM t1)
SELECT t1.Region, t1.total_order, t1.total_usd
FROM t1
JOIN t2
ON t1.total_usd = t2.total_usd;
region | total_order | total_usd |
---|---|---|
Northeast | 2357 | 7744405.36 |
There seems to be something wrong with the Solutions – reported.
- How many accounts had more total purchases than the account name which has bought the most standard_qty paper throughout their lifetime as a customer?
Step 1 – get all the data for the account name, count of total orders, and count of standard_qty
SELECT a.name AcctName, COUNT(o.total) Total, COUNT(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name;
acctname | total | standard_total |
---|---|---|
Boeing | 15 | 15 |
Western Digital | 65 | 65 |
Sysco | 68 | 68 |
Southern | 20 | 20 |
Altria Group | 51 | 51 |
Energy Transfer Equity | 6 | 6 |
Aramark | 8 | 8 |
Step 2 Get MAX of total standard qty orders
WITH t1 AS (SELECT a.name AcctName, COUNT(o.total) Total, COUNT(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name)
SELECT MAX(t1.Standard_Total) StandardTotal
FROM t1
standardtotal |
---|
71 |
This is my final answer:
WITH t1 AS (SELECT a.name AcctName, COUNT(o.total) Total, COUNT(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name),
t2 AS (SELECT MAX(t1.Standard_Total) StandardTotal
FROM t1)
SELECT t1.AcctName
FROM t1, t2
WHERE t1.Total > t2.StandardTotal
GROUP BY t1.AcctName
This returns 0.
So I checked the data:
SELECT a.name AcctName, COUNT(o.total) Total, COUNT(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name
ORDER BY 2 DESC;
acctname | total | standard_total |
---|---|---|
Leucadia National | 71 | 71 |
Sysco | 68 | 68 |
Supervalu | 68 | 68 |
Arrow Electronics | 67 | 67 |
General Dynamics | 66 | 66 |
Mosaic | 66 | 66 |
Archer Daniels Midland | 66 | 66 |
Western Digital | 65 | 65 |
United States Steel | 65 | 65 |
Philip Morris International | 65 | 65 |
Fluor | 65 | 65 |
United Continental Holdings | 63 | 63 |
Marathon Petroleum | 62 | 62 |
Back to the question – How many accounts had more total purchases than the account name which has bought the most standard_qty paper throughout their lifetime as a customer?
The max standard_qty paper purchased is 71 by Leucadia National, while the max number of order placed by an account is also 71 and with the same account.
Did I do something wrong?
Let me break this down.
I checked the total order placed by the accounts, and I get the same result – 71.
SELECT a.name AcctName, COUNT(o.total) Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name
ORDER BY 2 DESC;
acctname | total |
---|---|
Leucadia National | 71 |
Sysco | 68 |
Supervalu | 68 |
Arrow Electronics | 67 |
General Dynamics | 66 |
Mosaic | 66 |
Archer Daniels Midland | 66 |
I also checked the total standard qty per account, and the max is still 71 by the same company.
SELECT a.name AcctName, COUNT(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name
ORDER BY 2 DESC;
acctname | standard_total |
---|---|
Leucadia National | 71 |
Sysco | 68 |
Supervalu | 68 |
Arrow Electronics | 67 |
General Dynamics | 66 |
I am checking the Solutions –
WITH t1 AS (
SELECT a.name account_name, SUM(o.standard_qty) total_std, SUM(o.total) total
FROM accounts a
JOIN orders o
ON o.account_id = a.id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1),
t2 AS (
SELECT a.name
FROM orders o
JOIN accounts a
ON a.id = o.account_id
GROUP BY 1
HAVING SUM(o.total) > (SELECT total FROM t1))
SELECT COUNT(*)
FROM t2;
count |
---|
3 |
I am almost correct!! Except I should have used SUM instead of COUNT. 😞
BUT.. I checked again…
WITH t1 AS (SELECT a.name AcctName, SUM(o.total) Total, SUM(standard_qty) Standard_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY a.name),
t2 AS (SELECT MAX(t1.Standard_Total) StandardTotal
FROM t1)
SELECT t1.AcctName, t2.StandardTotal Max_Standard, t1.Total Total_Orders
FROM t1, t2
WHERE t1.Total > t2.StandardTotal
GROUP BY t1.AcctName, t2.StandardTotal, t1.Total
acctname | standardtotal | total |
---|---|---|
Leucadia National | 41617 | 42358 |
Core-Mark Holding | 41617 | 44750 |
EOG Resources | 41617 | 56410 |
IBM | 41617 | 47506 |
General Dynamics | 41617 | 43730 |
Mosaic | 41617 | 49246 |
The MAX standard qty is 41617 and there are 6 companies that placed more orders in total than that.
- For the customer that spent the most (in total over their lifetime as a customer) total_amt_usd, how many web_events did they have for each channel?
STEP 1 – Get all the data across all columns needed
SELECT a.name, w.channel, SUM(o.total_amt_usd), COUNT(w.id)
FROM accounts a
JOIN web_events w
ON a.id = w.account_id
JOIN orders o
on a.id = o.account_id
GROUP BY 1, 2
ORDER BY 3 DESC;
name | channel | sum | count |
---|---|---|---|
EOG Resources | direct | 16846425.20 | 2728 |
Mosaic | direct | 16244073.73 | 3102 |
Leucadia National | direct | 15134457.00 | 3692 |
IBM | direct | 13726418.16 | 2520 |
General Dynamics | direct | 13230570.76 | 2904 |
From here, I know that EOG Resources is the biggest spender among all accounts.
Step 2 – Get the MAX spent and put in the second table
WITH t1 AS (SELECT a.name AcctName, w.channel Channel, SUM(o.total_amt_usd) Total_Spent, COUNT(w.id) Channel_Count
FROM accounts a
JOIN web_events w
ON a.id = w.account_id
JOIN orders o
on a.id = o.account_id
GROUP BY 1, 2),
t2 AS (SELECT MAX(t1.total_spent) Total_Spent
FROM t1)
SELECT t1.AcctName, t1.Channel_Count, t1.Channel
FROM t1
JOIN t2
ON t1.Total_Spent = t2.Total_Spent
Again!!
Step 1 – Get the max spender
SELECT a.name AcctName, SUM(o.total_amt_usd) Total_Spent
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1
ORDER BY 2 DESC;
acctname | total_spent |
---|---|
EOG Resources | 382873.30 |
Mosaic | 345618.59 |
IBM | 326819.48 |
General Dynamics | 300694.79 |
Republic Services | 293861.14 |
Leucadia National | 291047.25 |
Arrow Electronics | 281018.36 |
I get the same account, but different amount 382873.30 WHY???!!
Step 2 – Get the max amount
WITH t1 AS (SELECT a.name AcctName, SUM(o.total_amt_usd) Total_Spent
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1
ORDER BY 2 DESC)
SELECT MAX(t1.Total_Spent)
FROM t1
max |
---|
382873.30 |
Same information with the MAX query
STEP 3 – Get the channel counts for each account
WITH t1 AS (SELECT a.name AcctName, SUM(o.total_amt_usd) Total_Spent
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1),
t2 AS (SELECT MAX(t1.Total_Spent) Total_Spent
FROM t1)
SELECT t1.AcctName AcctName, w.channel Channel, COUNT(w.channel) Channel_Count
FROM t1, web_events w
GROUP BY t1.AcctName, w.channel
acctname | channel | channel_count |
---|---|---|
Oneok | adwords | 906 |
Whole Foods Market | organic | 952 |
TIAA | adwords | 906 |
Whirlpool | adwords | 906 |
Time Warner Cable | adwords | 906 |
Gilead Sciences | 474 | |
Kohl’s | banner | 476 |
I finally got it!!!!!
WITH t1 AS (SELECT a.name AcctName, SUM(o.total_amt_usd) Total_Spent
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1),
t2 AS (SELECT MAX(t1.Total_Spent) Total_Spent
FROM t1),
t3 AS (SELECT t1.AcctName AcctName, w.channel Channel, COUNT(w.channel) Channel_Count
FROM t1, web_events w
GROUP BY t1.AcctName, w.channel)
SELECT t1.AcctName, t3.Channel, t3.Channel_Count
FROM t1, t2, t3
WHERE t1.Total_Spent = t2.Total_Spent
GROUP BY 1, 2, 3
acctname | channel | channel_count |
---|---|---|
EOG Resources | 967 | |
EOG Resources | 474 | |
EOG Resources | banner | 476 |
EOG Resources | organic | 952 |
EOG Resources | adwords | 906 |
EOG Resources | direct | 5298 |
- What is the lifetime average amount spent in terms of total_amt_usd for the top 10 total spending accounts?
STEP 1 – Get the top 10 spenders
SELECT a.name, SUM(o.total_amt_usd)
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;
name | sum |
---|---|
EOG Resources | 382873.30 |
Mosaic | 345618.59 |
IBM | 326819.48 |
General Dynamics | 300694.79 |
Republic Services | 293861.14 |
Leucadia National | 291047.25 |
Arrow Electronics | 281018.36 |
Sysco | 278575.64 |
Supervalu | 275288.30 |
Archer Daniels Midland | 272672.84 |
WITH t1 AS (SELECT a.name AcctName, SUM(o.total_amt_usd) Total_Spent
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10)
SELECT AVG(t1.Total_Spent) TotalSpent
FROM t1
avg_totalspent |
---|
304846.969000000000 |
I GOT THIS RIGHT!!!
- What is the lifetime average amount spent in terms of total_amt_usd, including only the companies that spent more per order, on average, than the average of all orders.
- get average spent per company
- get average spent in total – 3348.0196513310185185
- compare a and b
- filter those who spends more in average than the average in total
- get their average
WITH t1 AS (SELECT AVG(o.total_amt_usd) Over_All_Total
FROM accounts a
JOIN orders o
ON a.id = o.account_id),
t2 AS (SELECT a.name, o.total_amt_usd Total_USD
FROM accounts a
JOIN orders o
ON a.id = o.account_id
GROUP BY 1, 2
HAVING AVG(o.total_amt_usd) > (SELECT Over_All_Total FROM t1)
ORDER BY 2 DESC)
SELECT AVG(t2.total_usd)
FROM t2
avg |
---|
7917.6384593190998269 |
Answer Key:
WITH t1 AS (
SELECT AVG(o.total_amt_usd) avg_all
FROM orders o
JOIN accounts a
ON a.id = o.account_id),
t2 AS (
SELECT o.account_id, AVG(o.total_amt_usd) avg_amt
FROM orders o
GROUP BY 1
HAVING AVG(o.total_amt_usd) > (SELECT * FROM t1))
SELECT AVG(avg_amt)
FROM t2;
This also doesn’t match. 😞 I will check this again later, but I will have to move on for now.
Leave a Reply