Challenge Problems
Tables
categories — Categories (parent-child hierarchy)
customers — Customers (grade, points, channel)
orders — Orders (status, amount, date)
order_items — Order items (qty, unit price)
products — Products (name, price, stock, brand)
reviews — Reviews (rating, content)
shipping — Shipping (carrier, tracking, status)
point_transactions — Points (earn, use, expire)
product_views — View log (customer, product, date)
inventory_transactions — Inventory (type, quantity)
wishlists — Wishlists (customer-product)
calendar — Calendar (weekday, holiday)
Concepts
Window, Analytics, CTE, consecutive, median, retention
1. Duplicate Review Detection
Find customers who reviewed the same product more than once. Show customer name, product name, and review count.
Hint 1: - JOIN reviews with customers and products
- GROUP BY customer_id, product_id then HAVING COUNT(*) >= 2
Answer
SELECT
c.name AS customer_name,
p.name AS product_name,
COUNT(*) AS review_count
FROM reviews AS r
INNER JOIN customers AS c ON r.customer_id = c.id
INNER JOIN products AS p ON r.product_id = p.id
GROUP BY r.customer_id, r.product_id, c.name, p.name
HAVING COUNT(*) >= 2
ORDER BY review_count DESC;
Result (top 7 of 321 rows)
| customer_name | product_name | review_count |
|---|---|---|
| Gabriel Walters | G.SKILL Trident Z5 DDR5 64GB 6000MHz ... | 5 |
| Jason Rivera | SteelSeries Aerox 5 Wireless Silver | 4 |
| Gabriel Walters | Samsung DDR5 32GB PC5-38400 | 4 |
| Gabriel Walters | Samsung Odyssey G7 32 White | 4 |
| Brenda Garcia | ASUS PCE-BE92BT | 4 |
| Allen Snyder | Samsung DDR4 32GB PC4-25600 | 4 |
| Allen Snyder | Keychron Q1 Pro Silver | 4 |
2. Weekday vs Weekend Average Order Amount
Using the calendar table, compare average order amounts on weekdays vs weekends. Show weekday/weekend label, order count, average order value, and total revenue.
Hint 1: - Use calendar.is_weekend column
- JOIN orders with calendar on date
Answer
SELECT
CASE cal.is_weekend
WHEN 1 THEN 'Weekend'
ELSE 'Weekday'
END AS day_type,
COUNT(*) AS order_count,
ROUND(AVG(o.total_amount), 2) AS avg_order_value,
ROUND(SUM(o.total_amount), 2) AS total_revenue
FROM orders AS o
INNER JOIN calendar AS cal
ON SUBSTR(o.ordered_at, 1, 10) = cal.date_key
WHERE o.status NOT IN ('cancelled', 'returned', 'return_requested')
GROUP BY cal.is_weekend
ORDER BY cal.is_weekend;
Result (2 rows)
| day_type | order_count | avg_order_value | total_revenue |
|---|---|---|---|
| Weekday | 23,745 | 1,003,330.80 | 23,824,089,843.00 |
| Weekend | 10,953 | 999,300.48 | 10,945,338,164.00 |
3. Daily Order Count vs Previous Day
Calculate daily order counts for December 2024. Use LAG to show only days where the count increased vs the previous day.
Hint 1: - Extract date with SUBSTR(ordered_at, 1, 10)
- Use LAG(order_count) OVER (ORDER BY order_date) for previous day
Answer
WITH daily AS (
SELECT
SUBSTR(ordered_at, 1, 10) AS order_date,
COUNT(*) AS order_count
FROM orders
WHERE ordered_at LIKE '2024-12%'
AND status NOT IN ('cancelled')
GROUP BY SUBSTR(ordered_at, 1, 10)
),
with_prev AS (
SELECT
order_date,
order_count,
LAG(order_count) OVER (ORDER BY order_date) AS prev_count
FROM daily
)
SELECT
order_date,
order_count,
prev_count,
order_count - prev_count AS diff
FROM with_prev
WHERE order_count > prev_count
ORDER BY order_date;
Result (top 7 of 11 rows)
| order_date | order_count | prev_count | diff |
|---|---|---|---|
| 2024-12-03 | 15 | 14 | 1 |
| 2024-12-06 | 14 | 9 | 5 |
| 2024-12-07 | 15 | 14 | 1 |
| 2024-12-08 | 18 | 15 | 3 |
| 2024-12-14 | 17 | 13 | 4 |
| 2024-12-17 | 17 | 14 | 3 |
| 2024-12-20 | 18 | 14 | 4 |
4. 3rd Most Expensive Product per Category
Find the 3rd most expensive product in each category. Show category name, product name, price, and rank.
Hint 1: - ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY price DESC)
- Filter WHERE rn = 3
Answer
WITH ranked AS (
SELECT
cat.name AS category,
p.name AS product_name,
p.price,
ROW_NUMBER() OVER (
PARTITION BY p.category_id
ORDER BY p.price DESC
) AS rn
FROM products AS p
INNER JOIN categories AS cat ON p.category_id = cat.id
WHERE p.is_active = 1
)
SELECT category, product_name, price, rn AS rank
FROM ranked
WHERE rn = 3
ORDER BY price DESC;
Result (top 7 of 32 rows)
| category | product_name | price | rank |
|---|---|---|---|
| Gaming Laptop | Razer Blade 18 Black | 2,987,500.00 | 3 |
| General Laptop | ASUS ExpertBook B5 White | 2,068,800.00 | 3 |
| Custom Build | ASUS ROG Strix G16CH Silver | 1,879,100.00 | 3 |
| NVIDIA | MSI GeForce RTX 4070 Ti Super GAMING X | 1,744,000.00 | 3 |
| Professional Monitor | LG 32EP950 OLED White | 1,545,700.00 | 3 |
| Gaming Monitor | LG UltraGear 27GR95QE White | 1,511,700.00 | 3 |
| 2-in-1 | HP Pavilion x360 14 Black | 1,479,700.00 | 3 |
5. A/B Bucket Split
Split customers into A/B groups using MOD(id, 2). Compare customer count, average order value, and average order count per group.
Hint 1: - CASE WHEN c.id % 2 = 0 THEN 'A' ELSE 'B' END
- JOIN customers with orders, aggregate by group
Answer
SELECT
CASE WHEN c.id % 2 = 0 THEN 'A' ELSE 'B' END AS bucket,
COUNT(DISTINCT c.id) AS customer_count,
COUNT(o.id) AS total_orders,
ROUND(AVG(o.total_amount), 2) AS avg_order_value,
ROUND(1.0 * COUNT(o.id) / COUNT(DISTINCT c.id), 1) AS avg_orders_per_customer
FROM customers AS c
LEFT JOIN orders AS o
ON c.id = o.customer_id
AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
GROUP BY CASE WHEN c.id % 2 = 0 THEN 'A' ELSE 'B' END
ORDER BY bucket;
Result (2 rows)
| bucket | customer_count | total_orders | avg_order_value | avg_orders_per_customer |
|---|---|---|---|---|
| A | 2615 | 17,147 | 1,010,823.23 | 6.60 |
| B | 2615 | 17,551 | 993,495.65 | 6.70 |
6. Tree Node Type Classification
Classify each category as root / inner / leaf. Root: parent_id IS NULL. Inner: non-root with children. Leaf: no children.
Hint 1: - LEFT JOIN categories AS child ON cat.id = child.parent_id
- Use CASE based on parent_id and child existence
Answer
SELECT
cat.id,
cat.name,
cat.parent_id,
cat.depth,
CASE
WHEN cat.parent_id IS NULL THEN 'root'
WHEN EXISTS (SELECT 1 FROM categories c2 WHERE c2.parent_id = cat.id) THEN 'inner'
ELSE 'leaf'
END AS node_type
FROM categories AS cat
ORDER BY cat.depth, cat.sort_order;
Result (top 7 of 53 rows)
| id | name | parent_id | depth | node_type |
|---|---|---|---|---|
| 1 | Desktop PC | NULL | 0 | root |
| 5 | Laptop | NULL | 0 | root |
| 10 | Monitor | NULL | 0 | root |
| 14 | CPU | NULL | 0 | root |
| 17 | Motherboard | NULL | 0 | root |
| 20 | Memory (RAM) | NULL | 0 | root |
| 23 | Storage | NULL | 0 | root |
7. Daily Order Cancellation Rate (Last 30 Days)
For the last 30 days (based on 2025-12-01 to 2025-12-31), calculate daily total orders, cancelled orders, and cancellation rate.
Hint 1: - Calculate ratio of status = 'cancelled' orders
- SUM(CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END)
Answer
SELECT
SUBSTR(ordered_at, 1, 10) AS order_date,
COUNT(*) AS total_orders,
SUM(CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END) AS cancelled_orders,
ROUND(100.0 * SUM(CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END)
/ COUNT(*), 2) AS cancel_rate_pct
FROM orders
WHERE ordered_at BETWEEN '2025-12-01' AND '2025-12-31 23:59:59'
GROUP BY SUBSTR(ordered_at, 1, 10)
ORDER BY order_date;
Result (top 7 of 31 rows)
| order_date | total_orders | cancelled_orders | cancel_rate_pct |
|---|---|---|---|
| 2025-12-01 | 23 | 2 | 8.70 |
| 2025-12-02 | 20 | 1 | 5.00 |
| 2025-12-03 | 21 | 0 | 0.0 |
| 2025-12-04 | 19 | 1 | 5.26 |
| 2025-12-05 | 24 | 1 | 4.17 |
| 2025-12-06 | 22 | 0 | 0.0 |
| 2025-12-07 | 20 | 0 | 0.0 |
8. Installment Amount Calculation
For orders over 500,000 KRW, calculate monthly installment amounts for 3, 6, and 12 months. Show order number, total, and monthly amounts.
Hint 1: - Simple division: ROUND(total_amount / 3, 0) etc.
- WHERE total_amount >= 500000
Answer
SELECT
order_number,
total_amount,
ROUND(total_amount / 3, 0) AS monthly_3m,
ROUND(total_amount / 6, 0) AS monthly_6m,
ROUND(total_amount / 12, 0) AS monthly_12m
FROM orders
WHERE total_amount >= 500000
AND status NOT IN ('cancelled', 'returned', 'return_requested')
ORDER BY total_amount DESC
LIMIT 20;
Result (top 7 of 20 rows)
| order_number | total_amount | monthly_3m | monthly_6m | monthly_12m |
|---|---|---|---|---|
| ORD-20201121-08810 | 50,867,500.00 | 16,955,833.00 | 8,477,917.00 | 4,238,958.00 |
| ORD-20250305-32265 | 46,820,024.00 | 15,606,675.00 | 7,803,337.00 | 3,901,669.00 |
| ORD-20200209-05404 | 43,677,500.00 | 14,559,167.00 | 7,279,583.00 | 3,639,792.00 |
| ORD-20251218-37240 | 38,626,400.00 | 12,875,467.00 | 6,437,733.00 | 3,218,867.00 |
| ORD-20220106-15263 | 37,987,600.00 | 12,662,533.00 | 6,331,267.00 | 3,165,633.00 |
| ORD-20200820-07684 | 37,518,200.00 | 12,506,067.00 | 6,253,033.00 | 3,126,517.00 |
| ORD-20220224-15869 | 35,397,700.00 | 11,799,233.00 | 5,899,617.00 | 2,949,808.00 |
9. Estimate NULL Birth Dates
For customers with NULL birth_date, estimate their birth year using the average birth year of customers in the same grade. Show name, grade, original birth_date, and estimated birth_year.
Hint 1: - AVG(CAST(SUBSTR(birth_date, 1, 4) AS INTEGER)) for average birth year by grade
- Apply to NULL customers via LEFT JOIN or scalar subquery
Answer
WITH grade_avg_year AS (
SELECT
grade,
ROUND(AVG(CAST(SUBSTR(birth_date, 1, 4) AS INTEGER)), 0) AS avg_birth_year
FROM customers
WHERE birth_date IS NOT NULL
GROUP BY grade
)
SELECT
c.id,
c.name,
c.grade,
c.birth_date,
gay.avg_birth_year AS estimated_birth_year
FROM customers AS c
INNER JOIN grade_avg_year AS gay ON c.grade = gay.grade
WHERE c.birth_date IS NULL
ORDER BY c.grade, c.id
LIMIT 20;
Result (top 7 of 20 rows)
| id | name | grade | birth_date | estimated_birth_year |
|---|---|---|---|---|
| 7 | Ashley Jones | BRONZE | NULL | 1,988.00 |
| 13 | Andrew Reeves | BRONZE | NULL | 1,988.00 |
| 14 | Martha Murphy | BRONZE | NULL | 1,988.00 |
| 24 | Barbara White | BRONZE | NULL | 1,988.00 |
| 36 | Angela Barrera | BRONZE | NULL | 1,988.00 |
| 38 | Dana Miles | BRONZE | NULL | 1,988.00 |
| 42 | Emma Nguyen | BRONZE | NULL | 1,988.00 |
10. Find Duplicate Wishlists
Although wishlists has a UNIQUE constraint, write SQL to identify duplicates (same customer_id + product_id) and mark all but the earliest using ROW_NUMBER.
Hint 1: - ROW_NUMBER() OVER (PARTITION BY customer_id, product_id ORDER BY created_at)
- rn > 1 marks records to delete
Answer
WITH ranked AS (
SELECT
id,
customer_id,
product_id,
created_at,
ROW_NUMBER() OVER (
PARTITION BY customer_id, product_id
ORDER BY created_at ASC
) AS rn
FROM wishlists
)
SELECT id, customer_id, product_id, created_at, rn,
CASE WHEN rn > 1 THEN 'DELETE' ELSE 'KEEP' END AS action
FROM ranked
WHERE customer_id IN (
SELECT customer_id FROM wishlists
GROUP BY customer_id
HAVING COUNT(*) > 1
)
ORDER BY customer_id, product_id, rn
LIMIT 30;
Result (top 7 of 30 rows)
| id | customer_id | product_id | created_at | rn | action |
|---|---|---|---|---|---|
| 88 | 3 | 142 | 2020-10-21 05:28:28 | 1 | KEEP |
| 1202 | 3 | 164 | 2018-12-05 10:45:35 | 1 | KEEP |
| 1456 | 3 | 234 | 2025-06-24 10:23:47 | 1 | KEEP |
| 996 | 19 | 4 | 2017-10-28 18:51:00 | 1 | KEEP |
| 82 | 19 | 144 | 2024-09-29 00:02:22 | 1 | KEEP |
| 1378 | 81 | 1 | 2017-10-25 09:20:52 | 1 | KEEP |
| 1997 | 81 | 106 | 2023-02-11 22:00:20 | 1 | KEEP |
11. Signup-to-Order Conversion Rate by Channel
Calculate signup count, first-order count, and conversion rate by acquisition_channel.
Hint 1: - Group by customers.acquisition_channel
- Customer is converted if they have at least 1 order
Answer
SELECT
COALESCE(c.acquisition_channel, 'unknown') AS channel,
COUNT(*) AS signup_count,
COUNT(DISTINCT o.customer_id) AS converted_count,
ROUND(100.0 * COUNT(DISTINCT o.customer_id) / COUNT(*), 1) AS conversion_rate_pct
FROM customers AS c
LEFT JOIN orders AS o
ON c.id = o.customer_id
AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
GROUP BY COALESCE(c.acquisition_channel, 'unknown')
ORDER BY conversion_rate_pct DESC;
Result (5 rows)
| channel | signup_count | converted_count | conversion_rate_pct |
|---|---|---|---|
| social | 8685 | 741 | 8.50 |
| search_ad | 10,504 | 838 | 8.00 |
| referral | 5134 | 384 | 7.50 |
| direct | 3156 | 210 | 6.70 |
| organic | 9656 | 620 | 6.40 |
12. Flow vs Stock Comparison
Compare inventory flow (sum of inbound - outbound from transactions) with current stock (products.stock_qty) per product. Identify products where they differ.
Hint 1: - SUM(quantity) from inventory_transactions (positive=inbound, negative=outbound)
- Compare with products.stock_qty
Answer
WITH flow AS (
SELECT
product_id,
SUM(quantity) AS net_flow
FROM inventory_transactions
GROUP BY product_id
)
SELECT
p.name AS product_name,
p.stock_qty AS current_stock,
f.net_flow AS calculated_stock,
p.stock_qty - f.net_flow AS discrepancy
FROM products AS p
INNER JOIN flow AS f ON p.id = f.product_id
WHERE p.stock_qty != f.net_flow
ORDER BY ABS(p.stock_qty - f.net_flow) DESC
LIMIT 20;
Result (top 7 of 20 rows)
| product_name | current_stock | calculated_stock | discrepancy |
|---|---|---|---|
| Netgear GS316PP | 104 | 1590 | -1486 |
| TP-Link TL-SG1016D Silver | 275 | 1569 | -1294 |
| LG 27UQ85R Black | 26 | 1280 | -1254 |
| CORSAIR RM1000x White | 58 | 1262 | -1204 |
| Sony WH-CH720N Silver | 89 | 1259 | -1170 |
| Razer Basilisk V3 Pro 35K White | 99 | 1268 | -1169 |
| Logitech G PRO X SUPERLIGHT 2 White | 152 | 1300 | -1148 |
13. Consecutive Same-Status Orders
Find customers with 3+ consecutive orders having the same status. Use LAG to compare with the previous 2 orders' status.
Hint 1: - Use LAG(status, 1) and LAG(status, 2)
- Show rows where all 3 have the same status
Answer
WITH order_seq AS (
SELECT
customer_id,
order_number,
status,
ordered_at,
LAG(status, 1) OVER (PARTITION BY customer_id ORDER BY ordered_at) AS prev_1,
LAG(status, 2) OVER (PARTITION BY customer_id ORDER BY ordered_at) AS prev_2
FROM orders
)
SELECT
os.customer_id,
c.name AS customer_name,
os.order_number,
os.status,
os.prev_1,
os.prev_2,
os.ordered_at
FROM order_seq AS os
INNER JOIN customers AS c ON os.customer_id = c.id
WHERE os.status = os.prev_1
AND os.status = os.prev_2
ORDER BY os.customer_id, os.ordered_at
LIMIT 30;
Result (top 7 of 30 rows)
| customer_id | customer_name | order_number | status | prev_1 | prev_2 | ordered_at |
|---|---|---|---|---|---|---|
| 2 | Danny Johnson | ORD-20160830-00269 | confirmed | confirmed | confirmed | 2016-08-30 10:49:39 |
| 2 | Danny Johnson | ORD-20160904-00274 | confirmed | confirmed | confirmed | 2016-09-04 08:47:04 |
| 2 | Danny Johnson | ORD-20160915-00287 | confirmed | confirmed | confirmed | 2016-09-15 20:07:17 |
| 2 | Danny Johnson | ORD-20161024-00334 | confirmed | confirmed | confirmed | 2016-10-24 12:13:06 |
| 2 | Danny Johnson | ORD-20161101-00343 | confirmed | confirmed | confirmed | 2016-11-01 10:44:08 |
| 2 | Danny Johnson | ORD-20170122-00444 | confirmed | confirmed | confirmed | 2017-01-22 08:39:07 |
| 2 | Danny Johnson | ORD-20170305-00501 | confirmed | confirmed | confirmed | 2017-03-05 20:35:01 |
14. Top 3 Products by Revenue per Category
Use ROW_NUMBER to find the top 3 products by revenue per category. Show category, rank, product name, and revenue.
Hint 1: - ROW_NUMBER() OVER (PARTITION BY cat.id ORDER BY revenue DESC)
- WHERE rn <= 3
Answer
WITH product_revenue AS (
SELECT
cat.name AS category,
cat.id AS category_id,
p.name AS product_name,
ROUND(SUM(oi.quantity * oi.unit_price), 2) AS revenue,
ROW_NUMBER() OVER (
PARTITION BY cat.id
ORDER BY SUM(oi.quantity * oi.unit_price) DESC
) AS rn
FROM order_items AS oi
INNER JOIN orders AS o ON oi.order_id = o.id
INNER JOIN products AS p ON oi.product_id = p.id
INNER JOIN categories AS cat ON p.category_id = cat.id
WHERE o.status NOT IN ('cancelled', 'returned', 'return_requested')
GROUP BY cat.id, cat.name, p.id, p.name
)
SELECT category, rn AS rank, product_name, revenue
FROM product_revenue
WHERE rn <= 3
ORDER BY category, rn;
Result (top 7 of 113 rows)
| category | rank | product_name | revenue |
|---|---|---|---|
| 2-in-1 | 1 | Lenovo ThinkPad X1 2in1 Silver | 554,231,700.00 |
| 2-in-1 | 2 | HP Envy x360 15 Silver | 326,727,400.00 |
| 2-in-1 | 3 | HP Pavilion x360 14 Black | 319,615,200.00 |
| AMD | 1 | AMD Ryzen 9 9900X | 452,187,900.00 |
| AMD | 1 | MSI Radeon RX 7900 XTX GAMING X White | 585,793,600.00 |
| AMD | 2 | AMD Ryzen 9 9900X | 149,725,400.00 |
| AMD | 2 | ASUS Dual RX 9070 Silver | 515,058,400.00 |
15. Mentoring Pair Matching
Match junior staff (role='staff') with seniors (role='manager') in the same department as mentor-mentee pairs. Show department, mentee name, and mentor name.
Hint 1: - Self-join staff: s1.department = s2.department
- s1.role = 'staff' AND s2.role = 'manager'
Answer
SELECT
s1.department,
s1.name AS mentee,
s1.role AS mentee_role,
s2.name AS mentor,
s2.role AS mentor_role
FROM staff AS s1
INNER JOIN staff AS s2
ON s1.department = s2.department
AND s2.role = 'manager'
WHERE s1.role = 'staff'
AND s1.is_active = 1
AND s2.is_active = 1
ORDER BY s1.department, s1.name;
16. Classic Retention Analysis
By signup month (cohort), calculate the percentage of customers who ordered again in month+1, month+2, and month+3.
Hint 1: - Cohort = SUBSTR(customers.created_at, 1, 7)
- Calculate "months since signup" for each order
- Use COUNT(DISTINCT CASE WHEN ... THEN customer_id END)
Answer
WITH cohort AS (
SELECT
id AS customer_id,
SUBSTR(created_at, 1, 7) AS signup_month,
created_at
FROM customers
WHERE created_at >= '2024-01-01' AND created_at < '2024-07-01'
),
cohort_orders AS (
SELECT
co.signup_month,
co.customer_id,
CAST(
(CAST(SUBSTR(o.ordered_at, 1, 4) AS INTEGER) * 12
+ CAST(SUBSTR(o.ordered_at, 6, 2) AS INTEGER))
- (CAST(SUBSTR(co.created_at, 1, 4) AS INTEGER) * 12
+ CAST(SUBSTR(co.created_at, 6, 2) AS INTEGER))
AS INTEGER) AS months_since_signup
FROM cohort AS co
INNER JOIN orders AS o
ON co.customer_id = o.customer_id
AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
)
SELECT
signup_month,
COUNT(DISTINCT customer_id) AS cohort_size,
COUNT(DISTINCT CASE WHEN months_since_signup = 1 THEN customer_id END) AS m1,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup = 1 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS m1_pct,
COUNT(DISTINCT CASE WHEN months_since_signup = 2 THEN customer_id END) AS m2,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup = 2 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS m2_pct,
COUNT(DISTINCT CASE WHEN months_since_signup = 3 THEN customer_id END) AS m3,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup = 3 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS m3_pct
FROM cohort_orders
GROUP BY signup_month
ORDER BY signup_month;
WITH cohort AS (
SELECT
id AS customer_id,
SUBSTR(created_at, 1, 7) AS signup_month,
created_at
FROM customers
WHERE created_at >= '2024-01-01' AND created_at < '2024-07-01'
),
cohort_orders AS (
SELECT
co.signup_month,
co.customer_id,
(CAST(SUBSTR(o.ordered_at, 1, 4) AS NUMBER) * 12
+ CAST(SUBSTR(o.ordered_at, 6, 2) AS NUMBER))
- (CAST(SUBSTR(co.created_at, 1, 4) AS NUMBER) * 12
+ CAST(SUBSTR(co.created_at, 6, 2) AS NUMBER)) AS months_since_signup
FROM cohort co
INNER JOIN orders o
ON co.customer_id = o.customer_id
AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
)
SELECT
signup_month,
COUNT(DISTINCT customer_id) AS cohort_size,
COUNT(DISTINCT CASE WHEN months_since_signup = 1 THEN customer_id END) AS m1,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup = 1 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS m1_pct,
COUNT(DISTINCT CASE WHEN months_since_signup = 2 THEN customer_id END) AS m2,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup = 2 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS m2_pct,
COUNT(DISTINCT CASE WHEN months_since_signup = 3 THEN customer_id END) AS m3,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup = 3 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS m3_pct
FROM cohort_orders
GROUP BY signup_month
ORDER BY signup_month;
WITH cohort AS (
SELECT
id AS customer_id,
SUBSTRING(created_at, 1, 7) AS signup_month,
created_at
FROM customers
WHERE created_at >= '2024-01-01' AND created_at < '2024-07-01'
),
cohort_orders AS (
SELECT
co.signup_month,
co.customer_id,
(CAST(SUBSTRING(o.ordered_at, 1, 4) AS INT) * 12
+ CAST(SUBSTRING(o.ordered_at, 6, 2) AS INT))
- (CAST(SUBSTRING(co.created_at, 1, 4) AS INT) * 12
+ CAST(SUBSTRING(co.created_at, 6, 2) AS INT)) AS months_since_signup
FROM cohort AS co
INNER JOIN orders AS o
ON co.customer_id = o.customer_id
AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
)
SELECT
signup_month,
COUNT(DISTINCT customer_id) AS cohort_size,
COUNT(DISTINCT CASE WHEN months_since_signup = 1 THEN customer_id END) AS m1,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup = 1 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS m1_pct,
COUNT(DISTINCT CASE WHEN months_since_signup = 2 THEN customer_id END) AS m2,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup = 2 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS m2_pct,
COUNT(DISTINCT CASE WHEN months_since_signup = 3 THEN customer_id END) AS m3,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup = 3 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS m3_pct
FROM cohort_orders
GROUP BY signup_month
ORDER BY signup_month;
Result (6 rows)
| signup_month | cohort_size | m1 | m1_pct | m2 | m2_pct | m3 | m3_pct |
|---|---|---|---|---|---|---|---|
| 2024-01 | 28 | 3 | 10.70 | 5 | 17.90 | 4 | 14.30 |
| 2024-02 | 26 | 7 | 26.90 | 3 | 11.50 | 5 | 19.20 |
| 2024-03 | 44 | 10 | 22.70 | 11 | 25.00 | 2 | 4.50 |
| 2024-04 | 26 | 4 | 15.40 | 4 | 15.40 | 2 | 7.70 |
| 2024-05 | 24 | 4 | 16.70 | 7 | 29.20 | 2 | 8.30 |
| 2024-06 | 36 | 4 | 11.10 | 4 | 11.10 | 6 | 16.70 |
17. Rolling Retention Analysis
By signup month cohort, calculate the % of customers who ordered in ANY month after N months (not just exactly month N).
Hint 1: - Use months_since_signup >= N for "any time after N months"
- Always >= classic retention values
Answer
WITH cohort AS (
SELECT
id AS customer_id,
SUBSTR(created_at, 1, 7) AS signup_month,
created_at
FROM customers
WHERE created_at >= '2024-01-01' AND created_at < '2024-07-01'
),
cohort_orders AS (
SELECT
co.signup_month,
co.customer_id,
CAST(
(CAST(SUBSTR(o.ordered_at, 1, 4) AS INTEGER) * 12
+ CAST(SUBSTR(o.ordered_at, 6, 2) AS INTEGER))
- (CAST(SUBSTR(co.created_at, 1, 4) AS INTEGER) * 12
+ CAST(SUBSTR(co.created_at, 6, 2) AS INTEGER))
AS INTEGER) AS months_since_signup
FROM cohort AS co
INNER JOIN orders AS o
ON co.customer_id = o.customer_id
AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
)
SELECT
signup_month,
COUNT(DISTINCT customer_id) AS cohort_size,
COUNT(DISTINCT CASE WHEN months_since_signup >= 1 THEN customer_id END) AS rolling_m1,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup >= 1 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS rolling_m1_pct,
COUNT(DISTINCT CASE WHEN months_since_signup >= 2 THEN customer_id END) AS rolling_m2,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup >= 2 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS rolling_m2_pct,
COUNT(DISTINCT CASE WHEN months_since_signup >= 3 THEN customer_id END) AS rolling_m3,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup >= 3 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS rolling_m3_pct
FROM cohort_orders
GROUP BY signup_month
ORDER BY signup_month;
WITH cohort AS (
SELECT
id AS customer_id,
SUBSTR(created_at, 1, 7) AS signup_month,
created_at
FROM customers
WHERE created_at >= '2024-01-01' AND created_at < '2024-07-01'
),
cohort_orders AS (
SELECT
co.signup_month,
co.customer_id,
(CAST(SUBSTR(o.ordered_at, 1, 4) AS NUMBER) * 12
+ CAST(SUBSTR(o.ordered_at, 6, 2) AS NUMBER))
- (CAST(SUBSTR(co.created_at, 1, 4) AS NUMBER) * 12
+ CAST(SUBSTR(co.created_at, 6, 2) AS NUMBER)) AS months_since_signup
FROM cohort co
INNER JOIN orders o
ON co.customer_id = o.customer_id
AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
)
SELECT
signup_month,
COUNT(DISTINCT customer_id) AS cohort_size,
COUNT(DISTINCT CASE WHEN months_since_signup >= 1 THEN customer_id END) AS rolling_m1,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup >= 1 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS rolling_m1_pct,
COUNT(DISTINCT CASE WHEN months_since_signup >= 2 THEN customer_id END) AS rolling_m2,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup >= 2 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS rolling_m2_pct,
COUNT(DISTINCT CASE WHEN months_since_signup >= 3 THEN customer_id END) AS rolling_m3,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup >= 3 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS rolling_m3_pct
FROM cohort_orders
GROUP BY signup_month
ORDER BY signup_month;
WITH cohort AS (
SELECT
id AS customer_id,
SUBSTRING(created_at, 1, 7) AS signup_month,
created_at
FROM customers
WHERE created_at >= '2024-01-01' AND created_at < '2024-07-01'
),
cohort_orders AS (
SELECT
co.signup_month,
co.customer_id,
(CAST(SUBSTRING(o.ordered_at, 1, 4) AS INT) * 12
+ CAST(SUBSTRING(o.ordered_at, 6, 2) AS INT))
- (CAST(SUBSTRING(co.created_at, 1, 4) AS INT) * 12
+ CAST(SUBSTRING(co.created_at, 6, 2) AS INT)) AS months_since_signup
FROM cohort AS co
INNER JOIN orders AS o
ON co.customer_id = o.customer_id
AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
)
SELECT
signup_month,
COUNT(DISTINCT customer_id) AS cohort_size,
COUNT(DISTINCT CASE WHEN months_since_signup >= 1 THEN customer_id END) AS rolling_m1,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup >= 1 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS rolling_m1_pct,
COUNT(DISTINCT CASE WHEN months_since_signup >= 2 THEN customer_id END) AS rolling_m2,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup >= 2 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS rolling_m2_pct,
COUNT(DISTINCT CASE WHEN months_since_signup >= 3 THEN customer_id END) AS rolling_m3,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN months_since_signup >= 3 THEN customer_id END)
/ COUNT(DISTINCT customer_id), 1) AS rolling_m3_pct
FROM cohort_orders
GROUP BY signup_month
ORDER BY signup_month;
Result (6 rows)
| signup_month | cohort_size | rolling_m1 | rolling_m1_pct | rolling_m2 | rolling_m2_pct | rolling_m3 | rolling_m3_pct |
|---|---|---|---|---|---|---|---|
| 2024-01 | 28 | 28 | 100.00 | 28 | 100.00 | 26 | 92.90 |
| 2024-02 | 26 | 26 | 100.00 | 26 | 100.00 | 25 | 96.20 |
| 2024-03 | 44 | 43 | 97.70 | 43 | 97.70 | 42 | 95.50 |
| 2024-04 | 26 | 26 | 100.00 | 24 | 92.30 | 24 | 92.30 |
| 2024-05 | 24 | 24 | 100.00 | 24 | 100.00 | 24 | 100.00 |
| 2024-06 | 36 | 35 | 97.20 | 35 | 97.20 | 33 | 91.70 |
18. DAU/MAU Stickiness Ratio
For December 2024, calculate daily active users (DAU) and monthly active users (MAU) from product_views. DAU/MAU ratio is the stickiness metric.
Hint 1: - DAU = daily COUNT(DISTINCT customer_id)
- MAU = monthly COUNT(DISTINCT customer_id) (subquery)
- Stickiness = DAU / MAU
Answer
WITH dau AS (
SELECT
SUBSTR(viewed_at, 1, 10) AS view_date,
COUNT(DISTINCT customer_id) AS daily_active
FROM product_views
WHERE viewed_at LIKE '2024-12%'
GROUP BY SUBSTR(viewed_at, 1, 10)
),
mau AS (
SELECT COUNT(DISTINCT customer_id) AS monthly_active
FROM product_views
WHERE viewed_at LIKE '2024-12%'
)
SELECT
d.view_date,
d.daily_active AS dau,
m.monthly_active AS mau,
ROUND(100.0 * d.daily_active / m.monthly_active, 2) AS stickiness_pct
FROM dau AS d
CROSS JOIN mau AS m
ORDER BY d.view_date;
Result (top 7 of 31 rows)
| view_date | dau | mau | stickiness_pct |
|---|---|---|---|
| 2024-12-01 | 94 | 1056 | 8.90 |
| 2024-12-02 | 118 | 1056 | 11.17 |
| 2024-12-03 | 111 | 1056 | 10.51 |
| 2024-12-04 | 106 | 1056 | 10.04 |
| 2024-12-05 | 109 | 1056 | 10.32 |
| 2024-12-06 | 111 | 1056 | 10.51 |
| 2024-12-07 | 102 | 1056 | 9.66 |
19. 7-Day Moving Average Revenue
Calculate daily revenue and 7-day moving average for December 2024.
Use AVG() OVER (ROWS BETWEEN 6 PRECEDING AND CURRENT ROW).
Hint 1: - Aggregate daily revenue first
- AVG(daily_revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
Answer
WITH daily_revenue AS (
SELECT
SUBSTR(ordered_at, 1, 10) AS order_date,
ROUND(SUM(total_amount), 2) AS revenue
FROM orders
WHERE ordered_at LIKE '2024-12%'
AND status NOT IN ('cancelled', 'returned', 'return_requested')
GROUP BY SUBSTR(ordered_at, 1, 10)
)
SELECT
order_date,
revenue,
ROUND(AVG(revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS moving_avg_7d
FROM daily_revenue
ORDER BY order_date;
Result (top 7 of 31 rows)
| order_date | revenue | moving_avg_7d |
|---|---|---|
| 2024-12-01 | 10,287,445.00 | 10,287,445.00 |
| 2024-12-02 | 11,732,557.00 | 11,010,001.00 |
| 2024-12-03 | 11,867,860.00 | 11,295,954.00 |
| 2024-12-04 | 11,198,303.00 | 11,271,541.25 |
| 2024-12-05 | 5,489,585.00 | 10,115,150.00 |
| 2024-12-06 | 15,937,500.00 | 11,085,541.67 |
| 2024-12-07 | 15,895,514.00 | 11,772,680.57 |
20. 3-Month Moving Average Monthly Revenue
Calculate monthly revenue with a 3-month moving average. Use the last 24 months of data.
Hint 1: - AVG(revenue) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
Answer
WITH monthly AS (
SELECT
SUBSTR(ordered_at, 1, 7) AS year_month,
ROUND(SUM(total_amount), 0) AS revenue
FROM orders
WHERE status NOT IN ('cancelled', 'returned', 'return_requested')
AND ordered_at >= '2024-01-01'
GROUP BY SUBSTR(ordered_at, 1, 7)
)
SELECT
year_month,
revenue,
ROUND(AVG(revenue) OVER (
ORDER BY year_month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 0) AS moving_avg_3m
FROM monthly
ORDER BY year_month;
Result (top 7 of 24 rows)
| year_month | revenue | moving_avg_3m |
|---|---|---|
| 2024-01 | 288,908,320.00 | 288,908,320.00 |
| 2024-02 | 403,127,749.00 | 346,018,035.00 |
| 2024-03 | 519,844,502.00 | 403,960,190.00 |
| 2024-04 | 451,877,581.00 | 458,283,277.00 |
| 2024-05 | 425,264,478.00 | 465,662,187.00 |
| 2024-06 | 362,715,211.00 | 413,285,757.00 |
| 2024-07 | 343,929,897.00 | 377,303,195.00 |
21. Cumulative Monthly Revenue Within Year
Calculate monthly revenue and cumulative revenue within each year. Use PARTITION BY year so cumulative resets each year.
Hint 1: - SUM(revenue) OVER (PARTITION BY year ORDER BY month)
Answer
SELECT
SUBSTR(ordered_at, 1, 4) AS year,
SUBSTR(ordered_at, 1, 7) AS year_month,
ROUND(SUM(total_amount), 0) AS monthly_revenue,
ROUND(SUM(SUM(total_amount)) OVER (
PARTITION BY SUBSTR(ordered_at, 1, 4)
ORDER BY SUBSTR(ordered_at, 1, 7)
), 0) AS cumulative_revenue
FROM orders
WHERE status NOT IN ('cancelled', 'returned', 'return_requested')
AND ordered_at >= '2023-01-01'
GROUP BY SUBSTR(ordered_at, 1, 4), SUBSTR(ordered_at, 1, 7)
ORDER BY year_month;
Result (top 7 of 36 rows)
| year | year_month | monthly_revenue | cumulative_revenue |
|---|---|---|---|
| 2023 | 2023-01 | 270,083,587.00 | 270,083,587.00 |
| 2023 | 2023-02 | 327,431,648.00 | 597,515,235.00 |
| 2023 | 2023-03 | 477,735,354.00 | 1,075,250,589.00 |
| 2023 | 2023-04 | 396,849,049.00 | 1,472,099,638.00 |
| 2023 | 2023-05 | 349,749,072.00 | 1,821,848,710.00 |
| 2023 | 2023-06 | 279,698,633.00 | 2,101,547,343.00 |
| 2023 | 2023-07 | 312,983,148.00 | 2,414,530,491.00 |
22. Pareto Analysis (Customers)
Analyze what percentage of customers generate 80% of revenue. Show cumulative revenue percentage and customer rank.
Hint 1: - Sort customer revenue descending, calculate cumulative SUM - Calculate cumulative % of total revenue - Find customer count at 80% threshold / total customers
Answer
WITH customer_revenue AS (
SELECT
customer_id,
ROUND(SUM(total_amount), 2) AS revenue
FROM orders
WHERE status NOT IN ('cancelled', 'returned', 'return_requested')
GROUP BY customer_id
),
ranked AS (
SELECT
customer_id,
revenue,
SUM(revenue) OVER (ORDER BY revenue DESC) AS cumulative_revenue,
SUM(revenue) OVER () AS total_revenue,
ROW_NUMBER() OVER (ORDER BY revenue DESC) AS rank,
COUNT(*) OVER () AS total_customers
FROM customer_revenue
)
SELECT
rank,
revenue,
ROUND(100.0 * cumulative_revenue / total_revenue, 2) AS cumulative_pct,
ROUND(100.0 * rank / total_customers, 2) AS customer_pct,
CASE
WHEN 100.0 * cumulative_revenue / total_revenue <= 80 THEN 'Top 80%'
ELSE 'Remaining'
END AS pareto_group
FROM ranked
WHERE rank <= 50 OR 100.0 * cumulative_revenue / total_revenue BETWEEN 78 AND 82
ORDER BY rank;
Result (top 7 of 167 rows)
| rank | revenue | cumulative_pct | customer_pct | pareto_group |
|---|---|---|---|---|
| 1 | 403,448,758.00 | 1.16 | 0.04 | Top 80% |
| 2 | 366,385,931.00 | 2.21 | 0.07 | Top 80% |
| 3 | 253,180,338.00 | 2.94 | 0.11 | Top 80% |
| 4 | 244,604,910.00 | 3.65 | 0.14 | Top 80% |
| 5 | 235,775,349.00 | 4.32 | 0.18 | Top 80% |
| 6 | 234,708,853.00 | 5.00 | 0.21 | Top 80% |
| 7 | 230,165,991.00 | 5.66 | 0.25 | Top 80% |
23. Purchase Cycle Analysis
Calculate the average days between consecutive orders per customer. Use LAG to get previous order date and compute JULIANDAY difference.
Hint 1: - LAG(ordered_at) OVER (PARTITION BY customer_id ORDER BY ordered_at)
- JULIANDAY(ordered_at) - JULIANDAY(prev_ordered_at)
- AVG() for average cycle per customer
Answer
WITH order_gaps AS (
SELECT
customer_id,
ordered_at,
LAG(ordered_at) OVER (
PARTITION BY customer_id ORDER BY ordered_at
) AS prev_ordered_at
FROM orders
WHERE status NOT IN ('cancelled', 'returned', 'return_requested')
)
SELECT
c.name AS customer_name,
c.grade,
COUNT(*) AS gap_count,
ROUND(AVG(JULIANDAY(og.ordered_at) - JULIANDAY(og.prev_ordered_at)), 1) AS avg_cycle_days,
MIN(CAST(JULIANDAY(og.ordered_at) - JULIANDAY(og.prev_ordered_at) AS INTEGER)) AS min_days,
MAX(CAST(JULIANDAY(og.ordered_at) - JULIANDAY(og.prev_ordered_at) AS INTEGER)) AS max_days
FROM order_gaps AS og
INNER JOIN customers AS c ON og.customer_id = c.id
WHERE og.prev_ordered_at IS NOT NULL
GROUP BY og.customer_id, c.name, c.grade
HAVING COUNT(*) >= 3
ORDER BY avg_cycle_days ASC
LIMIT 20;
WITH order_gaps AS (
SELECT
customer_id,
ordered_at,
LAG(ordered_at) OVER (
PARTITION BY customer_id ORDER BY ordered_at
) AS prev_ordered_at
FROM orders
WHERE status NOT IN ('cancelled', 'returned', 'return_requested')
)
SELECT
c.name AS customer_name,
c.grade,
COUNT(*) AS gap_count,
ROUND(AVG(CAST(og.ordered_at AS DATE) - CAST(og.prev_ordered_at AS DATE)), 1) AS avg_cycle_days,
MIN(CAST(og.ordered_at AS DATE) - CAST(og.prev_ordered_at AS DATE)) AS min_days,
MAX(CAST(og.ordered_at AS DATE) - CAST(og.prev_ordered_at AS DATE)) AS max_days
FROM order_gaps og
INNER JOIN customers c ON og.customer_id = c.id
WHERE og.prev_ordered_at IS NOT NULL
GROUP BY og.customer_id, c.name, c.grade
HAVING COUNT(*) >= 3
ORDER BY avg_cycle_days ASC
FETCH FIRST 20 ROWS ONLY;
WITH order_gaps AS (
SELECT
customer_id,
ordered_at,
LAG(ordered_at) OVER (
PARTITION BY customer_id ORDER BY ordered_at
) AS prev_ordered_at
FROM orders
WHERE status NOT IN ('cancelled', 'returned', 'return_requested')
)
SELECT TOP 20
c.name AS customer_name,
c.grade,
COUNT(*) AS gap_count,
ROUND(AVG(CAST(DATEDIFF(DAY, og.prev_ordered_at, og.ordered_at) AS FLOAT)), 1) AS avg_cycle_days,
MIN(DATEDIFF(DAY, og.prev_ordered_at, og.ordered_at)) AS min_days,
MAX(DATEDIFF(DAY, og.prev_ordered_at, og.ordered_at)) AS max_days
FROM order_gaps AS og
INNER JOIN customers AS c ON og.customer_id = c.id
WHERE og.prev_ordered_at IS NOT NULL
GROUP BY og.customer_id, c.name, c.grade
HAVING COUNT(*) >= 3
ORDER BY avg_cycle_days ASC;
Result (top 7 of 20 rows)
| customer_name | grade | gap_count | avg_cycle_days | min_days | max_days |
|---|---|---|---|---|---|
| Jason Rivera | VIP | 341 | 10.10 | 0 | 112 |
| Allen Snyder | VIP | 302 | 10.80 | 0 | 88 |
| Ronald Arellano | VIP | 218 | 12.00 | 0 | 74 |
| Jennifer Bradshaw | VIP | 6 | 12.10 | 0 | 26 |
| Courtney Huff | VIP | 222 | 12.30 | 0 | 125 |
| Julia Carson | GOLD | 3 | 12.40 | 3 | 20 |
| Brenda Garcia | VIP | 248 | 12.90 | 0 | 102 |
24. First Purchase Repurchase Rate
Calculate the % of customers who repurchased within 30, 60, and 90 days of their first order.
Hint 1: - First order = MIN(ordered_at) per customer
- Check if second order is within first_order + N days
Answer
WITH first_order AS (
SELECT
customer_id,
MIN(ordered_at) AS first_ordered_at
FROM orders
WHERE status NOT IN ('cancelled', 'returned', 'return_requested')
GROUP BY customer_id
),
repeat_order AS (
SELECT
fo.customer_id,
fo.first_ordered_at,
MIN(o.ordered_at) AS second_ordered_at
FROM first_order AS fo
INNER JOIN orders AS o
ON fo.customer_id = o.customer_id
AND o.ordered_at > fo.first_ordered_at
AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
GROUP BY fo.customer_id, fo.first_ordered_at
)
SELECT
COUNT(DISTINCT fo.customer_id) AS total_customers,
COUNT(DISTINCT CASE
WHEN JULIANDAY(ro.second_ordered_at) - JULIANDAY(fo.first_ordered_at) <= 30
THEN fo.customer_id
END) AS repurchase_30d,
ROUND(100.0 * COUNT(DISTINCT CASE
WHEN JULIANDAY(ro.second_ordered_at) - JULIANDAY(fo.first_ordered_at) <= 30
THEN fo.customer_id
END) / COUNT(DISTINCT fo.customer_id), 1) AS repurchase_30d_pct,
COUNT(DISTINCT CASE
WHEN JULIANDAY(ro.second_ordered_at) - JULIANDAY(fo.first_ordered_at) <= 60
THEN fo.customer_id
END) AS repurchase_60d,
ROUND(100.0 * COUNT(DISTINCT CASE
WHEN JULIANDAY(ro.second_ordered_at) - JULIANDAY(fo.first_ordered_at) <= 60
THEN fo.customer_id
END) / COUNT(DISTINCT fo.customer_id), 1) AS repurchase_60d_pct,
COUNT(DISTINCT CASE
WHEN JULIANDAY(ro.second_ordered_at) - JULIANDAY(fo.first_ordered_at) <= 90
THEN fo.customer_id
END) AS repurchase_90d,
ROUND(100.0 * COUNT(DISTINCT CASE
WHEN JULIANDAY(ro.second_ordered_at) - JULIANDAY(fo.first_ordered_at) <= 90
THEN fo.customer_id
END) / COUNT(DISTINCT fo.customer_id), 1) AS repurchase_90d_pct
FROM first_order AS fo
LEFT JOIN repeat_order AS ro ON fo.customer_id = ro.customer_id;
WITH first_order AS (
SELECT
customer_id,
MIN(ordered_at) AS first_ordered_at
FROM orders
WHERE status NOT IN ('cancelled', 'returned', 'return_requested')
GROUP BY customer_id
),
repeat_order AS (
SELECT
fo.customer_id,
fo.first_ordered_at,
MIN(o.ordered_at) AS second_ordered_at
FROM first_order fo
INNER JOIN orders o
ON fo.customer_id = o.customer_id
AND o.ordered_at > fo.first_ordered_at
AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
GROUP BY fo.customer_id, fo.first_ordered_at
)
SELECT
COUNT(DISTINCT fo.customer_id) AS total_customers,
COUNT(DISTINCT CASE
WHEN CAST(ro.second_ordered_at AS DATE) - CAST(fo.first_ordered_at AS DATE) <= 30
THEN fo.customer_id
END) AS repurchase_30d,
ROUND(100.0 * COUNT(DISTINCT CASE
WHEN CAST(ro.second_ordered_at AS DATE) - CAST(fo.first_ordered_at AS DATE) <= 30
THEN fo.customer_id
END) / COUNT(DISTINCT fo.customer_id), 1) AS repurchase_30d_pct,
COUNT(DISTINCT CASE
WHEN CAST(ro.second_ordered_at AS DATE) - CAST(fo.first_ordered_at AS DATE) <= 60
THEN fo.customer_id
END) AS repurchase_60d,
ROUND(100.0 * COUNT(DISTINCT CASE
WHEN CAST(ro.second_ordered_at AS DATE) - CAST(fo.first_ordered_at AS DATE) <= 60
THEN fo.customer_id
END) / COUNT(DISTINCT fo.customer_id), 1) AS repurchase_60d_pct,
COUNT(DISTINCT CASE
WHEN CAST(ro.second_ordered_at AS DATE) - CAST(fo.first_ordered_at AS DATE) <= 90
THEN fo.customer_id
END) AS repurchase_90d,
ROUND(100.0 * COUNT(DISTINCT CASE
WHEN CAST(ro.second_ordered_at AS DATE) - CAST(fo.first_ordered_at AS DATE) <= 90
THEN fo.customer_id
END) / COUNT(DISTINCT fo.customer_id), 1) AS repurchase_90d_pct
FROM first_order fo
LEFT JOIN repeat_order ro ON fo.customer_id = ro.customer_id;
WITH first_order AS (
SELECT
customer_id,
MIN(ordered_at) AS first_ordered_at
FROM orders
WHERE status NOT IN ('cancelled', 'returned', 'return_requested')
GROUP BY customer_id
),
repeat_order AS (
SELECT
fo.customer_id,
fo.first_ordered_at,
MIN(o.ordered_at) AS second_ordered_at
FROM first_order AS fo
INNER JOIN orders AS o
ON fo.customer_id = o.customer_id
AND o.ordered_at > fo.first_ordered_at
AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
GROUP BY fo.customer_id, fo.first_ordered_at
)
SELECT
COUNT(DISTINCT fo.customer_id) AS total_customers,
COUNT(DISTINCT CASE
WHEN DATEDIFF(DAY, fo.first_ordered_at, ro.second_ordered_at) <= 30
THEN fo.customer_id
END) AS repurchase_30d,
ROUND(100.0 * COUNT(DISTINCT CASE
WHEN DATEDIFF(DAY, fo.first_ordered_at, ro.second_ordered_at) <= 30
THEN fo.customer_id
END) / COUNT(DISTINCT fo.customer_id), 1) AS repurchase_30d_pct,
COUNT(DISTINCT CASE
WHEN DATEDIFF(DAY, fo.first_ordered_at, ro.second_ordered_at) <= 60
THEN fo.customer_id
END) AS repurchase_60d,
ROUND(100.0 * COUNT(DISTINCT CASE
WHEN DATEDIFF(DAY, fo.first_ordered_at, ro.second_ordered_at) <= 60
THEN fo.customer_id
END) / COUNT(DISTINCT fo.customer_id), 1) AS repurchase_60d_pct,
COUNT(DISTINCT CASE
WHEN DATEDIFF(DAY, fo.first_ordered_at, ro.second_ordered_at) <= 90
THEN fo.customer_id
END) AS repurchase_90d,
ROUND(100.0 * COUNT(DISTINCT CASE
WHEN DATEDIFF(DAY, fo.first_ordered_at, ro.second_ordered_at) <= 90
THEN fo.customer_id
END) / COUNT(DISTINCT fo.customer_id), 1) AS repurchase_90d_pct
FROM first_order AS fo
LEFT JOIN repeat_order AS ro ON fo.customer_id = ro.customer_id;
Result (1 rows)
| total_customers | repurchase_30d | repurchase_30d_pct | repurchase_60d | repurchase_60d_pct | repurchase_90d | repurchase_90d_pct |
|---|---|---|---|---|---|---|
| 2793 | 704 | 25.20 | 1016 | 36.40 | 1229 | 44.00 |
25. Point Balance Drift Detection
Find customers where the SUM of point_transactions.amount does not match customers.point_balance.
Hint 1: - SUM(pt.amount) per customer
- Compare with customers.point_balance
Answer
WITH point_sum AS (
SELECT
customer_id,
SUM(amount) AS calculated_balance
FROM point_transactions
GROUP BY customer_id
)
SELECT
c.id AS customer_id,
c.name,
c.point_balance AS stored_balance,
COALESCE(ps.calculated_balance, 0) AS calculated_balance,
c.point_balance - COALESCE(ps.calculated_balance, 0) AS drift
FROM customers AS c
LEFT JOIN point_sum AS ps ON c.id = ps.customer_id
WHERE c.point_balance != COALESCE(ps.calculated_balance, 0)
ORDER BY ABS(c.point_balance - COALESCE(ps.calculated_balance, 0)) DESC
LIMIT 20;
Result (top 7 of 20 rows)
| customer_id | name | stored_balance | calculated_balance | drift |
|---|---|---|---|---|
| 97 | Jason Rivera | 3,518,880 | 2,332,397 | 1,186,483 |
| 226 | Allen Snyder | 3,955,828 | 2,863,301 | 1,092,527 |
| 162 | Brenda Garcia | 2,450,166 | 1,521,994 | 928,172 |
| 549 | Ronald Arellano | 2,276,622 | 1,449,259 | 827,363 |
| 227 | James Banks | 2,297,542 | 1,516,187 | 781,355 |
| 3 | Adam Moore | 1,564,015 | 859,898 | 704,117 |
| 98 | Gabriel Walters | 2,218,590 | 1,514,981 | 703,609 |
26. Promotion Lift Analysis
Compare average daily revenue during promotion periods vs outside. Calculate lift percentage per promotion.
Hint 1: - Total revenue during started_at to ended_at / number of days
- Same for non-promotion period
Answer
WITH promo_daily AS (
SELECT
pr.id AS promo_id,
pr.name AS promo_name,
ROUND(SUM(o.total_amount), 0) AS promo_revenue,
CAST(JULIANDAY(pr.ended_at) - JULIANDAY(pr.started_at) + 1 AS INTEGER) AS promo_days
FROM promotions AS pr
INNER JOIN orders AS o
ON o.ordered_at BETWEEN pr.started_at AND pr.ended_at
AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
WHERE pr.started_at >= '2024-01-01'
GROUP BY pr.id, pr.name, pr.started_at, pr.ended_at
),
overall_daily AS (
SELECT
ROUND(SUM(total_amount) / 365.0, 0) AS avg_daily_revenue
FROM orders
WHERE ordered_at LIKE '2024%'
AND status NOT IN ('cancelled', 'returned', 'return_requested')
)
SELECT
pd.promo_name,
pd.promo_revenue,
pd.promo_days,
ROUND(1.0 * pd.promo_revenue / pd.promo_days, 0) AS promo_avg_daily,
od.avg_daily_revenue AS baseline_avg_daily,
ROUND(100.0 * ((1.0 * pd.promo_revenue / pd.promo_days) - od.avg_daily_revenue)
/ od.avg_daily_revenue, 1) AS lift_pct
FROM promo_daily AS pd
CROSS JOIN overall_daily AS od
ORDER BY lift_pct DESC
LIMIT 15;
WITH promo_daily AS (
SELECT
pr.id AS promo_id,
pr.name AS promo_name,
ROUND(SUM(o.total_amount), 0) AS promo_revenue,
CAST(pr.ended_at AS DATE) - CAST(pr.started_at AS DATE) + 1 AS promo_days
FROM promotions pr
INNER JOIN orders o
ON o.ordered_at BETWEEN pr.started_at AND pr.ended_at
AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
WHERE pr.started_at >= '2024-01-01'
GROUP BY pr.id, pr.name, pr.started_at, pr.ended_at
),
overall_daily AS (
SELECT
ROUND(SUM(total_amount) / 365.0, 0) AS avg_daily_revenue
FROM orders
WHERE ordered_at LIKE '2024%'
AND status NOT IN ('cancelled', 'returned', 'return_requested')
)
SELECT
pd.promo_name,
pd.promo_revenue,
pd.promo_days,
ROUND(1.0 * pd.promo_revenue / pd.promo_days, 0) AS promo_avg_daily,
od.avg_daily_revenue AS baseline_avg_daily,
ROUND(100.0 * ((1.0 * pd.promo_revenue / pd.promo_days) - od.avg_daily_revenue)
/ od.avg_daily_revenue, 1) AS lift_pct
FROM promo_daily pd
CROSS JOIN overall_daily od
ORDER BY lift_pct DESC
FETCH FIRST 15 ROWS ONLY;
WITH promo_daily AS (
SELECT
pr.id AS promo_id,
pr.name AS promo_name,
ROUND(SUM(o.total_amount), 0) AS promo_revenue,
DATEDIFF(DAY, pr.started_at, pr.ended_at) + 1 AS promo_days
FROM promotions AS pr
INNER JOIN orders AS o
ON o.ordered_at BETWEEN pr.started_at AND pr.ended_at
AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
WHERE pr.started_at >= '2024-01-01'
GROUP BY pr.id, pr.name, pr.started_at, pr.ended_at
),
overall_daily AS (
SELECT
ROUND(SUM(total_amount) / 365.0, 0) AS avg_daily_revenue
FROM orders
WHERE ordered_at LIKE '2024%'
AND status NOT IN ('cancelled', 'returned', 'return_requested')
)
SELECT TOP 15
pd.promo_name,
pd.promo_revenue,
pd.promo_days,
ROUND(1.0 * pd.promo_revenue / pd.promo_days, 0) AS promo_avg_daily,
od.avg_daily_revenue AS baseline_avg_daily,
ROUND(100.0 * ((1.0 * pd.promo_revenue / pd.promo_days) - od.avg_daily_revenue)
/ od.avg_daily_revenue, 1) AS lift_pct
FROM promo_daily AS pd
CROSS JOIN overall_daily AS od
ORDER BY lift_pct DESC;
Result (top 7 of 15 rows)
| promo_name | promo_revenue | promo_days | promo_avg_daily | baseline_avg_daily | lift_pct |
|---|---|---|---|---|---|
| Year-End Thank You Sale 2025 | 416,811,391.00 | 15 | 27,787,426.00 | 14,012,174.00 | 98.30 |
| Surprise Deal | 74,259,378.00 | 3 | 24,753,126.00 | 14,012,174.00 | 76.70 |
| Spring Sale 2025 | 311,900,316.00 | 15 | 20,793,354.00 | 14,012,174.00 | 48.40 |
| Autumn Gift Sale 2024 | 225,867,067.00 | 11 | 20,533,370.00 | 14,012,174.00 | 46.50 |
| Back to School Laptop Sale 2025 | 373,010,414.00 | 22 | 16,955,019.00 | 14,012,174.00 | 21.00 |
| Gaming Gear Festa 2025 | 131,978,627.00 | 8 | 16,497,328.00 | 14,012,174.00 | 17.70 |
| Back to School Laptop Sale 2024 | 350,818,175.00 | 22 | 15,946,281.00 | 14,012,174.00 | 13.80 |
27. Category Cross-Sell Analysis
Find category pairs most frequently bought together in the same order.
Hint 1: - Get category list per order first - Self-join to generate all pairs (c1.id < c2.id to avoid duplicates)
Answer
WITH order_categories AS (
SELECT DISTINCT
oi.order_id,
p.category_id,
cat.name AS category_name
FROM order_items AS oi
INNER JOIN products AS p ON oi.product_id = p.id
INNER JOIN categories AS cat ON p.category_id = cat.id
INNER JOIN orders AS o ON oi.order_id = o.id
WHERE o.status NOT IN ('cancelled', 'returned', 'return_requested')
)
SELECT
oc1.category_name AS category_1,
oc2.category_name AS category_2,
COUNT(*) AS co_occurrence
FROM order_categories AS oc1
INNER JOIN order_categories AS oc2
ON oc1.order_id = oc2.order_id
AND oc1.category_id < oc2.category_id
GROUP BY oc1.category_name, oc2.category_name
ORDER BY co_occurrence DESC
LIMIT 15;
Result (top 7 of 15 rows)
| category_1 | category_2 | co_occurrence |
|---|---|---|
| Power Supply (PSU) | Case | 3139 |
| SSD | Case | 2999 |
| SSD | Power Supply (PSU) | 2990 |
| AMD | Case | 2209 |
| AMD | Power Supply (PSU) | 2182 |
| DDR5 | Case | 1768 |
| Intel | Case | 1757 |
28. Grade Downgrade Trajectory
Find customers with 2+ consecutive downgrades in grade history. e.g., VIP -> GOLD -> SILVER
Hint 1: - Use LAG(reason) OVER (PARTITION BY customer_id ORDER BY changed_at)
- Find rows where both current and previous reason = 'downgrade'
Answer
WITH grade_seq AS (
SELECT
customer_id,
old_grade,
new_grade,
reason,
changed_at,
LAG(reason) OVER (
PARTITION BY customer_id ORDER BY changed_at
) AS prev_reason,
LAG(old_grade) OVER (
PARTITION BY customer_id ORDER BY changed_at
) AS prev_old_grade
FROM customer_grade_history
)
SELECT
gs.customer_id,
c.name AS customer_name,
gs.prev_old_grade AS grade_before,
gs.old_grade AS grade_mid,
gs.new_grade AS grade_after,
gs.changed_at
FROM grade_seq AS gs
INNER JOIN customers AS c ON gs.customer_id = c.id
WHERE gs.reason = 'downgrade'
AND gs.prev_reason = 'downgrade'
ORDER BY gs.customer_id, gs.changed_at;
Result (top 7 of 292 rows)
| customer_id | customer_name | grade_before | grade_mid | grade_after | changed_at |
|---|---|---|---|---|---|
| 4 | Virginia Steele | VIP | GOLD | SILVER | 2024-01-01 00:00:00 |
| 8 | Tyler Rodriguez | VIP | GOLD | SILVER | 2022-01-01 00:00:00 |
| 10 | John Stark | GOLD | SILVER | BRONZE | 2021-01-01 00:00:00 |
| 12 | Michael Velasquez | VIP | GOLD | SILVER | 2023-01-01 00:00:00 |
| 14 | Martha Murphy | VIP | GOLD | SILVER | 2020-01-01 00:00:00 |
| 14 | Martha Murphy | GOLD | SILVER | BRONZE | 2021-01-01 00:00:00 |
| 15 | Lydia Lawrence | VIP | GOLD | SILVER | 2023-01-01 00:00:00 |
29. Carrier Monthly Delivery Performance with Trend
Show average delivery days by carrier by month, with month-over-month change.
Hint 1: - Delivery days: JULIANDAY(delivered_at) - JULIANDAY(shipped_at)
- Previous month: LAG(avg_days) OVER (PARTITION BY carrier ORDER BY month)
Answer
WITH monthly_carrier AS (
SELECT
carrier,
SUBSTR(shipped_at, 1, 7) AS ship_month,
COUNT(*) AS delivery_count,
ROUND(AVG(JULIANDAY(delivered_at) - JULIANDAY(shipped_at)), 2) AS avg_days
FROM shipping
WHERE delivered_at IS NOT NULL
AND shipped_at IS NOT NULL
AND shipped_at >= '2024-01-01'
GROUP BY carrier, SUBSTR(shipped_at, 1, 7)
)
SELECT
carrier,
ship_month,
delivery_count,
avg_days,
LAG(avg_days) OVER (PARTITION BY carrier ORDER BY ship_month) AS prev_month_days,
ROUND(avg_days - LAG(avg_days) OVER (PARTITION BY carrier ORDER BY ship_month), 2) AS mom_change
FROM monthly_carrier
ORDER BY carrier, ship_month;
WITH monthly_carrier AS (
SELECT
carrier,
SUBSTR(shipped_at, 1, 7) AS ship_month,
COUNT(*) AS delivery_count,
ROUND(AVG(CAST(delivered_at AS DATE) - CAST(shipped_at AS DATE)), 2) AS avg_days
FROM shipping
WHERE delivered_at IS NOT NULL
AND shipped_at IS NOT NULL
AND shipped_at >= '2024-01-01'
GROUP BY carrier, SUBSTR(shipped_at, 1, 7)
)
SELECT
carrier,
ship_month,
delivery_count,
avg_days,
LAG(avg_days) OVER (PARTITION BY carrier ORDER BY ship_month) AS prev_month_days,
ROUND(avg_days - LAG(avg_days) OVER (PARTITION BY carrier ORDER BY ship_month), 2) AS mom_change
FROM monthly_carrier
ORDER BY carrier, ship_month;
WITH monthly_carrier AS (
SELECT
carrier,
SUBSTRING(shipped_at, 1, 7) AS ship_month,
COUNT(*) AS delivery_count,
ROUND(AVG(CAST(DATEDIFF(DAY, shipped_at, delivered_at) AS FLOAT)), 2) AS avg_days
FROM shipping
WHERE delivered_at IS NOT NULL
AND shipped_at IS NOT NULL
AND shipped_at >= '2024-01-01'
GROUP BY carrier, SUBSTRING(shipped_at, 1, 7)
)
SELECT
carrier,
ship_month,
delivery_count,
avg_days,
LAG(avg_days) OVER (PARTITION BY carrier ORDER BY ship_month) AS prev_month_days,
ROUND(avg_days - LAG(avg_days) OVER (PARTITION BY carrier ORDER BY ship_month), 2) AS mom_change
FROM monthly_carrier
ORDER BY carrier, ship_month;
Result (top 7 of 120 rows)
| carrier | ship_month | delivery_count | avg_days | prev_month_days | mom_change |
|---|---|---|---|---|---|
| DHL | 2024-01 | 43 | 2.60 | NULL | NULL |
| DHL | 2024-02 | 52 | 2.69 | 2.60 | 0.09 |
| DHL | 2024-03 | 83 | 2.52 | 2.69 | -0.17 |
| DHL | 2024-04 | 84 | 2.50 | 2.52 | -0.02 |
| DHL | 2024-05 | 56 | 2.43 | 2.50 | -0.07 |
| DHL | 2024-06 | 57 | 2.58 | 2.43 | 0.15 |
| DHL | 2024-07 | 64 | 2.58 | 2.58 | 0.0 |
30. 3+ Consecutive Days of Increasing Revenue
Find periods of 3+ consecutive days with increasing daily revenue. Show start date, end date, and streak length.
Hint 1: - Use LAG to flag daily revenue increases - Use flag breaks as group boundaries (island pattern) - Filter groups with streak >= 3
Answer
WITH daily AS (
SELECT
SUBSTR(ordered_at, 1, 10) AS order_date,
ROUND(SUM(total_amount), 2) AS revenue
FROM orders
WHERE status NOT IN ('cancelled', 'returned', 'return_requested')
AND ordered_at >= '2024-01-01'
GROUP BY SUBSTR(ordered_at, 1, 10)
),
with_flag AS (
SELECT
order_date,
revenue,
LAG(revenue) OVER (ORDER BY order_date) AS prev_revenue,
CASE
WHEN revenue > LAG(revenue) OVER (ORDER BY order_date) THEN 1
ELSE 0
END AS is_increase
FROM daily
),
with_group AS (
SELECT
order_date,
revenue,
is_increase,
SUM(CASE WHEN is_increase = 0 THEN 1 ELSE 0 END) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS grp
FROM with_flag
)
SELECT
MIN(order_date) AS start_date,
MAX(order_date) AS end_date,
COUNT(*) AS streak_days
FROM with_group
WHERE is_increase = 1
GROUP BY grp
HAVING COUNT(*) >= 3
ORDER BY start_date;
WITH daily AS (
SELECT
SUBSTR(ordered_at, 1, 10) AS order_date,
ROUND(SUM(total_amount), 2) AS revenue
FROM orders
WHERE status NOT IN ('cancelled', 'returned', 'return_requested')
AND ordered_at >= '2024-01-01'
GROUP BY SUBSTR(ordered_at, 1, 10)
),
with_flag AS (
SELECT
order_date,
revenue,
LAG(revenue) OVER (ORDER BY order_date) AS prev_revenue,
CASE
WHEN revenue > LAG(revenue) OVER (ORDER BY order_date) THEN 1
ELSE 0
END AS is_increase
FROM daily
),
with_group AS (
SELECT
order_date,
revenue,
is_increase,
SUM(CASE WHEN is_increase = 0 THEN 1 ELSE 0 END) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS grp
FROM with_flag
)
SELECT
MIN(order_date) AS start_date,
MAX(order_date) AS end_date,
COUNT(*) AS streak_days
FROM with_group
WHERE is_increase = 1
GROUP BY grp
HAVING COUNT(*) >= 3
ORDER BY start_date;
WITH daily AS (
SELECT
SUBSTRING(ordered_at, 1, 10) AS order_date,
ROUND(SUM(total_amount), 2) AS revenue
FROM orders
WHERE status NOT IN ('cancelled', 'returned', 'return_requested')
AND ordered_at >= '2024-01-01'
GROUP BY SUBSTRING(ordered_at, 1, 10)
),
with_flag AS (
SELECT
order_date,
revenue,
LAG(revenue) OVER (ORDER BY order_date) AS prev_revenue,
CASE
WHEN revenue > LAG(revenue) OVER (ORDER BY order_date) THEN 1
ELSE 0
END AS is_increase
FROM daily
),
with_group AS (
SELECT
order_date,
revenue,
is_increase,
SUM(CASE WHEN is_increase = 0 THEN 1 ELSE 0 END) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS grp
FROM with_flag
)
SELECT
MIN(order_date) AS start_date,
MAX(order_date) AS end_date,
COUNT(*) AS streak_days
FROM with_group
WHERE is_increase = 1
GROUP BY grp
HAVING COUNT(*) >= 3
ORDER BY start_date;
Result (top 7 of 25 rows)
| start_date | end_date | streak_days |
|---|---|---|
| 2024-01-04 | 2024-01-07 | 4 |
| 2024-01-17 | 2024-01-21 | 5 |
| 2024-01-26 | 2024-01-28 | 3 |
| 2024-03-04 | 2024-03-06 | 3 |
| 2024-03-27 | 2024-03-29 | 3 |
| 2024-05-14 | 2024-05-16 | 3 |
| 2024-06-20 | 2024-06-23 | 4 |
31. Customers with Consecutive Monthly Orders
Find customers who ordered every month for 5+ consecutive months. Show customer name, streak start month, end month, and duration.
Hint 1: - Get monthly order flag per customer first - Use "month_number - ROW_NUMBER" pattern to detect consecutive months - Rows with same difference form a consecutive group
Answer
WITH customer_months AS (
SELECT DISTINCT
customer_id,
SUBSTR(ordered_at, 1, 7) AS order_month,
CAST(SUBSTR(ordered_at, 1, 4) AS INTEGER) * 12
+ CAST(SUBSTR(ordered_at, 6, 2) AS INTEGER) AS month_num
FROM orders
WHERE status NOT IN ('cancelled', 'returned', 'return_requested')
),
with_rn AS (
SELECT
customer_id,
order_month,
month_num,
month_num - ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY month_num
) AS grp
FROM customer_months
),
streaks AS (
SELECT
customer_id,
MIN(order_month) AS start_month,
MAX(order_month) AS end_month,
COUNT(*) AS consecutive_months
FROM with_rn
GROUP BY customer_id, grp
HAVING COUNT(*) >= 5
)
SELECT
c.name AS customer_name,
c.grade,
s.start_month,
s.end_month,
s.consecutive_months
FROM streaks AS s
INNER JOIN customers AS c ON s.customer_id = c.id
ORDER BY s.consecutive_months DESC, c.name
LIMIT 20;
WITH customer_months AS (
SELECT DISTINCT
customer_id,
SUBSTR(ordered_at, 1, 7) AS order_month,
CAST(SUBSTR(ordered_at, 1, 4) AS NUMBER) * 12
+ CAST(SUBSTR(ordered_at, 6, 2) AS NUMBER) AS month_num
FROM orders
WHERE status NOT IN ('cancelled', 'returned', 'return_requested')
),
with_rn AS (
SELECT
customer_id,
order_month,
month_num,
month_num - ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY month_num
) AS grp
FROM customer_months
),
streaks AS (
SELECT
customer_id,
MIN(order_month) AS start_month,
MAX(order_month) AS end_month,
COUNT(*) AS consecutive_months
FROM with_rn
GROUP BY customer_id, grp
HAVING COUNT(*) >= 5
)
SELECT
c.name AS customer_name,
c.grade,
s.start_month,
s.end_month,
s.consecutive_months
FROM streaks s
INNER JOIN customers c ON s.customer_id = c.id
ORDER BY s.consecutive_months DESC, c.name
FETCH FIRST 20 ROWS ONLY;
WITH customer_months AS (
SELECT DISTINCT
customer_id,
SUBSTRING(ordered_at, 1, 7) AS order_month,
CAST(SUBSTRING(ordered_at, 1, 4) AS INT) * 12
+ CAST(SUBSTRING(ordered_at, 6, 2) AS INT) AS month_num
FROM orders
WHERE status NOT IN ('cancelled', 'returned', 'return_requested')
),
with_rn AS (
SELECT
customer_id,
order_month,
month_num,
month_num - ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY month_num
) AS grp
FROM customer_months
),
streaks AS (
SELECT
customer_id,
MIN(order_month) AS start_month,
MAX(order_month) AS end_month,
COUNT(*) AS consecutive_months
FROM with_rn
GROUP BY customer_id, grp
HAVING COUNT(*) >= 5
)
SELECT TOP 20
c.name AS customer_name,
c.grade,
s.start_month,
s.end_month,
s.consecutive_months
FROM streaks AS s
INNER JOIN customers AS c ON s.customer_id = c.id
ORDER BY s.consecutive_months DESC, c.name;
Result (top 7 of 20 rows)
| customer_name | grade | start_month | end_month | consecutive_months |
|---|---|---|---|---|
| Jason Rivera | VIP | 2016-07 | 2022-05 | 71 |
| Ronald Arellano | VIP | 2018-10 | 2022-07 | 46 |
| Allen Snyder | VIP | 2017-01 | 2020-09 | 45 |
| James Banks | VIP | 2017-05 | 2020-07 | 39 |
| Darren Wang | VIP | 2020-10 | 2023-11 | 38 |
| Courtney Huff | VIP | 2018-04 | 2021-03 | 36 |
| Sandra Callahan | GOLD | 2017-11 | 2020-09 | 35 |
32. Session Definition (30-min Gap)
Group product_views into sessions (30+ minute gap = new session for same customer). Calculate sessions per customer and average views per session.
Hint 1: - Use LAG(viewed_at) for previous view time
- 30 min = (JULIANDAY(current) - JULIANDAY(prev)) * 24 * 60 > 30
- SUM(is_new_session) OVER (...) to assign session numbers
Answer
WITH view_gaps AS (
SELECT
customer_id,
viewed_at,
CASE
WHEN LAG(viewed_at) OVER (
PARTITION BY customer_id ORDER BY viewed_at
) IS NULL THEN 1
WHEN (JULIANDAY(viewed_at) - JULIANDAY(LAG(viewed_at) OVER (
PARTITION BY customer_id ORDER BY viewed_at
))) * 24 * 60 > 30 THEN 1
ELSE 0
END AS is_new_session
FROM product_views
WHERE customer_id <= 500
),
with_session AS (
SELECT
customer_id,
viewed_at,
SUM(is_new_session) OVER (
PARTITION BY customer_id ORDER BY viewed_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS session_id
FROM view_gaps
),
session_stats AS (
SELECT
customer_id,
session_id,
COUNT(*) AS views_in_session
FROM with_session
GROUP BY customer_id, session_id
)
SELECT
COUNT(DISTINCT customer_id) AS total_customers,
COUNT(*) AS total_sessions,
ROUND(1.0 * COUNT(*) / COUNT(DISTINCT customer_id), 1) AS avg_sessions_per_customer,
ROUND(AVG(views_in_session), 1) AS avg_views_per_session
FROM session_stats;
WITH view_gaps AS (
SELECT
customer_id,
viewed_at,
CASE
WHEN LAG(viewed_at) OVER (
PARTITION BY customer_id ORDER BY viewed_at
) IS NULL THEN 1
WHEN (CAST(viewed_at AS DATE) - CAST(LAG(viewed_at) OVER (
PARTITION BY customer_id ORDER BY viewed_at
) AS DATE)) * 24 * 60 > 30 THEN 1
ELSE 0
END AS is_new_session
FROM product_views
WHERE customer_id <= 500
),
with_session AS (
SELECT
customer_id,
viewed_at,
SUM(is_new_session) OVER (
PARTITION BY customer_id ORDER BY viewed_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS session_id
FROM view_gaps
),
session_stats AS (
SELECT
customer_id,
session_id,
COUNT(*) AS views_in_session
FROM with_session
GROUP BY customer_id, session_id
)
SELECT
COUNT(DISTINCT customer_id) AS total_customers,
COUNT(*) AS total_sessions,
ROUND(1.0 * COUNT(*) / COUNT(DISTINCT customer_id), 1) AS avg_sessions_per_customer,
ROUND(AVG(views_in_session), 1) AS avg_views_per_session
FROM session_stats;
WITH view_gaps AS (
SELECT
customer_id,
viewed_at,
CASE
WHEN LAG(viewed_at) OVER (
PARTITION BY customer_id ORDER BY viewed_at
) IS NULL THEN 1
WHEN DATEDIFF(MINUTE, LAG(viewed_at) OVER (
PARTITION BY customer_id ORDER BY viewed_at
), viewed_at) > 30 THEN 1
ELSE 0
END AS is_new_session
FROM product_views
WHERE customer_id <= 500
),
with_session AS (
SELECT
customer_id,
viewed_at,
SUM(is_new_session) OVER (
PARTITION BY customer_id ORDER BY viewed_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS session_id
FROM view_gaps
),
session_stats AS (
SELECT
customer_id,
session_id,
COUNT(*) AS views_in_session
FROM with_session
GROUP BY customer_id, session_id
)
SELECT
COUNT(DISTINCT customer_id) AS total_customers,
COUNT(*) AS total_sessions,
ROUND(1.0 * COUNT(*) / COUNT(DISTINCT customer_id), 1) AS avg_sessions_per_customer,
ROUND(AVG(views_in_session), 1) AS avg_views_per_session
FROM session_stats;
Result (1 rows)
| total_customers | total_sessions | avg_sessions_per_customer | avg_views_per_session |
|---|---|---|---|
| 267 | 65,248 | 244.40 | 1.10 |
33. Session Redefinition (10-min Gap)
Same as c11-32 but with a 10-minute gap threshold. Compare session counts with the 30-minute version.
Hint 1: - Just change 30 to 10 - Sessions will split more frequently
Answer
WITH view_gaps AS (
SELECT
customer_id,
viewed_at,
CASE
WHEN LAG(viewed_at) OVER (
PARTITION BY customer_id ORDER BY viewed_at
) IS NULL THEN 1
WHEN (JULIANDAY(viewed_at) - JULIANDAY(LAG(viewed_at) OVER (
PARTITION BY customer_id ORDER BY viewed_at
))) * 24 * 60 > 10 THEN 1
ELSE 0
END AS is_new_session
FROM product_views
WHERE customer_id <= 500
),
with_session AS (
SELECT
customer_id,
viewed_at,
SUM(is_new_session) OVER (
PARTITION BY customer_id ORDER BY viewed_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS session_id
FROM view_gaps
),
session_stats AS (
SELECT
customer_id,
session_id,
COUNT(*) AS views_in_session
FROM with_session
GROUP BY customer_id, session_id
)
SELECT
COUNT(DISTINCT customer_id) AS total_customers,
COUNT(*) AS total_sessions,
ROUND(1.0 * COUNT(*) / COUNT(DISTINCT customer_id), 1) AS avg_sessions_per_customer,
ROUND(AVG(views_in_session), 1) AS avg_views_per_session
FROM session_stats;
WITH view_gaps AS (
SELECT
customer_id,
viewed_at,
CASE
WHEN LAG(viewed_at) OVER (
PARTITION BY customer_id ORDER BY viewed_at
) IS NULL THEN 1
WHEN (CAST(viewed_at AS DATE) - CAST(LAG(viewed_at) OVER (
PARTITION BY customer_id ORDER BY viewed_at
) AS DATE)) * 24 * 60 > 10 THEN 1
ELSE 0
END AS is_new_session
FROM product_views
WHERE customer_id <= 500
),
with_session AS (
SELECT
customer_id,
viewed_at,
SUM(is_new_session) OVER (
PARTITION BY customer_id ORDER BY viewed_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS session_id
FROM view_gaps
),
session_stats AS (
SELECT
customer_id,
session_id,
COUNT(*) AS views_in_session
FROM with_session
GROUP BY customer_id, session_id
)
SELECT
COUNT(DISTINCT customer_id) AS total_customers,
COUNT(*) AS total_sessions,
ROUND(1.0 * COUNT(*) / COUNT(DISTINCT customer_id), 1) AS avg_sessions_per_customer,
ROUND(AVG(views_in_session), 1) AS avg_views_per_session
FROM session_stats;
WITH view_gaps AS (
SELECT
customer_id,
viewed_at,
CASE
WHEN LAG(viewed_at) OVER (
PARTITION BY customer_id ORDER BY viewed_at
) IS NULL THEN 1
WHEN DATEDIFF(MINUTE, LAG(viewed_at) OVER (
PARTITION BY customer_id ORDER BY viewed_at
), viewed_at) > 10 THEN 1
ELSE 0
END AS is_new_session
FROM product_views
WHERE customer_id <= 500
),
with_session AS (
SELECT
customer_id,
viewed_at,
SUM(is_new_session) OVER (
PARTITION BY customer_id ORDER BY viewed_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS session_id
FROM view_gaps
),
session_stats AS (
SELECT
customer_id,
session_id,
COUNT(*) AS views_in_session
FROM with_session
GROUP BY customer_id, session_id
)
SELECT
COUNT(DISTINCT customer_id) AS total_customers,
COUNT(*) AS total_sessions,
ROUND(1.0 * COUNT(*) / COUNT(DISTINCT customer_id), 1) AS avg_sessions_per_customer,
ROUND(AVG(views_in_session), 1) AS avg_views_per_session
FROM session_stats;
Result (1 rows)
| total_customers | total_sessions | avg_sessions_per_customer | avg_views_per_session |
|---|---|---|---|
| 267 | 67,538 | 253.00 | 1.00 |
34. Median Order Amount by Customer Grade
Calculate the median order amount per customer grade. SQLite lacks MEDIAN, so implement it using NTILE or ROW_NUMBER.
Hint 1: - ROW_NUMBER() OVER (PARTITION BY grade ORDER BY total_amount)
- Median is at half of total count
- COUNT(*) OVER (PARTITION BY grade) for total count
Answer
WITH grade_orders AS (
SELECT
c.grade,
o.total_amount,
ROW_NUMBER() OVER (PARTITION BY c.grade ORDER BY o.total_amount) AS rn,
COUNT(*) OVER (PARTITION BY c.grade) AS cnt
FROM orders AS o
INNER JOIN customers AS c ON o.customer_id = c.id
WHERE o.status NOT IN ('cancelled', 'returned', 'return_requested')
)
SELECT
grade,
ROUND(AVG(total_amount), 2) AS median_amount
FROM grade_orders
WHERE rn IN (cnt / 2, cnt / 2 + 1)
GROUP BY grade
ORDER BY
CASE grade
WHEN 'VIP' THEN 1 WHEN 'GOLD' THEN 2
WHEN 'SILVER' THEN 3 WHEN 'BRONZE' THEN 4
END;
Result (4 rows)
| grade | median_amount |
|---|---|
| VIP | 481,750.00 |
| GOLD | 456,200.00 |
| SILVER | 423,000.00 |
| BRONZE | 317,400.00 |
35. Median Delivery Days by Carrier
Calculate the median delivery duration (in days) by carrier.
Hint 1: - Calculate delivery days: JULIANDAY(delivered_at) - JULIANDAY(shipped_at)
- Use ROW_NUMBER() and pick the middle value
Answer
WITH delivery_days AS (
SELECT
carrier,
ROUND(JULIANDAY(delivered_at) - JULIANDAY(shipped_at), 1) AS days,
ROW_NUMBER() OVER (PARTITION BY carrier ORDER BY JULIANDAY(delivered_at) - JULIANDAY(shipped_at)) AS rn,
COUNT(*) OVER (PARTITION BY carrier) AS cnt
FROM shipping
WHERE delivered_at IS NOT NULL
AND shipped_at IS NOT NULL
)
SELECT
carrier,
ROUND(AVG(days), 2) AS median_days
FROM delivery_days
WHERE rn IN (cnt / 2, cnt / 2 + 1)
GROUP BY carrier
ORDER BY median_days;
WITH delivery_days AS (
SELECT
carrier,
ROUND(CAST(delivered_at AS DATE) - CAST(shipped_at AS DATE), 1) AS days,
ROW_NUMBER() OVER (PARTITION BY carrier ORDER BY CAST(delivered_at AS DATE) - CAST(shipped_at AS DATE)) AS rn,
COUNT(*) OVER (PARTITION BY carrier) AS cnt
FROM shipping
WHERE delivered_at IS NOT NULL
AND shipped_at IS NOT NULL
)
SELECT
carrier,
ROUND(AVG(days), 2) AS median_days
FROM delivery_days
WHERE rn IN (TRUNC(cnt / 2), TRUNC(cnt / 2) + 1)
GROUP BY carrier
ORDER BY median_days;
WITH delivery_days AS (
SELECT
carrier,
ROUND(CAST(DATEDIFF(DAY, shipped_at, delivered_at) AS FLOAT), 1) AS days,
ROW_NUMBER() OVER (PARTITION BY carrier ORDER BY DATEDIFF(DAY, shipped_at, delivered_at)) AS rn,
COUNT(*) OVER (PARTITION BY carrier) AS cnt
FROM shipping
WHERE delivered_at IS NOT NULL
AND shipped_at IS NOT NULL
)
SELECT
carrier,
ROUND(AVG(days), 2) AS median_days
FROM delivery_days
WHERE rn IN (cnt / 2, cnt / 2 + 1)
GROUP BY carrier
ORDER BY median_days;
Result (5 rows)
| carrier | median_days |
|---|---|
| USPS | 2.00 |
| DHL | 3.00 |
| FedEx | 3.00 |
| OnTrac | 3.00 |
| UPS | 3.00 |
36. Funnel Drop-off by Device Type
By device_type (desktop/mobile/tablet), calculate conversion and drop-off rates for view -> cart -> purchase funnel.
Hint 1: - Group by product_views.device_type
- Count distinct customers at each stage by device
- Drop-off rate = 1 - conversion rate
Answer
WITH view_step AS (
SELECT
device_type,
COUNT(DISTINCT customer_id) AS viewers
FROM product_views
GROUP BY device_type
),
cart_step AS (
SELECT
pv.device_type,
COUNT(DISTINCT c.customer_id) AS carters
FROM product_views AS pv
INNER JOIN carts AS c ON pv.customer_id = c.customer_id
INNER JOIN cart_items AS ci ON c.id = ci.cart_id AND pv.product_id = ci.product_id
GROUP BY pv.device_type
),
purchase_step AS (
SELECT
pv.device_type,
COUNT(DISTINCT o.customer_id) AS buyers
FROM product_views AS pv
INNER JOIN orders AS o ON pv.customer_id = o.customer_id
INNER JOIN order_items AS oi ON o.id = oi.order_id AND pv.product_id = oi.product_id
WHERE o.status NOT IN ('cancelled', 'returned', 'return_requested')
GROUP BY pv.device_type
)
SELECT
vs.device_type,
vs.viewers,
COALESCE(cs.carters, 0) AS carters,
ROUND(100.0 * COALESCE(cs.carters, 0) / vs.viewers, 2) AS view_to_cart_pct,
COALESCE(ps.buyers, 0) AS buyers,
ROUND(100.0 * COALESCE(ps.buyers, 0) / NULLIF(COALESCE(cs.carters, 0), 0), 2) AS cart_to_buy_pct,
ROUND(100.0 * COALESCE(ps.buyers, 0) / vs.viewers, 2) AS view_to_buy_pct
FROM view_step AS vs
LEFT JOIN cart_step AS cs ON vs.device_type = cs.device_type
LEFT JOIN purchase_step AS ps ON vs.device_type = ps.device_type
ORDER BY vs.device_type;
Result (3 rows)
| device_type | viewers | carters | view_to_cart_pct | buyers | cart_to_buy_pct | view_to_buy_pct |
|---|---|---|---|---|---|---|
| desktop | 3658 | 486 | 13.29 | 2710 | 557.61 | 74.08 |
| mobile | 3660 | 513 | 14.02 | 2707 | 527.68 | 73.96 |
| tablet | 3385 | 201 | 5.94 | 2301 | 1,144.78 | 67.98 |
37. Channel Attribution Analysis
Calculate views and purchase conversions by referrer_source. For multi-channel customers, give credit to the last-touch channel before purchase.
Hint 1: - Get last referrer_source per customer (most recent product_view) - Credit that channel for conversion
Answer
WITH last_touch AS (
SELECT
pv.customer_id,
pv.referrer_source,
ROW_NUMBER() OVER (
PARTITION BY pv.customer_id
ORDER BY pv.viewed_at DESC
) AS rn
FROM product_views AS pv
INNER JOIN orders AS o ON pv.customer_id = o.customer_id
WHERE o.status NOT IN ('cancelled', 'returned', 'return_requested')
),
channel_views AS (
SELECT
referrer_source,
COUNT(*) AS total_views,
COUNT(DISTINCT customer_id) AS unique_viewers
FROM product_views
GROUP BY referrer_source
),
channel_conversions AS (
SELECT
referrer_source,
COUNT(DISTINCT customer_id) AS conversions
FROM last_touch
WHERE rn = 1
GROUP BY referrer_source
)
SELECT
cv.referrer_source,
cv.total_views,
cv.unique_viewers,
COALESCE(cc.conversions, 0) AS last_touch_conversions,
ROUND(100.0 * COALESCE(cc.conversions, 0) / cv.unique_viewers, 2) AS conversion_rate_pct
FROM channel_views AS cv
LEFT JOIN channel_conversions AS cc ON cv.referrer_source = cc.referrer_source
ORDER BY last_touch_conversions DESC;
Result (6 rows)
| referrer_source | total_views | unique_viewers | last_touch_conversions | conversion_rate_pct |
|---|---|---|---|---|
| search | 114,520 | 3652 | 995 | 27.25 |
| direct | 65,716 | 3603 | 541 | 15.02 |
| ad | 49,030 | 3549 | 411 | 11.58 |
| recommendation | 49,429 | 3544 | 405 | 11.43 |
| social | 32,707 | 3402 | 291 | 8.55 |
| 16,469 | 2929 | 150 | 5.12 |
38. Active Day Island Detection
Find "islands" of consecutive active days per customer in product_views. e.g., Mon/Tue/Wed views -> Thu gap -> Fri/Sat views = 2 islands (3 days, 2 days).
Hint 1: - Extract distinct view dates per customer
- Use date - ROW_NUMBER pattern to group consecutive dates
- MIN/MAX/COUNT per group
Answer
WITH active_days AS (
SELECT DISTINCT
customer_id,
SUBSTR(viewed_at, 1, 10) AS view_date
FROM product_views
WHERE customer_id <= 200
),
with_rn AS (
SELECT
customer_id,
view_date,
JULIANDAY(view_date) - ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY view_date
) AS grp
FROM active_days
),
islands AS (
SELECT
customer_id,
MIN(view_date) AS island_start,
MAX(view_date) AS island_end,
COUNT(*) AS island_days
FROM with_rn
GROUP BY customer_id, grp
HAVING COUNT(*) >= 3
)
SELECT
c.name AS customer_name,
i.island_start,
i.island_end,
i.island_days
FROM islands AS i
INNER JOIN customers AS c ON i.customer_id = c.id
ORDER BY i.island_days DESC, c.name
LIMIT 20;
WITH active_days AS (
SELECT DISTINCT
customer_id,
SUBSTR(viewed_at, 1, 10) AS view_date
FROM product_views
WHERE customer_id <= 200
),
with_rn AS (
SELECT
customer_id,
view_date,
CAST(view_date AS DATE) - ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY view_date
) AS grp
FROM active_days
),
islands AS (
SELECT
customer_id,
MIN(view_date) AS island_start,
MAX(view_date) AS island_end,
COUNT(*) AS island_days
FROM with_rn
GROUP BY customer_id, grp
HAVING COUNT(*) >= 3
)
SELECT
c.name AS customer_name,
i.island_start,
i.island_end,
i.island_days
FROM islands i
INNER JOIN customers c ON i.customer_id = c.id
ORDER BY i.island_days DESC, c.name
FETCH FIRST 20 ROWS ONLY;
WITH active_days AS (
SELECT DISTINCT
customer_id,
SUBSTRING(viewed_at, 1, 10) AS view_date
FROM product_views
WHERE customer_id <= 200
),
with_rn AS (
SELECT
customer_id,
view_date,
DATEADD(DAY, -ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY view_date
), CAST(view_date AS DATE)) AS grp
FROM active_days
),
islands AS (
SELECT
customer_id,
MIN(view_date) AS island_start,
MAX(view_date) AS island_end,
COUNT(*) AS island_days
FROM with_rn
GROUP BY customer_id, grp
HAVING COUNT(*) >= 3
)
SELECT TOP 20
c.name AS customer_name,
i.island_start,
i.island_end,
i.island_days
FROM islands AS i
INNER JOIN customers AS c ON i.customer_id = c.id
ORDER BY i.island_days DESC, c.name;
Result (top 7 of 20 rows)
| customer_name | island_start | island_end | island_days |
|---|---|---|---|
| Gabriel Walters | 2016-01-09 | 2016-02-17 | 40 |
| Jason Rivera | 2020-02-28 | 2020-03-25 | 27 |
| Adam Moore | 2016-03-05 | 2016-03-29 | 25 |
| Gabriel Walters | 2016-04-18 | 2016-05-12 | 25 |
| Jason Rivera | 2017-10-13 | 2017-11-06 | 25 |
| Terri Jones | 2018-02-19 | 2018-03-15 | 25 |
| Brenda Garcia | 2021-08-30 | 2021-09-21 | 23 |
39. Complete Monthly Dashboard
Build a complete monthly dashboard for 2024 in one query: revenue, orders, new customers, active customers, avg order value, month-over-month revenue change, and top product by revenue.
Hint 1: - Prepare order/customer/product stats in separate CTEs - JOIN all by month - LAG for month-over-month change - ROW_NUMBER for top product per month
Answer
WITH monthly_orders AS (
SELECT
SUBSTR(ordered_at, 1, 7) AS year_month,
COUNT(*) AS order_count,
ROUND(SUM(total_amount), 0) AS revenue,
ROUND(AVG(total_amount), 0) AS avg_order_value,
COUNT(DISTINCT customer_id) AS active_customers
FROM orders
WHERE ordered_at LIKE '2024%'
AND status NOT IN ('cancelled', 'returned', 'return_requested')
GROUP BY SUBSTR(ordered_at, 1, 7)
),
new_customers AS (
SELECT
SUBSTR(created_at, 1, 7) AS year_month,
COUNT(*) AS new_customer_count
FROM customers
WHERE created_at LIKE '2024%'
GROUP BY SUBSTR(created_at, 1, 7)
),
top_products AS (
SELECT
SUBSTR(o.ordered_at, 1, 7) AS year_month,
p.name AS product_name,
ROUND(SUM(oi.quantity * oi.unit_price), 0) AS product_revenue,
ROW_NUMBER() OVER (
PARTITION BY SUBSTR(o.ordered_at, 1, 7)
ORDER BY SUM(oi.quantity * oi.unit_price) DESC
) AS rn
FROM order_items AS oi
INNER JOIN orders AS o ON oi.order_id = o.id
INNER JOIN products AS p ON oi.product_id = p.id
WHERE o.ordered_at LIKE '2024%'
AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
GROUP BY SUBSTR(o.ordered_at, 1, 7), p.id, p.name
),
with_growth AS (
SELECT
mo.year_month,
mo.revenue,
mo.order_count,
COALESCE(nc.new_customer_count, 0) AS new_customers,
mo.active_customers,
mo.avg_order_value,
LAG(mo.revenue) OVER (ORDER BY mo.year_month) AS prev_revenue,
ROUND(100.0 * (mo.revenue - LAG(mo.revenue) OVER (ORDER BY mo.year_month))
/ NULLIF(LAG(mo.revenue) OVER (ORDER BY mo.year_month), 0), 1) AS mom_growth_pct,
tp.product_name AS top_product
FROM monthly_orders AS mo
LEFT JOIN new_customers AS nc ON mo.year_month = nc.year_month
LEFT JOIN top_products AS tp ON mo.year_month = tp.year_month AND tp.rn = 1
)
SELECT
year_month,
revenue,
order_count,
new_customers,
active_customers,
avg_order_value,
mom_growth_pct,
top_product
FROM with_growth
ORDER BY year_month;
Result (top 7 of 12 rows)
| year_month | revenue | order_count | new_customers | active_customers | avg_order_value | mom_growth_pct | top_product |
|---|---|---|---|---|---|---|---|
| 2024-01 | 288,908,320.00 | 314 | 52 | 269 | 920,090.00 | NULL | Razer Blade 18 Black |
| 2024-02 | 403,127,749.00 | 416 | 48 | 335 | 969,057.00 | 39.50 | Razer Blade 16 Silver |
| 2024-03 | 519,844,502.00 | 555 | 71 | 421 | 936,657.00 | 29.00 | Razer Blade 16 Silver |
| 2024-04 | 451,877,581.00 | 466 | 53 | 353 | 969,694.00 | -13.10 | ASUS ROG Swift PG32UCDM Silver |
| 2024-05 | 425,264,478.00 | 385 | 43 | 315 | 1,104,583.00 | -5.90 | Razer Blade 18 Black |
| 2024-06 | 362,715,211.00 | 389 | 68 | 324 | 932,430.00 | -14.70 | ASUS ROG Strix Scar 16 |
| 2024-07 | 343,929,897.00 | 381 | 62 | 311 | 902,703.00 | -5.20 | Razer Blade 18 Black |
40. JSON Specs Query
Extract CPU information from the products.specs JSON column. Show product name and CPU for products where specs is not NULL.
Hint 1: - SQLite: use JSON_EXTRACT(specs, '$.cpu')
- PostgreSQL: use specs->>'cpu'
- Filter WHERE specs IS NOT NULL
Answer
Result (top 7 of 20 rows)
| name | brand | price | cpu | ram | storage |
|---|---|---|---|---|---|
| MacBook Air 15 M3 Silver | Apple | 5,481,100.00 | Intel Core i9-13900H | 8GB | 256GB |
| Razer Blade 18 Black | Razer | 4,353,100.00 | Intel Core i7-13700H | 8GB | 1024GB |
| Razer Blade 16 Silver | Razer | 3,702,900.00 | AMD Ryzen 9 7945HX | 32GB | 512GB |
| ASUS ROG Strix G16CH White | ASUS | 3,671,500.00 | AMD Ryzen 5 7600X | 16GB | 2048GB |
| ASUS ROG Zephyrus G16 | ASUS | 3,429,900.00 | Apple M3 | 16GB | 512GB |
| ASUS ROG Strix GT35 | ASUS | 3,296,800.00 | Intel Core i7-13700K | 64GB | 2048GB |
| Razer Blade 18 Black | Razer | 2,987,500.00 | Apple M3 | 8GB | 256GB |