Lesson 5: GROUP BY and HAVING
In Lesson 4, we summarized entire datasets with COUNT, SUM, AVG, etc. But what if you want to aggregate by group, like "customer count by grade" or "revenue by month"? Use GROUP BY.
Already familiar?
If you already know GROUP BY, multi-column grouping, and HAVING, skip ahead to Lesson 6: Handling NULL.
Concept: GROUP BY groups rows together and applies aggregate functions to each group.
GROUP BY -- Single Column
-- Customer count by membership grade
SELECT
grade,
COUNT(*) AS customer_count
FROM customers
GROUP BY grade;
Result:
| grade | customer_count |
|---|---|
| BRONZE | 3859 |
| GOLD | 524 |
| SILVER | 479 |
| VIP | 368 |
The database gathers rows with the same grade value into a single bucket, then counts the rows in each bucket.
-- Order count and revenue total by order status
SELECT
status,
COUNT(*) AS order_count,
SUM(total_amount) AS total_revenue
FROM orders
GROUP BY status
ORDER BY total_revenue DESC;
Result:
| status | order_count | total_revenue |
|---|---|---|
| confirmed | 34393 | 34386590179.0 |
| cancelled | 1859 | 1943134782.0 |
| return_requested | 507 | 811487700.0 |
| returned | 493 | 659444574.0 |
| delivered | 125 | 195768235.0 |
| pending | 82 | 87230243.0 |
| shipped | 51 | 74070546.0 |
| paid | 23 | 13521976.0 |
| ... | ... | ... |
GROUP BY -- Multiple Columns
Grouping by two or more columns allows for more granular breakdowns.
-- Customer count by grade and gender
SELECT
grade,
gender,
COUNT(*) AS cnt
FROM customers
WHERE gender IS NOT NULL
GROUP BY grade, gender
ORDER BY grade, gender;
Result:
| grade | gender | cnt |
|---|---|---|
| BRONZE | F | 1302 |
| BRONZE | M | 2128 |
| GOLD | F | 140 |
| GOLD | M | 343 |
| SILVER | F | 141 |
| SILVER | M | 293 |
| VIP | F | 86 |
| VIP | M | 268 |
| ... | ... | ... |
Monthly Order Aggregation
By extracting the year-month from a date column, you can group by month.
Result:
| year_month | order_count | monthly_revenue |
|---|---|---|
| 2024-01 | 312 | 178432.50 |
| 2024-02 | 289 | 162890.20 |
| 2024-03 | 405 | 238741.90 |
| ... |
HAVING
HAVING filters after grouping and uses aggregate values as conditions. Think of it as WHERE for groups.
-- Only show grades with more than 500 customers
SELECT
grade,
COUNT(*) AS customer_count
FROM customers
GROUP BY grade
HAVING COUNT(*) > 500;
Result:
| grade | customer_count |
|---|---|
| BRONZE | 3859 |
| GOLD | 524 |
-- Categories with 10+ active products and average price over $100
SELECT
category_id,
COUNT(*) AS product_count,
AVG(price) AS avg_price
FROM products
WHERE is_active = 1
GROUP BY category_id
HAVING COUNT(*) >= 10
AND AVG(price) > 100
ORDER BY avg_price DESC;
Result:
| category_id | product_count | avg_price |
|---|---|---|
| 12 | 10 | 1123150.0 |
| 18 | 10 | 527080.0 |
| 30 | 11 | 234645.45454545456 |
| 31 | 10 | 159930.0 |
WHERE vs. HAVING
| Clause | Filters | Applied when |
|---|---|---|
WHERE |
Individual rows | Before grouping |
HAVING |
Groups | After grouping |
-- WHERE filters rows first, then HAVING filters groups
SELECT
grade,
AVG(point_balance) AS avg_points
FROM customers
WHERE is_active = 1 -- Exclude inactive customers (row level)
GROUP BY grade
HAVING AVG(point_balance) > 500; -- Only grades with average points over 500
Summary
| Syntax | Description | Example |
|---|---|---|
GROUP BY column |
Group by a single column | GROUP BY grade |
GROUP BY col1, col2 |
Subdivide by multiple columns | GROUP BY grade, gender |
HAVING condition |
Filter after grouping (based on aggregate values) | HAVING COUNT(*) > 500 |
WHERE vs HAVING |
WHERE filters rows (before grouping), HAVING filters groups (after grouping) |
Lesson Review Problems
These are simple problems to immediately check the concepts learned in this lesson. For comprehensive practice combining multiple concepts, see the Practice Problems section.
Practice Problems
Problem 1
Aggregate the order count by status. Show only statuses with more than 1,000 orders, sorted by count descending.
Answer
SELECT
status,
COUNT(*) AS order_count
FROM orders
GROUP BY status
HAVING COUNT(*) > 1000
ORDER BY order_count DESC;
Result (example):
| status | order_count |
|---|---|
| confirmed | 34393 |
| cancelled | 1859 |
Problem 2
From the payments table, find the total collected amount and transaction count by payment method (method). Sort by total amount descending.
Answer
SELECT
method,
COUNT(*) AS transaction_count,
SUM(amount) AS total_collected
FROM payments
WHERE status = 'completed'
GROUP BY method
ORDER BY total_collected DESC;
Result (example):
| method | transaction_count | total_collected |
|---|---|---|
| card | 15556 | 15537036997.0 |
| kakao_pay | 6886 | 6781114303.0 |
| naver_pay | 5270 | 5420480093.0 |
| bank_transfer | 3429 | 3456454657.0 |
| point | 1770 | 1780334619.0 |
| virtual_account | 1705 | 1706777095.0 |
| ... | ... | ... |
Problem 3
From the customers table, find the average point_balance by grade. Sort by average points descending.
Answer
SELECT
grade,
AVG(point_balance) AS avg_points
FROM customers
GROUP BY grade
ORDER BY avg_points DESC;
Result (example):
| grade | avg_points |
|---|---|
| VIP | 407014.6902173913 |
| GOLD | 147710.68702290076 |
| SILVER | 95042.32985386222 |
| BRONZE | 16779.46022285566 |
Problem 4
Group by both grade and gender from the customers table and count customers. Include rows where gender is NULL.
Answer
SELECT
grade,
gender,
COUNT(*) AS customer_count
FROM customers
GROUP BY grade, gender
ORDER BY grade, gender;
Result (example):
| grade | gender | customer_count |
|---|---|---|
| BRONZE | (NULL) | 429 |
| BRONZE | F | 1302 |
| BRONZE | M | 2128 |
| GOLD | (NULL) | 41 |
| GOLD | F | 140 |
| GOLD | M | 343 |
| SILVER | (NULL) | 45 |
| SILVER | F | 141 |
| ... | ... | ... |
Problem 5
From the reviews table, find the review count by rating. Show only ratings with 100 or more reviews, sorted by rating.
Answer
SELECT
rating,
COUNT(*) AS review_count
FROM reviews
GROUP BY rating
HAVING COUNT(*) >= 100
ORDER BY rating;
Result (example):
| rating | review_count |
|---|---|
| 1 | 434 |
| 2 | 839 |
| 3 | 1265 |
| 4 | 2575 |
| 5 | 3433 |
| ... | ... |
Problem 6
From the orders table, considering only active orders (status NOT IN ('cancelled', 'returned')), find the count and average amount (0 decimal places) by status. Show only statuses where the average amount exceeds 300.
Answer
SELECT
status,
COUNT(*) AS order_count,
ROUND(AVG(total_amount), 0) AS avg_amount
FROM orders
WHERE status NOT IN ('cancelled', 'returned')
GROUP BY status
HAVING AVG(total_amount) > 300
ORDER BY avg_amount DESC;
Result (example):
| status | order_count | avg_amount |
|---|---|---|
| return_requested | 507 | 1600567.0 |
| delivered | 125 | 1566146.0 |
| shipped | 51 | 1452364.0 |
| pending | 82 | 1063783.0 |
| confirmed | 34393 | 999814.0 |
| paid | 23 | 587912.0 |
| preparing | 24 | 510285.0 |
| ... | ... | ... |
Problem 7
From the 2023-2024 orders data, find months where monthly revenue exceeded $500,000. Return year_month and monthly_revenue sorted by date.
Answer
SELECT
SUBSTR(ordered_at, 1, 7) AS year_month,
SUM(total_amount) AS monthly_revenue
FROM orders
WHERE ordered_at BETWEEN '2023-01-01' AND '2024-12-31 23:59:59'
AND status NOT IN ('cancelled', 'returned')
GROUP BY SUBSTR(ordered_at, 1, 7)
HAVING SUM(total_amount) > 500000
ORDER BY year_month;
Result (example):
| year_month | monthly_revenue |
|---|---|
| 2023-01 | 274226287.0 |
| 2023-02 | 333966148.0 |
| 2023-03 | 491087654.0 |
| 2023-04 | 403110649.0 |
| 2023-05 | 361101076.0 |
| 2023-06 | 288736533.0 |
| 2023-07 | 319249348.0 |
| 2023-08 | 366518636.0 |
| ... | ... |
=== "MySQL"
```sql
SELECT
DATE_FORMAT(ordered_at, '%Y-%m') AS year_month,
SUM(total_amount) AS monthly_revenue
FROM orders
WHERE ordered_at >= '2023-01-01'
AND ordered_at < '2025-01-01'
AND status NOT IN ('cancelled', 'returned')
GROUP BY DATE_FORMAT(ordered_at, '%Y-%m')
HAVING SUM(total_amount) > 500000
ORDER BY year_month;
```
=== "PostgreSQL"
```sql
SELECT
TO_CHAR(ordered_at, 'YYYY-MM') AS year_month,
SUM(total_amount) AS monthly_revenue
FROM orders
WHERE ordered_at >= '2023-01-01'
AND ordered_at < '2025-01-01'
AND status NOT IN ('cancelled', 'returned')
GROUP BY TO_CHAR(ordered_at, 'YYYY-MM')
HAVING SUM(total_amount) > 500000
ORDER BY year_month;
```
Problem 8
From the payments table, find the number of unique orders per payment method using COUNT(DISTINCT order_id). Sort by unique order count descending.
Answer
SELECT
method,
COUNT(DISTINCT order_id) AS unique_orders
FROM payments
GROUP BY method
ORDER BY unique_orders DESC;
Result (example):
| method | unique_orders |
|---|---|
| card | 16841 |
| kakao_pay | 7486 |
| naver_pay | 5715 |
| bank_transfer | 3718 |
| point | 1921 |
| virtual_account | 1876 |
| ... | ... |
Problem 9
From the orders table, find the order count and total revenue by year. Exclude cancelled/returned orders.
Answer
| order_year | order_count | yearly_revenue |
|---|---|---|
| 2016 | 388 | 288397247.0 |
| 2017 | 657 | 619679681.0 |
| 2018 | 1238 | 1179049206.0 |
| 2019 | 2422 | 2438425607.0 |
| 2020 | 4078 | 4182596871.0 |
| 2021 | 5501 | 5672563917.0 |
| 2022 | 4882 | 4922471211.0 |
| 2023 | 4719 | 4736163108.0 |
| ... | ... | ... |
=== "MySQL"
```sql
SELECT
YEAR(ordered_at) AS order_year,
COUNT(*) AS order_count,
SUM(total_amount) AS yearly_revenue
FROM orders
WHERE status NOT IN ('cancelled', 'returned')
GROUP BY YEAR(ordered_at)
ORDER BY order_year;
```
=== "PostgreSQL"
```sql
SELECT
EXTRACT(YEAR FROM ordered_at)::int AS order_year,
COUNT(*) AS order_count,
SUM(total_amount) AS yearly_revenue
FROM orders
WHERE status NOT IN ('cancelled', 'returned')
GROUP BY EXTRACT(YEAR FROM ordered_at)
ORDER BY order_year;
```
Problem 10
From the products table, find the product count, average price (0 decimal places), and total stock (stock_qty sum) by category_id. Show only categories with 5 or more products and an average price of 50 or higher, sorted by product count descending.
Answer
SELECT
category_id,
COUNT(*) AS product_count,
ROUND(AVG(price), 0) AS avg_price,
SUM(stock_qty) AS total_stock
FROM products
GROUP BY category_id
HAVING COUNT(*) >= 5
AND AVG(price) >= 50
ORDER BY product_count DESC;
Result (example):
| category_id | product_count | avg_price | total_stock |
|---|---|---|---|
| 18 | 13 | 529754.0 | 3826 |
| 30 | 13 | 219008.0 | 3812 |
| 43 | 12 | 277150.0 | 3085 |
| 3 | 11 | 1719809.0 | 2241 |
| 31 | 11 | 158482.0 | 2681 |
| 36 | 11 | 158000.0 | 3094 |
| 37 | 11 | 41064.0 | 3141 |
| 6 | 10 | 1748820.0 | 2833 |
| ... | ... | ... | ... |
Scoring Guide
| Score | Next Step |
|---|---|
| 9-10 | Move to Lesson 6: Handling NULL |
| 7-8 | Review the explanations for incorrect answers, then proceed to Lesson 6 |
| 5 or fewer | Read this lesson again |
| 3 or fewer | Start over from Lesson 4: Aggregate Functions |
Problem Areas:
| Area | Problems |
|---|---|
| GROUP BY + HAVING | 1, 5 |
| GROUP BY single column | 2, 3 |
| GROUP BY multiple columns | 4 |
| WHERE + GROUP BY + HAVING | 6 |
| GROUP BY + HAVING (date filter) | 7 |
| COUNT(DISTINCT) | 8 |
| GROUP BY (date function) + WHERE | 9 |
| HAVING multiple conditions | 10 |
Next: Lesson 6: Handling NULL