콘텐츠로 이동

SQL 면접 대비

사용 테이블

orders — 주문 (상태, 금액, 일시)

order_items — 주문 상세 (수량, 단가)

products — 상품 (이름, 가격, 재고, 브랜드)

categories — 카테고리 (부모-자식 계층)

customers — 고객 (등급, 포인트, 가입채널)

reviews — 리뷰 (평점, 내용)

product_views — 조회 로그 (고객, 상품, 일시)

calendar — 날짜 차원 (요일, 공휴일)

staff — 직원 (부서, 역할, 관리자)

학습 범위

DENSE_RANK, Consecutive N Days, Median, Session Analysis, MoM Growth Rate, Percentile, Working Days, Organization Chart Recursion

1. 두 번째로 높은 매출 상품 ★★★

전체 상품 중 총 매출이 두 번째로 높은 상품을 구하세요. 매출이 가장 높은 상품과 동일 매출인 상품이 여러 개일 수 있으므로 DENSE_RANK를 사용합니다.

출제 빈도: ★★★ (매우 높음) — Amazon, LeetCode #176 변형

product_name category total_revenue revenue_rank
... ... ... 2

힌트 1: - DENSE_RANK() OVER (ORDER BY total_revenue DESC): 동률 허용 순위 - CTE에서 상품별 매출 합산 → 순위 부여 → WHERE rank = 2

정답
WITH product_revenue AS (
    SELECT
        p.id,
        p.name AS product_name,
        cat.name AS category,
        SUM(oi.quantity * oi.unit_price) AS total_revenue
    FROM order_items AS oi
    JOIN orders     AS o   ON oi.order_id   = o.id
    JOIN products   AS p   ON oi.product_id = p.id
    JOIN categories AS cat ON p.category_id = cat.id
    WHERE o.status NOT IN ('cancelled')
    GROUP BY p.id, p.name, cat.name
),
ranked AS (
    SELECT *,
        DENSE_RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank
    FROM product_revenue
)
SELECT product_name, category,
       CAST(total_revenue AS INTEGER) AS total_revenue,
       revenue_rank
FROM ranked
WHERE revenue_rank = 2;

실행 결과 (1행)

product_name category total_revenue revenue_rank
Razer Blade 16 실버 게이밍 노트북 907,210,500 2

2. 누적 합계 (Running Total) ★★★

2024년 월별 매출연초부터의 누적 매출(YTD)을 구하세요.

출제 빈도: ★★★ (매우 높음) — Google, Meta 빈출

month monthly_revenue ytd_revenue
2024-01 ... ...

힌트 1: - SUM(월별매출) OVER (ORDER BY month) = 누적 합계 - 윈도우 함수의 기본 프레임은 UNBOUNDED PRECEDING ~ CURRENT ROW

정답
SELECT
    SUBSTR(ordered_at, 1, 7) AS month,
    CAST(SUM(total_amount) AS INTEGER) AS monthly_revenue,
    CAST(SUM(SUM(total_amount)) OVER (ORDER BY SUBSTR(ordered_at, 1, 7)) AS INTEGER) AS ytd_revenue
FROM orders
WHERE ordered_at >= '2024-01-01' AND ordered_at < '2025-01-01'
  AND status NOT IN ('cancelled')
GROUP BY SUBSTR(ordered_at, 1, 7)
ORDER BY month;

실행 결과 (총 12행 중 상위 7행)

month monthly_revenue ytd_revenue
2024-01 301,075,320 301,075,320
2024-02 426,177,449 727,252,769
2024-03 536,322,767 1,263,575,536
2024-04 470,154,081 1,733,729,617
2024-05 459,724,596 2,193,454,213
2024-06 377,040,302 2,570,494,515
2024-07 363,944,597 2,934,439,112

3. 중복 데이터 식별 ★★★

동일 고객이 동일 상품에 같은 날 주문한 건(중복 의심)을 찾으세요. 가장 최근 주문만 유효로 표시하고 나머지는 중복으로 플래그합니다.

출제 빈도: ★★★ (매우 높음) — LeetCode #196 변형, 카카오

order_id customer_name product_name ordered_at is_duplicate

힌트 1: - ROW_NUMBER() OVER (PARTITION BY customer_id, product_id, DATE(ordered_at) ORDER BY ordered_at DESC) - rn = 1이면 유효, rn > 1이면 중복

정답
WITH order_detail AS (
    SELECT
        o.id AS order_id,
        o.customer_id,
        oi.product_id,
        o.ordered_at,
        ROW_NUMBER() OVER (
            PARTITION BY o.customer_id, oi.product_id, DATE(o.ordered_at)
            ORDER BY o.ordered_at DESC
        ) AS rn
    FROM orders AS o
    JOIN order_items AS oi ON o.id = oi.order_id
    WHERE o.status NOT IN ('cancelled')
)
SELECT
    od.order_id,
    c.name AS customer_name,
    p.name AS product_name,
    od.ordered_at,
    CASE WHEN od.rn > 1 THEN 'Y' ELSE 'N' END AS is_duplicate
FROM order_detail AS od
JOIN customers AS c ON od.customer_id = c.id
JOIN products  AS p ON od.product_id  = p.id
WHERE od.rn <= 2
ORDER BY od.customer_id, od.product_id, DATE(od.ordered_at), od.rn;

실행 결과 (총 86,611행 중 상위 7행)

order_id customer_name product_name ordered_at is_duplicate
1049 김경수 Razer Blade 18 블랙 2017-12-04 15:52:09 N
31,251 김경수 Razer Blade 18 블랙 2025-01-02 18:41:57 N
243 김경수 MSI GeForce RTX 4070 Ti Super GAMING X 2016-08-17 23:29:34 N
17,814 김경수 MSI GeForce RTX 4070 Ti Super GAMING X 2022-07-18 12:29:51 N
21,134 김경수 MSI GeForce RTX 4070 Ti Super GAMING X 2023-03-04 08:54:35 N
5736 김경수 Dell U2724D 2020-03-09 16:09:46 N
236 김경수 G.SKILL Trident Z5 DDR5 64GB 6000MHz 화이트 2016-08-19 22:29:34 N

4. 중앙값 (Median) 구하기 ★★☆

고객별 주문 금액의 중앙값(median)을 구하세요. SQLite에는 MEDIAN 함수가 없으므로 윈도우 함수로 구현합니다.

출제 빈도: ★★☆ — Google, Amazon

customer_name order_count median_amount
... ... ...

힌트 1: - ROW_NUMBER()로 순위, COUNT(*) OVER()로 전체 건수 - 중앙값 = 전체 건수가 홀수이면 (n+1)/2번째, 짝수이면 n/2와 n/2+1의 평균 - WHERE rn IN (cnt/2, cnt/2+1, (cnt+1)/2)로 중앙 행 추출

정답
WITH numbered AS (
    SELECT
        customer_id,
        total_amount,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY total_amount) AS rn,
        COUNT(*) OVER (PARTITION BY customer_id) AS cnt
    FROM orders
    WHERE status NOT IN ('cancelled')
),
median_rows AS (
    SELECT
        customer_id,
        ROUND(AVG(total_amount), 0) AS median_amount,
        MAX(cnt) AS order_count
    FROM numbered
    WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
    GROUP BY customer_id
)
SELECT
    c.name AS customer_name,
    m.order_count,
    m.median_amount
FROM median_rows AS m
JOIN customers AS c ON m.customer_id = c.id
WHERE m.order_count >= 5
ORDER BY m.median_amount DESC
LIMIT 20;

실행 결과 (총 20행 중 상위 7행)

customer_name order_count median_amount
이진호 5 3,162,000.00
박경수 5 2,741,600.00
이민재 5 2,461,800.00
김준영 12 2,403,150.00
류명자 6 2,282,947.00
황채원 5 2,281,800.00
한승민 9 2,279,500.00

5. 연속 N일 로그인 (Islands) ★★★

product_views를 로그인 로그로 간주하여, 3일 이상 연속으로 활동한 고객을 찾으세요.

출제 빈도: ★★★ (매우 높음) — Meta, 쿠팡, LeetCode #180 변형

customer_name streak_days streak_start streak_end

힌트 1: - DATE(viewed_at) 중복 제거 후 연속 날짜 그룹 탐지 - DATE(viewed_at, '-' || (ROW_NUMBER()-1) || ' days') → 같은 그룹이면 동일 값 - HAVING COUNT(*) >= 3

정답
WITH active_days AS (
    SELECT DISTINCT
        customer_id,
        DATE(viewed_at) AS active_date
    FROM product_views
),
grouped AS (
    SELECT
        customer_id,
        active_date,
        DATE(active_date, '-' || (ROW_NUMBER() OVER (
            PARTITION BY customer_id ORDER BY active_date
        ) - 1) || ' days') AS grp
    FROM active_days
),
streaks AS (
    SELECT
        customer_id,
        COUNT(*) AS streak_days,
        MIN(active_date) AS streak_start,
        MAX(active_date) AS streak_end
    FROM grouped
    GROUP BY customer_id, grp
    HAVING COUNT(*) >= 3
)
SELECT
    c.name AS customer_name,
    s.streak_days,
    s.streak_start,
    s.streak_end
FROM streaks AS s
JOIN customers AS c ON s.customer_id = c.id
ORDER BY s.streak_days DESC, s.streak_start
LIMIT 20;

실행 결과 (총 20행 중 상위 7행)

customer_name streak_days streak_start streak_end
배춘자 46 2020-02-28 2020-04-13
이영자 40 2016-01-09 2016-02-17
김성민 39 2020-05-02 2020-06-09
김병철 27 2020-02-28 2020-03-25
홍옥순 26 2019-03-13 2019-04-07
김민재 25 2016-03-05 2016-03-29
이영자 25 2016-04-18 2016-05-12

6. 카테고리별 Top-N (그룹 내 순위) ★★★

각 카테고리에서 리뷰 평점이 가장 높은 상품 2개를 추출하세요. 리뷰 10건 이상인 상품만 대상으로 합니다. 동점이면 리뷰 수가 많은 상품이 우선합니다.

출제 빈도: ★★★ (매우 높음) — Amazon, 네이버, 카카오

category product_name avg_rating review_count rank

힌트 1: - HAVING COUNT(*) >= 10으로 리뷰 수 필터 - ROW_NUMBER() OVER (PARTITION BY category ORDER BY avg_rating DESC, review_count DESC) - WHERE rn <= 2

정답
WITH product_ratings AS (
    SELECT
        cat.name AS category,
        p.name   AS product_name,
        ROUND(AVG(r.rating), 2) AS avg_rating,
        COUNT(*)                AS review_count
    FROM reviews AS r
    JOIN products   AS p   ON r.product_id  = p.id
    JOIN categories AS cat ON p.category_id = cat.id
    GROUP BY cat.name, p.id, p.name
    HAVING COUNT(*) >= 10
),
ranked AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY category
            ORDER BY avg_rating DESC, review_count DESC
        ) AS rn
    FROM product_ratings
)
SELECT category, product_name, avg_rating, review_count, rn AS rank
FROM ranked
WHERE rn <= 2
ORDER BY category, rn;

실행 결과 (총 70행 중 상위 7행)

category product_name avg_rating review_count rank
2in1 HP Pavilion x360 14 블랙 3.79 28 1
2in1 레노버 IdeaPad Flex 5 3.75 12 2
AMD MSI Radeon RX 9070 VENTUS 3X 화이트 4.08 40 1
AMD AMD Ryzen 9 9900X 4.08 13 2
AMD 소켓 MSI MAG X870E TOMAHAWK WIFI 화이트 4.06 32 1
AMD 소켓 ASRock B850M Pro RS 실버 4.04 25 2
DDR4 Kingston FURY Beast DDR4 16GB 블랙 4.06 52 1

7. 전년 동기 대비 (YoY) 성장률 ★★★

분기별 매출의 전년 동기 대비 성장률(YoY %)을 구하세요. 2023~2025년 데이터를 대상으로 합니다.

출제 빈도: ★★★ (매우 높음) — Google, Meta, 쿠팡

year quarter revenue prev_year_revenue yoy_growth_pct

힌트 1: - LAG(revenue, 4) OVER (ORDER BY year, quarter)로 전년 동분기 매출 참조 — 분기가 1~4이므로 4개 전이 전년 동기 - 또는 LAG(revenue, 1) OVER (PARTITION BY quarter ORDER BY year)

정답
WITH quarterly AS (
    SELECT
        CAST(SUBSTR(ordered_at, 1, 4) AS INTEGER) AS year,
        CASE
            WHEN CAST(SUBSTR(ordered_at, 6, 2) AS INTEGER) <= 3 THEN 1
            WHEN CAST(SUBSTR(ordered_at, 6, 2) AS INTEGER) <= 6 THEN 2
            WHEN CAST(SUBSTR(ordered_at, 6, 2) AS INTEGER) <= 9 THEN 3
            ELSE 4
        END AS quarter,
        SUM(total_amount) AS revenue
    FROM orders
    WHERE status NOT IN ('cancelled')
      AND ordered_at >= '2023-01-01'
    GROUP BY 1, 2
)
SELECT
    year,
    quarter,
    CAST(revenue AS INTEGER) AS revenue,
    CAST(LAG(revenue) OVER (PARTITION BY quarter ORDER BY year) AS INTEGER) AS prev_year_revenue,
    ROUND(100.0 * (revenue - LAG(revenue) OVER (PARTITION BY quarter ORDER BY year))
        / LAG(revenue) OVER (PARTITION BY quarter ORDER BY year), 1) AS yoy_growth_pct
FROM quarterly
ORDER BY year, quarter;

실행 결과 (총 12행 중 상위 7행)

year quarter revenue prev_year_revenue yoy_growth_pct
2023 1 1,112,502,647 NULL NULL
2023 2 1,075,865,258 NULL NULL
2023 3 1,162,362,566 NULL NULL
2023 4 1,464,300,253 NULL NULL
2024 1 1,263,575,536 1,112,502,647 13.60
2024 2 1,306,918,979 1,075,865,258 21.50
2024 3 1,340,721,817 1,162,362,566 15.30

8. 이동 평균 (Moving Average) ★★☆

2024년 일별 매출의 7일 이동 평균을 구하세요. 이동 평균은 당일 포함 직전 7일간의 평균입니다.

출제 빈도: ★★☆ — Google, 네이버

order_date daily_revenue ma_7d

힌트 1: - AVG(daily_revenue) OVER (ORDER BY order_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) - calendar와 LEFT JOIN하여 주문이 없는 날도 0으로 포함시키기

정답
WITH daily AS (
    SELECT
        cal.date_key AS order_date,
        COALESCE(SUM(o.total_amount), 0) AS daily_revenue
    FROM calendar AS cal
    LEFT JOIN orders AS o
        ON DATE(o.ordered_at) = cal.date_key
       AND o.status NOT IN ('cancelled')
    WHERE cal.date_key BETWEEN '2024-01-01' AND '2024-12-31'
    GROUP BY cal.date_key
)
SELECT
    order_date,
    CAST(daily_revenue AS INTEGER) AS daily_revenue,
    CAST(AVG(daily_revenue) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS INTEGER) AS ma_7d
FROM daily
ORDER BY order_date;

실행 결과 (총 366행 중 상위 7행)

order_date daily_revenue ma_7d
2024-01-01 7,732,372 7,732,372
2024-01-02 14,807,975 11,270,173
2024-01-03 2,825,828 8,455,391
2024-01-04 4,332,900 7,424,768
2024-01-05 8,083,504 7,556,515
2024-01-06 9,182,200 7,827,463
2024-01-07 23,522,036 10,069,545

9. 백분위수 (Percentile) ★★☆

고객별 총 구매금액이 상위 10%, 25%, 50%(중위수), 75%, 90% 경계에 해당하는 값을 구하세요.

출제 빈도: ★★☆ — Amazon, Google

percentile threshold_amount
10 ...
25 ...

힌트 1: - NTILE(100) OVER (ORDER BY total_spent)로 백분위 그룹 부여 - 각 백분위 경계값: WHERE percentile_group IN (10, 25, 50, 75, 90)에서 MAX 값

정답
WITH customer_spent AS (
    SELECT
        customer_id,
        SUM(total_amount) AS total_spent
    FROM orders
    WHERE status NOT IN ('cancelled')
    GROUP BY customer_id
),
with_percentile AS (
    SELECT
        total_spent,
        NTILE(100) OVER (ORDER BY total_spent) AS pctl
    FROM customer_spent
)
SELECT
    pctl AS percentile,
    CAST(MAX(total_spent) AS INTEGER) AS threshold_amount
FROM with_percentile
WHERE pctl IN (10, 25, 50, 75, 90)
GROUP BY pctl
ORDER BY pctl;

실행 결과 (5행)

percentile threshold_amount
10 180,500
25 1,183,200
50 4,654,232
75 13,607,591
90 31,606,150

10. 전일 대비 변화율 + 7일 이동 변화율 ★★☆

2024년 12월의 일별 주문 수에 대해 전일 대비 변화율(DoD %)7일 전 대비 변화율(WoW %)을 동시에 구하세요.

출제 빈도: ★★☆ — Meta, 쿠팡

order_date order_count prev_day dod_pct prev_week wow_pct

힌트 1: - LAG(order_count, 1) = 전일, LAG(order_count, 7) = 7일 전 - 변화율 = (당일 - 이전) / 이전 * 100 - calendar LEFT JOIN으로 빠진 날도 포함

정답
WITH daily AS (
    SELECT
        cal.date_key AS order_date,
        COALESCE(COUNT(o.id), 0) AS order_count
    FROM calendar AS cal
    LEFT JOIN orders AS o
        ON DATE(o.ordered_at) = cal.date_key
       AND o.status NOT IN ('cancelled')
    WHERE cal.date_key BETWEEN '2024-12-01' AND '2024-12-31'
    GROUP BY cal.date_key
)
SELECT
    order_date,
    order_count,
    LAG(order_count, 1) OVER (ORDER BY order_date) AS prev_day,
    ROUND(100.0 * (order_count - LAG(order_count, 1) OVER (ORDER BY order_date))
        / NULLIF(LAG(order_count, 1) OVER (ORDER BY order_date), 0), 1) AS dod_pct,
    LAG(order_count, 7) OVER (ORDER BY order_date) AS prev_week,
    ROUND(100.0 * (order_count - LAG(order_count, 7) OVER (ORDER BY order_date))
        / NULLIF(LAG(order_count, 7) OVER (ORDER BY order_date), 0), 1) AS wow_pct
FROM daily
ORDER BY order_date;

실행 결과 (총 31행 중 상위 7행)

order_date order_count prev_day dod_pct prev_week wow_pct
2024-12-01 17 NULL NULL NULL NULL
2024-12-02 14 17 -17.60 NULL NULL
2024-12-03 15 14 7.10 NULL NULL
2024-12-04 14 15 -6.70 NULL NULL
2024-12-05 9 14 -35.70 NULL NULL
2024-12-06 14 9 55.60 NULL NULL
2024-12-07 15 14 7.10 NULL NULL

11. 재귀 CTE: 조직도 트리 ★★★

staff 테이블의 manager_id를 이용하여 전체 조직 계층 구조를 출력하세요. 직급(depth), 상사 이름, 전체 경로(CEO > ... > 본인)를 포함합니다.

출제 빈도: ★★★ (매우 높음) — Amazon, 카카오, 네이버

name department role depth manager_name path
CEO이름 management admin 0 NULL CEO이름
... ... ... 1 CEO이름 CEO이름 > ...

힌트 1: - Recursive CTE: WHERE manager_id IS NULL이 앵커(루트) - 재귀 파트에서 s.manager_id = tree.id로 조인 - path || ' > ' || s.name으로 경로 누적

정답
WITH RECURSIVE org_tree AS (
    -- 앵커: 최상위 관리자 (manager_id IS NULL)
    SELECT
        s.id,
        s.name,
        s.department,
        s.role,
        0 AS depth,
        CAST(NULL AS TEXT) AS manager_name,
        s.name AS path
    FROM staff AS s
    WHERE s.manager_id IS NULL

    UNION ALL

    -- 재귀: 하위 직원
    SELECT
        s.id,
        s.name,
        s.department,
        s.role,
        t.depth + 1,
        t.name AS manager_name,
        t.path || ' > ' || s.name
    FROM staff AS s
    JOIN org_tree AS t ON s.manager_id = t.id
)
SELECT name, department, role, depth, manager_name, path
FROM org_tree
ORDER BY path;

실행 결과 (5행)

name department role depth manager_name path
한민재 경영 admin 0 NULL 한민재
박경수 경영 admin 1 한민재 한민재 > 박경수
권영희 마케팅 manager 2 박경수 한민재 > 박경수 > 권영희
이준혁 영업 manager 1 한민재 한민재 > 이준혁
장주원 경영 admin 1 한민재 한민재 > 장주원

12. 재귀 CTE: 날짜 시퀀스 생성 ★★☆

2024년 12월의 모든 날짜를 재귀 CTE로 생성하고, 각 날짜의 주문 수와 매출을 구하세요. (주문이 없는 날도 0으로 표시)

출제 빈도: ★★☆ — Google (calendar 테이블 없는 환경에서)

dt order_count revenue
2024-12-01 ... ...

힌트 1: - 앵커: SELECT '2024-12-01' AS dt - 재귀: SELECT DATE(dt, '+1 day') FROM dates WHERE dt < '2024-12-31' - LEFT JOIN orders로 주문 데이터 결합

정답
WITH RECURSIVE dates AS (
    SELECT '2024-12-01' AS dt
    UNION ALL
    SELECT DATE(dt, '+1 day')
    FROM dates
    WHERE dt < '2024-12-31'
)
SELECT
    d.dt,
    COUNT(o.id) AS order_count,
    COALESCE(CAST(SUM(o.total_amount) AS INTEGER), 0) AS revenue
FROM dates AS d
LEFT JOIN orders AS o
    ON DATE(o.ordered_at) = d.dt
   AND o.status NOT IN ('cancelled')
GROUP BY d.dt
ORDER BY d.dt;

실행 결과 (총 31행 중 상위 7행)

dt order_count revenue
2024-12-01 17 12,081,245
2024-12-02 14 12,578,657
2024-12-03 15 11,867,860
2024-12-04 14 11,198,303
2024-12-05 9 5,489,585
2024-12-06 14 16,160,600
2024-12-07 15 16,802,502

13. 코호트 분석 (가입월별 재구매) ★★★

고객의 가입 월(cohort) 기준으로, 가입 후 0~3개월 시점의 구매 고객 비율을 구하세요. 2024년 가입 고객 대상입니다.

출제 빈도: ★★★ (매우 높음) — Meta, 쿠팡, 네이버

cohort size m0_pct m1_pct m2_pct m3_pct

힌트 1: - 코호트: SUBSTR(created_at, 1, 7) - 월 오프셋: (julianday(주문월-01) - julianday(가입월-01)) / 30을 정수 변환 - COUNT(DISTINCT CASE WHEN offset = N THEN customer_id END) / 코호트 크기

정답
WITH cohort AS (
    SELECT
        id AS customer_id,
        SUBSTR(created_at, 1, 7) AS cohort_month
    FROM customers
    WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'
),
order_month AS (
    SELECT DISTINCT
        co.customer_id,
        co.cohort_month,
        CAST(
            (julianday(SUBSTR(o.ordered_at, 1, 7) || '-01')
           - julianday(co.cohort_month || '-01')) / 30
        AS INTEGER) AS month_offset
    FROM cohort AS co
    JOIN orders AS o ON co.customer_id = o.customer_id
    WHERE o.status NOT IN ('cancelled')
)
SELECT
    c.cohort_month AS cohort,
    COUNT(DISTINCT c.customer_id) AS size,
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN om.month_offset = 0 THEN om.customer_id END)
        / COUNT(DISTINCT c.customer_id), 1) AS m0_pct,
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN om.month_offset = 1 THEN om.customer_id END)
        / COUNT(DISTINCT c.customer_id), 1) AS m1_pct,
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN om.month_offset = 2 THEN om.customer_id END)
        / COUNT(DISTINCT c.customer_id), 1) AS m2_pct,
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN om.month_offset = 3 THEN om.customer_id END)
        / COUNT(DISTINCT c.customer_id), 1) AS m3_pct
FROM cohort AS c
LEFT JOIN order_month AS om ON c.customer_id = om.customer_id
    AND c.cohort_month = om.cohort_month
GROUP BY c.cohort_month
ORDER BY c.cohort_month;

실행 결과 (총 12행 중 상위 7행)

cohort size m0_pct m1_pct m2_pct m3_pct
2024-01 52 11.50 5.80 9.60 7.70
2024-02 48 18.80 0.0 6.30 10.40
2024-03 71 16.90 14.10 15.50 2.80
2024-04 53 5.70 7.50 7.50 3.80
2024-05 43 9.30 9.30 16.30 4.70
2024-06 68 10.30 5.90 5.90 8.80
2024-07 62 8.10 9.70 11.30 11.30

14. 카테고리 계층 집계 (Recursive + GROUP BY) ★★☆

categories 트리에서 대분류별 총 매출을 구하세요. 하위 카테고리(중/소)의 매출을 모두 상위(대)로 롤업합니다.

출제 빈도: ★★☆ — Amazon, 카카오

top_category sub_category_count product_count total_revenue

힌트 1: - Recursive CTE로 각 카테고리의 루트(depth=0) 조상 찾기 - 재귀 탐색 후 루트 기준으로 GROUP BY

정답
WITH RECURSIVE cat_tree AS (
    SELECT id, id AS root_id, name AS root_name, depth
    FROM categories
    WHERE parent_id IS NULL

    UNION ALL

    SELECT c.id, ct.root_id, ct.root_name, c.depth
    FROM categories AS c
    JOIN cat_tree AS ct ON c.parent_id = ct.id
)
SELECT
    ct.root_name AS top_category,
    COUNT(DISTINCT CASE WHEN ct.depth > 0 THEN ct.id END) AS sub_category_count,
    COUNT(DISTINCT p.id) AS product_count,
    COALESCE(CAST(SUM(oi.quantity * oi.unit_price) AS INTEGER), 0) AS total_revenue
FROM cat_tree AS ct
LEFT JOIN products AS p ON p.category_id = ct.id
LEFT JOIN order_items AS oi ON oi.product_id = p.id
LEFT JOIN orders AS o ON oi.order_id = o.id
    AND o.status NOT IN ('cancelled', 'returned', 'return_requested')
WHERE ct.root_id = ct.root_id
GROUP BY ct.root_id, ct.root_name
ORDER BY total_revenue DESC;

실행 결과 (총 18행 중 상위 7행)

top_category sub_category_count product_count total_revenue
노트북 4 29 10,144,187,100
그래픽카드 2 15 5,608,961,100
모니터 3 22 4,753,611,200
메인보드 2 23 3,255,592,700
CPU 2 7 1,877,389,800
스피커/헤드셋 0 12 1,561,393,600
저장장치 3 15 1,524,801,600

15. Self-Join: 상사보다 급여(매출 처리 건수)가 많은 직원 ★★★

각 직원이 처리한 주문 건수를 구하고, 자신의 상사보다 많은 주문을 처리한 직원을 찾으세요. (orders.staff_id 기준)

출제 빈도: ★★★ (매우 높음) — LeetCode #181 변형, 카카오

staff_name department handled_orders manager_name manager_orders

힌트 1: - staff AS s JOIN staff AS m ON s.manager_id = m.id (Self-Join) - 각 직원의 주문 건수: orders.staff_id로 카운트 - WHERE s_count > m_count

정답
WITH staff_orders AS (
    SELECT
        s.id,
        s.name,
        s.department,
        s.manager_id,
        COUNT(o.id) AS handled_orders
    FROM staff AS s
    LEFT JOIN orders AS o ON s.id = o.staff_id
    GROUP BY s.id, s.name, s.department, s.manager_id
)
SELECT
    emp.name       AS staff_name,
    emp.department,
    emp.handled_orders,
    mgr.name       AS manager_name,
    mgr.handled_orders AS manager_orders
FROM staff_orders AS emp
JOIN staff_orders AS mgr ON emp.manager_id = mgr.id
WHERE emp.handled_orders > mgr.handled_orders
ORDER BY emp.handled_orders DESC;

16. 다단계 분석: 할인 효과 측정 ★★☆

쿠폰을 사용한 고객과 미사용 고객을 비교하여 다음을 구하세요: (1) 그룹별 평균 주문 금액, (2) 재구매율, (3) 평균 리뷰 평점.

출제 빈도: ★★☆ — 쿠팡, 네이버

segment customer_count avg_order_value repeat_rate_pct avg_rating

힌트 1: - 쿠폰 사용 여부: coupon_usage 존재 여부로 세그먼트 분류 - 재구매율: 2건 이상 주문한 고객 / 전체 고객 - 3개 테이블(orders, coupon_usage, reviews)을 CTE로 각각 집계 후 합치기

정답
WITH coupon_customers AS (
    SELECT DISTINCT customer_id FROM coupon_usage
),
customer_segment AS (
    SELECT
        c.id AS customer_id,
        CASE WHEN cc.customer_id IS NOT NULL THEN 'Coupon User' ELSE 'Non-Coupon' END AS segment
    FROM customers AS c
    LEFT JOIN coupon_customers AS cc ON c.id = cc.customer_id
    WHERE c.is_active = 1
),
order_stats AS (
    SELECT
        cs.segment,
        cs.customer_id,
        COUNT(o.id) AS order_count,
        AVG(o.total_amount) AS avg_order_value
    FROM customer_segment AS cs
    LEFT JOIN orders AS o ON cs.customer_id = o.customer_id
        AND o.status NOT IN ('cancelled')
    GROUP BY cs.segment, cs.customer_id
),
review_stats AS (
    SELECT
        cs.segment,
        ROUND(AVG(r.rating), 2) AS avg_rating
    FROM customer_segment AS cs
    JOIN reviews AS r ON cs.customer_id = r.customer_id
    GROUP BY cs.segment
)
SELECT
    os.segment,
    COUNT(DISTINCT os.customer_id) AS customer_count,
    ROUND(AVG(os.avg_order_value), 0) AS avg_order_value,
    ROUND(100.0 * SUM(CASE WHEN os.order_count >= 2 THEN 1 ELSE 0 END)
        / COUNT(DISTINCT os.customer_id), 1) AS repeat_rate_pct,
    rs.avg_rating
FROM order_stats AS os
LEFT JOIN review_stats AS rs ON os.segment = rs.segment
GROUP BY os.segment, rs.avg_rating
ORDER BY os.segment;

실행 결과 (2행)

segment customer_count avg_order_value repeat_rate_pct avg_rating
Coupon User 888 1,030,689.00 97.20 3.89
Non-Coupon 2772 843,056.00 52.50 3.93

17. 데이터 품질 체크: NULL/이상치 탐지 ★★☆

다음 데이터 품질 문제를 한 쿼리로 보고하세요: (1) 주문 금액이 0 이하, (2) 배송일 < 주문일, (3) 리뷰 평점 범위 밖, (4) 미래 날짜 주문.

출제 빈도: ★★☆ — Amazon (Data Engineer), 카카오

issue_type table_name record_count sample_ids

힌트 1: - UNION ALL로 각 품질 검사 결과를 합치기 - GROUP_CONCAT(id, ',')으로 샘플 ID 나열 - 미래 날짜: ordered_at > '2025-12-31'

정답
SELECT 'Zero/Negative Amount' AS issue_type,
       'orders' AS table_name,
       COUNT(*) AS record_count,
       GROUP_CONCAT(id, ',') AS sample_ids
FROM orders
WHERE total_amount <= 0

UNION ALL

SELECT 'Delivery Before Shipment',
       'shipping',
       COUNT(*),
       GROUP_CONCAT(id, ',')
FROM shipping
WHERE delivered_at IS NOT NULL
  AND shipped_at IS NOT NULL
  AND delivered_at < shipped_at

UNION ALL

SELECT 'Rating Out of Range',
       'reviews',
       COUNT(*),
       GROUP_CONCAT(id, ',')
FROM reviews
WHERE rating < 1 OR rating > 5

UNION ALL

SELECT 'Future Order Date',
       'orders',
       COUNT(*),
       GROUP_CONCAT(id, ',')
FROM orders
WHERE ordered_at > DATE('now', '+1 day');

실행 결과 (4행)

issue_type table_name record_count sample_ids
Zero/Negative Amount orders 0 NULL
Delivery Before Shipment shipping 0 NULL
Rating Out of Range reviews 0 NULL
Future Order Date orders 0 NULL

18. 시계열 이상 탐지: 3-sigma 규칙 ★☆☆

일별 매출의 평균 +/- 3 표준편차 범위를 벗어나는 이상치(outlier) 날짜를 찾으세요.

출제 빈도: ★☆☆ — Google (Data Science)

order_date daily_revenue avg_revenue stddev z_score

힌트 1: - 표준편차: SQLite에서 직접 지원하지 않으므로 수동 계산 - SQRT(AVG(x*x) - AVG(x)*AVG(x)) = 모표준편차 - Z-score = (값 - 평균) / 표준편차, ABS(z) > 3이면 이상치

정답
WITH daily AS (
    SELECT
        DATE(ordered_at) AS order_date,
        SUM(total_amount) AS daily_revenue
    FROM orders
    WHERE status NOT IN ('cancelled')
    GROUP BY DATE(ordered_at)
),
stats AS (
    SELECT
        AVG(daily_revenue) AS avg_rev,
        SQRT(AVG(daily_revenue * daily_revenue) - AVG(daily_revenue) * AVG(daily_revenue)) AS stddev_rev
    FROM daily
)
SELECT
    d.order_date,
    CAST(d.daily_revenue AS INTEGER) AS daily_revenue,
    CAST(s.avg_rev AS INTEGER) AS avg_revenue,
    CAST(s.stddev_rev AS INTEGER) AS stddev,
    ROUND((d.daily_revenue - s.avg_rev) / s.stddev_rev, 2) AS z_score
FROM daily AS d
CROSS JOIN stats AS s
WHERE ABS((d.daily_revenue - s.avg_rev) / s.stddev_rev) > 3
ORDER BY ABS((d.daily_revenue - s.avg_rev) / s.stddev_rev) DESC;

실행 결과 (총 48행 중 상위 7행)

order_date daily_revenue avg_revenue stddev z_score
2025-12-18 62,329,008 10,077,964 8,237,014 6.34
2025-03-05 62,266,624 10,077,964 8,237,014 6.34
2020-11-21 60,470,134 10,077,964 8,237,014 6.12
2025-12-21 56,392,485 10,077,964 8,237,014 5.62
2020-02-09 51,620,600 10,077,964 8,237,014 5.04
2025-05-06 51,138,521 10,077,964 8,237,014 4.98
2022-01-06 50,993,500 10,077,964 8,237,014 4.97

19. 복합 분석: RFM 세그먼테이션 ★★★

고객을 RFM(Recency, Frequency, Monetary) 기준으로 세그먼트하세요. 각 지표를 1~5 등급으로 나누고, 세그먼트별 고객 수와 평균 매출을 구합니다.

출제 빈도: ★★★ (매우 높음) — 쿠팡, 네이버, Amazon

rfm_segment r_score f_score m_score customer_count avg_monetary
Champions 5 5 5 ... ...

힌트 1: - Recency: 마지막 주문으로부터 경과일 → NTILE(5) (최근일수록 높은 점수) - Frequency: 주문 횟수 → NTILE(5) - Monetary: 총 구매금액 → NTILE(5) - 세그먼트 분류: R+F+M 합이 13~15=Champions, 10~12=Loyal, 7~9=Potential, 4~6=AtRisk, 3=Lost

정답
WITH rfm_raw AS (
    SELECT
        customer_id,
        CAST(julianday('2025-06-30') - julianday(MAX(ordered_at)) AS INTEGER) AS recency,
        COUNT(*) AS frequency,
        CAST(SUM(total_amount) AS INTEGER) AS monetary
    FROM orders
    WHERE status NOT IN ('cancelled')
    GROUP BY customer_id
),
rfm_scored AS (
    SELECT
        customer_id,
        recency, frequency, monetary,
        NTILE(5) OVER (ORDER BY recency DESC)  AS r_score,
        NTILE(5) OVER (ORDER BY frequency ASC)  AS f_score,
        NTILE(5) OVER (ORDER BY monetary ASC)   AS m_score
    FROM rfm_raw
),
rfm_segmented AS (
    SELECT *,
        r_score + f_score + m_score AS rfm_total,
        CASE
            WHEN r_score + f_score + m_score >= 13 THEN 'Champions'
            WHEN r_score + f_score + m_score >= 10 THEN 'Loyal'
            WHEN r_score + f_score + m_score >= 7  THEN 'Potential'
            WHEN r_score + f_score + m_score >= 4  THEN 'At Risk'
            ELSE 'Lost'
        END AS rfm_segment
    FROM rfm_scored
)
SELECT
    rfm_segment,
    ROUND(AVG(r_score), 1) AS r_score,
    ROUND(AVG(f_score), 1) AS f_score,
    ROUND(AVG(m_score), 1) AS m_score,
    COUNT(*) AS customer_count,
    CAST(AVG(monetary) AS INTEGER) AS avg_monetary
FROM rfm_segmented
GROUP BY rfm_segment
ORDER BY AVG(rfm_total) DESC;

실행 결과 (5행)

rfm_segment r_score f_score m_score customer_count avg_monetary
Champions 4.40 4.80 4.80 579 42,854,633
Loyal 3.30 3.80 3.80 655 11,437,042
Potential 2.80 2.60 2.60 788 4,077,164
At Risk 2.10 1.50 1.50 653 1,071,866
Lost 1.00 1.00 1.00 134 175,595

20. 종합 시나리오: 마케팅 캠페인 효과 분석 ★★★

2024년 진행된 프로모션별로 다음을 분석하세요: (1) 참여 고객 수, (2) 프로모션 기간 매출 vs 직전 동일 기간 매출(증분 효과), (3) 프로모션 종료 후 30일간 재구매율, (4) 고객 획득 비용(할인 총액/신규 고객 수).

출제 빈도: ★★★ (매우 높음) — 쿠팡, 네이버, Meta (종합 문제)

promo_name promo_type participants promo_revenue pre_revenue lift_pct post_repurchase_pct cac

힌트 1: - promotionsstarted_at/ended_at으로 기간 설정 - 직전 동일 기간: DATE(started_at, '-' || (julianday(ended_at)-julianday(started_at)) || ' days') - 신규 고객: 프로모션 기간 중 첫 주문인 고객 - CTE 4~5단계로 분리

정답
WITH promo_periods AS (
    SELECT
        pr.id AS promo_id,
        pr.name AS promo_name,
        pr.type AS promo_type,
        pr.started_at,
        pr.ended_at,
        CAST(julianday(pr.ended_at) - julianday(pr.started_at) AS INTEGER) AS duration_days
    FROM promotions AS pr
    WHERE pr.started_at >= '2024-01-01' AND pr.started_at < '2025-01-01'
),
promo_orders AS (
    SELECT
        pp.promo_id, pp.promo_name, pp.promo_type,
        pp.started_at, pp.ended_at, pp.duration_days,
        o.id AS order_id,
        o.customer_id,
        o.total_amount,
        o.discount_amount
    FROM promo_periods AS pp
    JOIN orders AS o
        ON o.ordered_at BETWEEN pp.started_at AND pp.ended_at
       AND o.status NOT IN ('cancelled')
),
pre_period_revenue AS (
    SELECT
        pp.promo_id,
        COALESCE(SUM(o.total_amount), 0) AS pre_revenue
    FROM promo_periods AS pp
    LEFT JOIN orders AS o
        ON o.ordered_at BETWEEN DATE(pp.started_at, '-' || pp.duration_days || ' days')
                            AND DATE(pp.started_at, '-1 day')
       AND o.status NOT IN ('cancelled')
    GROUP BY pp.promo_id
),
post_repurchase AS (
    SELECT
        po.promo_id,
        COUNT(DISTINCT CASE
            WHEN EXISTS (
                SELECT 1 FROM orders o2
                WHERE o2.customer_id = po.customer_id
                  AND o2.ordered_at > po.ended_at
                  AND o2.ordered_at <= DATE(po.ended_at, '+30 days')
                  AND o2.status NOT IN ('cancelled')
            )
            THEN po.customer_id
        END) AS repurchase_customers,
        COUNT(DISTINCT po.customer_id) AS total_customers
    FROM promo_orders AS po
    GROUP BY po.promo_id
)
SELECT
    po.promo_name,
    po.promo_type,
    COUNT(DISTINCT po.customer_id) AS participants,
    CAST(SUM(po.total_amount) AS INTEGER) AS promo_revenue,
    CAST(pr.pre_revenue AS INTEGER) AS pre_revenue,
    ROUND(100.0 * (SUM(po.total_amount) - pr.pre_revenue)
        / NULLIF(pr.pre_revenue, 0), 1) AS lift_pct,
    ROUND(100.0 * rp.repurchase_customers / NULLIF(rp.total_customers, 0), 1) AS post_repurchase_pct,
    CAST(SUM(po.discount_amount) / NULLIF(
        COUNT(DISTINCT CASE
            WHEN NOT EXISTS (
                SELECT 1 FROM orders o3
                WHERE o3.customer_id = po.customer_id
                  AND o3.ordered_at < po.started_at
                  AND o3.status NOT IN ('cancelled')
            )
            THEN po.customer_id
        END), 0) AS INTEGER) AS cac
FROM promo_orders AS po
JOIN pre_period_revenue AS pr ON po.promo_id = pr.promo_id
JOIN post_repurchase AS rp ON po.promo_id = rp.promo_id
GROUP BY po.promo_id, po.promo_name, po.promo_type,
         pr.pre_revenue, rp.repurchase_customers, rp.total_customers
ORDER BY promo_revenue DESC;

실행 결과 (총 12행 중 상위 7행)

promo_name promo_type participants promo_revenue pre_revenue lift_pct post_repurchase_pct cac
신학기 노트북 특가 2024 category 317 359,327,629 356,716,760 0.7 32.50 74,055
추석 선물 세일 2024 seasonal 171 231,954,685 142,169,738 63.20 36.80 155,025
봄맞이 세일 2024 seasonal 226 212,486,725 217,467,681 -2.30 35.80 78,385
연말 감사 세일 2024 seasonal 186 192,873,636 188,843,699 2.10 31.20 121,744
여름 쿨링 페스티벌 2024 category 168 191,169,469 178,340,805 7.20 26.20 120,630
프린터 특가 2024 category 121 147,372,860 158,556,326 -7.10 30.60 120,300
새해 특가 세일 2024 seasonal 68 70,486,815 54,386,890 29.60 25.00 148,566