前言

前陣子進行環球誠信後端工程師面試，試題為資料庫操作各種問題，過程中發現滿多沒用過的語法以及過去比較少用SQL處理複雜的事務，事後再針對考題進行複盤練習。

原始碼

在phantom_mask專案的Django新增一個uinterview應用，實現考題上所有需要的測試資料以及各項考題單元測試。
Github code

指令

匯入測試資料python manage.py insert_uinterview_test_data
所有試題測試輸出python manage.py test_uinterview
單一試題測試輸出pymanage test_uinterview --show_sql 1 --case 1
- show_sql: 是否顯示已執行SQL，預設為0
- case: 測試試題編號，範圍為1~17

題目一

Question

如果客户的deliver_date和order_date相同，則該訂單為real-time order，
否則都為plan order 查詢表格中real-time order所佔的百分比，四捨五入到小數點後2位。

==================== Delivery Table ====================
  id    customer_id  order_date    delivery_date
----  -------------  ------------  ---------------
 208              1  2019-08-01    2019-08-02
 209              5  2019-08-02    2019-08-02
 210              1  2019-08-11    2019-08-11
 211              3  2019-08-24    2019-08-26
 212              4  2019-08-21    2019-08-22
 213              2  2019-08-11    2019-08-13

Answer

1	{'real_time_order_percent': 33.33}

解法一

# Delivery test_case_1
# python manage.py test_uinterview --case 1
# 使用ORM

# 先取得real_time_order的數量
# 使用F()進行其他欄位比較
real_order_count = Delivery.objects \
    .filter(delivery_date=F('order_date')) \
    .count()

# 透過aggregate()函數進行聚合計算總筆數
# 需*100.0，否則會被當成整數
# 使用Cast(value,type)函數將數值轉換為float
queryset = Delivery.objects.aggregate( \
    real_time_order_percent=Cast( 
        Round(real_order_count * 100.0 / Count('*')),
        FloatField()
    )
)

print(queryset)

# Print result:
# {'real_time_order_percent': 33.33}

解法二

# Delivery test_case_1_2
# python manage.py test_uinterview --case 2
# 使用SQL

with connection.cursor() as cursor:
    cursor.execute(
        f"""
        SELECT CAST(
                    ROUND(
                        (SELECT COUNT(*) 
                        FROM uinterview_delivery
                        -- 需*100.0，否則會被當成int
                        WHERE uinterview_delivery.delivery_date = uinterview_delivery.order_date) * 100.0  / COUNT(*)
                    , 2) AS double precision) AS real_time_order_percent
        FROM uinterview_delivery
    """
    )

    utils.print_cursor(cursor)

# Print result:
#   real_time_order_percent
# -------------------------
#                     33.33

題目二

Question

表格中有重複email的資料，當資料中有重複的email時將is_delete標記為true，保留Id最小的資料。

=============== Email Table ===============
  id  email              is_delete
----  -----------------  -----------
  96  [email protected]  False
  95  [email protected]   False
  94  [email protected]  False

Answer

=============== Email Table ===============
  id  email              is_delete
----  -----------------  -----------
  94  [email protected]  False
  95  [email protected]   False
  96  [email protected]  True

解法一

# Email test_case_1
# python manage.py test_uinterview --case 3
# 使用ORM

# 透過distinct('email')找出不重覆Email的id，並以email,id進行降序排序
keep_email_ids = Email.objects \
    .values_list('id', flat=True) \
    .distinct('email') \
    .order_by('email', 'id')

# 再透過exclude(id__in=keep_email_ids)排除不重覆的id，將is_delete標記為true
Email.objects.exclude(id__in=keep_email_ids).update(is_delete=True)

utils.print_queryset(Email.objects.all().order_by('id'))

# Print result:
# =============== Email Table ===============
#   id  email              is_delete
# ----  -----------------  -----------
#   94  [email protected]  False
#   95  [email protected]   False
#   96  [email protected]  True

解法二

# Email test_case_1_2
# python manage.py test_uinterview --case 4
# 使用SQL

with connection.cursor() as cursor:
    cursor.execute(
        f"""
         UPDATE uinterview_email
         SET is_delete = true 
         WHERE NOT (id IN (
                        SELECT DISTINCT ON (email) id 
                        FROM uinterview_email 
                        ORDER BY email ASC, id ASC)
                  )
         RETURNING *;
    """
    )

    utils.print_cursor(cursor)

# Print result:
#  id  email              is_delete
# ----  -----------------  -----------
#  96  [email protected]  True

題目三

Question

找出表格中員工的獎金小於750元或沒有領到獎金的員工姓名和獎金。

==================== Employee Table ====================
  id  name    sex    sex2
----  ------  -----  ------
 109  John    M
 110  Dan     F
 111  Brad    M
 112  Tomas   F
=============== Employee Bonus Table ===============
  id    employee    bonus
----  ----------  -------
  55         110      500
  56         112     2000

Answer

1
2
3

name: John, bonus: []
name: Dan, bonus: [500]
name: Brad, bonus: []

解法一

# Employee test_case_1
# python manage.py test_uinterview --case 5
# 使用ORM

# 先找出獎金大於750元的employee
gte_750_bonus_employee_ids = EmployeeBonus.objects \ 
    .filter(bonus__gte=750) \
    .values_list('employee_id', flat=True)

# 再透過exclude(id__in=gte_750_bonus_employee_ids)排除大於750元員工
# 剩下的employee就是沒有獎金或低於750元
queryset = Employee.objects.exclude(id__in=gte_750_bonus_employee_ids)

for employee in queryset:
    print(f'name: {employee.name}, bonus: {[int(item.bonus) for item in employee.bonus.all()]}')

# Print result:
# name: John, bonus: []
# name: Dan, bonus: [500]
# name: Brad, bonus: []

解法二

# Employee test_case_1_2
# python manage.py test_uinterview --case 6
# 使用SQL

with connection.cursor() as cursor:
    cursor.execute(
        f"""
        SELECT U0.id, 
               U0.name, 
               U0.sex, 
               COALESCE(
                    (SELECT SUM(U2.bonus) 
                     FROM uinterview_employeebonus U2
                     WHERE U2.employee_id = U0.id), 0) AS bonus
        FROM uinterview_employee U0
        WHERE NOT (U0.id IN (
                        SELECT U1.employee_id 
                        FROM uinterview_employeebonus U1 
                        WHERE U1.bonus >= 750))
        ORDER BY bonus, U0.name
    """
    )

    utils.print_cursor(cursor)

# Print result:
#   id  name    sex      bonus
# ----  ------  -----  -------
#  111  Brad    M            0
#  109  John    M            0
#  110  Dan     F          500

題目四

Question

員工表格中性別輸入錯誤，請寫一條SQL更新語法，將表格中sex為’M’資料更新為’F’，sex為’F’資料更新為’M’，更新得值請寫回sex2。

==================== Employee Table ====================
  id  name    sex    sex2
----  ------  -----  ------
 109  John    M
 110  Dan     F
 111  Brad    M
 112  Tomas   F

Answer

sex2為題目答案。
==================== Employee Table ====================
  id  name    sex    sex2
----  ------  -----  ------
 109  John    M      F
 110  Dan     F      M
 111  Brad    M      F
 112  Tomas   F      M

解法一

# Employee test_case_2
# python manage.py test_uinterview --case 7
# 使用ORM

# 透過CASE WHEN THEN END更新sex2
Employee.objects.all().update(
    sex2=Case(When(sex='M', then=Value('F')), default=Value('M'))
)

utils.print_queryset(Employee.objects.all())

# Print result:
# ==================== Employee Table ====================
#   id  name    sex    sex2
# ----  ------  -----  ------
#  109  John    M      F
#  110  Dan     F      M
#  111  Brad    M      F
#  112  Tomas   F      M

解法二

# Employee test_case_2_2
# python manage.py test_uinterview --case 8
# 使用SQL

with connection.cursor() as cursor:
    cursor.execute(
        f"""
       UPDATE uinterview_employee 
       SET sex2 = CASE 
                    WHEN (sex = 'M') THEN 'F' 
                    ELSE 'M' 
                  END
       RETURNING *;
    """
    )

    utils.print_cursor(cursor)

# Print result:
#  id  name    sex    sex2
# ----  ------  -----  ------
# 109  John    M      F
# 110  Dan     F      M
# 111  Brad    M      F
# 112  Tomas   F      M

題目五

Question

假設今天是2019-6-23，找出表格中找出過去一年内銷售數量少於10的產品id和名稱，並排除available_date小於一個月的產品。

=============== Product Table ===============
  id  name      available_date
----  --------  -------------------
   1  Product1  2010-01-01 00:00:00
   2  Product2  2012-05-12 00:00:00
   3  Product3  2019-06-10 00:00:00
   4  Product4  2019-06-01 00:00:00
   5  Product5  2008-09-21 00:00:00
==================== Order Table ====================
  id    product    qty  dispatch_date
----  ---------  -----  -------------------
   1          1      4  2018-07-26 00:00:00
   2          1      2  2018-11-05 00:00:00
   3          3      1  2019-06-11 00:00:00
   4          4      8  2019-06-05 00:00:00
   5          4      6  2019-06-20 00:00:00
   6          5      5  2009-02-02 00:00:00
   7          5      9  2010-04-13 00:00:00

Answer

1
2
3

  id  name      available_date         total_of_qty
----  --------  -------------------  --------------
   3  Product3  2019-06-10 00:00:00               1

解法一

# Product test_case_1
# python manage.py test_uinterview --case 9
# 使用ORM

now = datetime.strptime("2019-6-23", "%Y-%m-%d")

# 先使用Subquery先統計各產品一年內的銷售總數total_of_qty
order_queryset = Order.objects \
    .filter(product=OuterRef('pk')) \
    .filter(dispatch_date__gte=now - timedelta(days=365)) \
    .annotate(total_of_qty=Coalesce(Func('qty', function='Sum'), 0)) \
    .values('total_of_qty')

# 再把total_of_qty設定給產品
queryset = Product.objects \
    .annotate(qty_total=Subquery(order_queryset)) \
    .filter(total_of_qty__lt=10) \
    .filter(available_date__gte=now - timedelta(days=30))

utils.print_queryset(queryset, ['total_of_qty'])

# Print result:
#   id  name      available_date         total_of_qty
# ----  --------  -------------------  --------------
#    3  Product3  2019-06-10 00:00:00               1

解法二

# Product test_case_1_2_2
# python manage.py test_uinterview --case 10
# 使用SQL

with connection.cursor() as cursor:
    cursor.execute(
        f"""
        SELECT id, 
               name, 
               available_date, 
               (SELECT COALESCE(Sum(U1.qty), 0) 
                FROM uinterview_order U1 
                WHERE (U1.product_id = U0.id 
                      AND dispatch_date >= '2018-06-23T00:00:00'::timestamp)) AS total_of_qty 
        FROM uinterview_product U0
        WHERE ((SELECT COALESCE(Sum(U2.qty), 0) 
                FROM uinterview_order U2
                WHERE (U2.product_id = U0.id AND U2.dispatch_date >= '2018-06-23T00:00:00'::timestamp)) < 10 
                
                AND U0.available_date >= '2019-05-24T00:00:00'::timestamp) 
    """
    )

    utils.print_cursor(cursor)

# Print result:
#   id  name      available_date         total_of_qty
# ----  --------  -------------------  --------------
#    3  Product3  2019-06-10 00:00:00               1

題目六

Question

找出表格中所有部門的學生人數，並按學生人數由大到小、部門名稱由小到大排序。

========== Department Table ==========
  id  name
----  -----------
   1  Engineering
   2  Science
   3  Law
==================== Student Table ====================
  id    department  name    gender
----  ------------  ------  --------
   1             1  Jack    M
   2             1  Jan     F
   3             2  Mark    M

Answer

  id  name           total_of_students
----  -----------  -------------------
   1  Engineering                    2
   2  Science                        1
   3  Law                            0

解法一

# Department test_case_1
# python manage.py test_uinterview --case 11
# 使用ORM

student_queryset = Student.objects \
    .filter(department=OuterRef('pk')) \
    .annotate(total_of_students=Coalesce(Func('id', function='Count'), 0)) \
    .values('total_of_students')

department_queryset = Department.objects \
    .annotate(total_of_students=Subquery(student_queryset)) \
    .order_by('-total_of_students', 'name')

utils.print_queryset(department_queryset, ['total_of_students'])
# for department in department_queryset:
#     department_name = str(department.name).ljust(12, " ")
#     department_total_of_students = str(department.total_of_students).rjust(2, " ")
#     print(f'人數: {department_total_of_students}, 部門: {department_name}')

# Print result:
# =============== Department Table ===============
#   id  name           total_of_students
# ----  -----------  -------------------
#    1  Engineering                    2
#    2  Science                        1
#    3  Law                            0

解法二

# Department test_case_1_2
# python manage.py test_uinterview --case 12
# 使用SQL

with connection.cursor() as cursor:
    cursor.execute(
        f"""
        SELECT id, 
               name, 
               (SELECT COALESCE(Count(*), 0)
                FROM uinterview_student U1
                WHERE U1.department_id = U0.id) AS total_of_students 
        FROM uinterview_department U0 
        ORDER BY total_of_students DESC, name ASC;
    """
    )

    utils.print_cursor(cursor)

# Print result:
#   id  name           total_of_students
# ----  -----------  -------------------
#    1  Engineering                    2
#    2  Science                        1
#    3  Law                            0

題目七

Question

找出表格中評論最多部電影的用戶，如果評論數相同，依用户名稱由小到大排序。

========== Movie Table ==========
  id  name
----  --------
   1  Avengers
   2  Frozen 2
   3  Joker
========== User Table ==========
  id  name
----  ------
   1  Daniel
   2  Monica
   3  Maria
   4  James
========================= Movie Rating Table =========================
  id    movie    user    rating  create_date
----  -------  ------  --------  -------------
   1        1       1         3  2020-01-12
   2        1       2         4  2020-02-11
   3        1       3         2  2020-02-12
   4        1       4         1  2020-01-01
   5        2       1         5  2020-02-17
   6        2       2         2  2020-02-01
   7        2       3         2  2020-03-01
   8        3       1         3  2020-02-22
   9        3       2         4  2020-02-25

Answer

  id  name      total_of_comment  is_top
----  ------  ------------------  --------
   1  Daniel                   3  True
   2  Monica                   3  True
   3  Maria                    2
   4  James                    1

解法一

# Movie test_case_1
# python manage.py test_uinterview --case 13
# 使用ORM

# 先用Subquery統計各別用戶的發表評論數total_of_comment
movie_comment_queryset = MovieRating.objects \
    .filter(user=OuterRef('pk')) \
    .annotate(total_of_comment=Func('id', function='Count')) \
    .values('total_of_comment')

# 將total_of_comment設定給用戶
user_queryset = User.objects \
    .annotate(total_of_comment=Subquery(movie_comment_queryset)) \
    .order_by('-total_of_comment', 'name')

# 用第一紀錄比對total_of_comment，如果相同則設定is_top=True
user_queryset = user_queryset \
    .annotate(is_top= \
         Case(When(total_of_comment=user_queryset.first().total_of_comment, then=True), )
    )

utils.print_queryset(user_queryset, ['total_of_comment', 'is_top'])

# Print result:
# ==================== User Table ====================
#   id  name      total_of_comment  is_top
# ----  ------  ------------------  --------
#    1  Daniel                   3  True
#    2  Monica                   3  True
#    3  Maria                    2
#    4  James                    1

解法二

# Movie test_case_1_2
# python manage.py test_uinterview --case 14
# 使用SQL

with connection.cursor() as cursor:
    cursor.execute(
        f"""
         -- 透過GROUP BY user_id後在進行COUNT(*)，可以得到每個user的評論次數
         SELECT * 
         FROM (SELECT user_id,
                       (SELECT name FROM uinterview_user WHERE id = user_id) AS user_name,
                       COUNT(*) AS total_of_comment
               FROM uinterview_movierating
               GROUP BY user_id
               ORDER BY total_of_comment DESC, user_name ASC) U0
         -- 只顯示最多評論次數的用戶
         WHERE total_of_comment = ( SELECT MAX(total_of_comment) 
                                     FROM (SELECT user_id,COUNT(*) AS total_of_comment
                                            FROM uinterview_movierating
                                            GROUP BY user_id) U1
                                  )
        """
    )

    utils.print_cursor(cursor)

# Print result:
#   user_id  user_name      total_of_comment
# ---------  -----------  ------------------
#         1  Daniel                        3
#         2  Monica                        3

題目八

Question

並找出2020/2月平均評分最高的電影名稱，如果評分相同，依電影名稱由小到大排序。

========== Movie Table ==========
  id  name
----  --------
   1  Avengers
   2  Frozen 2
   3  Joker
========== User Table ==========
  id  name
----  ------
   1  Daniel
   2  Monica
   3  Maria
   4  James
========================= Movie Rating Table =========================
  id    movie    user    rating  create_date
----  -------  ------  --------  -------------
   1        1       1         3  2020-01-12
   2        1       2         4  2020-02-11
   3        1       3         2  2020-02-12
   4        1       4         1  2020-01-01
   5        2       1         5  2020-02-17
   6        2       2         2  2020-02-01
   7        2       3         2  2020-03-01
   8        3       1         3  2020-02-22
   9        3       2         4  2020-02-25

Answer

  id  name        total_of_rating  is_top
----  --------  -----------------  --------
   2  Frozen 2                  7  True
   3  Joker                     7  True
   1  Avengers                  6

解法一

# Movie test_case_1
# python manage.py test_uinterview --case 15
# 使用ORM

movie_rating_queryset = MovieRating.objects
    .filter(movie=OuterRef('pk'))
    .filter(create_date__year=2020, create_date__month=2)
    .annotate(total_of_rating=Func('rating', function='Sum'))
    .values('total_of_rating')

movie_queryset = Movie.objects
    .annotate(total_of_rating=Subquery(movie_rating_queryset))
    .order_by('-total_of_rating', 'name')

movie_queryset = movie_queryset
    .annotate(is_most=
                  Case(When(total_of_rating=movie_queryset.first().total_of_rating, then=True), )
              )

utils.print_queryset(movie_queryset, ['total_of_rating', 'is_most'])

# Print result:
# ==================== Movie Table ====================
#   id  name        total_of_rating  is_top
# ----  --------  -----------------  --------
#    2  Frozen 2                  7  True
#    3  Joker                     7  True
#    1  Avengers                  6

解法二

# Movie test_case_1_2
# python manage.py test_uinterview --case 16
# 使用SQL

with connection.cursor() as cursor:
    cursor.execute(
        f"""
         -- 透過GROUP BY movie_id後在進行SUM(rating)，可以得到每個movie的評分總和
         SELECT * 
         FROM (SELECT movie_id,
                      (SELECT name 
                        FROM uinterview_movie 
                        WHERE id = movie_id) AS movie_name,
                      SUM(rating) AS total_of_rating
               FROM uinterview_movierating
               WHERE create_date BETWEEN '2020-02-01' AND '2020-02-29'
               GROUP BY movie_id
               ORDER BY total_of_rating DESC, movie_name ASC) U0
         -- 只顯示最高評分電影
         WHERE total_of_rating = ( SELECT MAX(total_of_rating) 
                                   FROM (SELECT movie_id, SUM(rating) AS total_of_rating
                                           FROM uinterview_movierating
                                           WHERE create_date BETWEEN '2020-02-01' AND '2020-02-29'
                                           GROUP BY movie_id) U1
                                  )
        """
    )

    utils.print_cursor(cursor)

# Print result:
#   movie_id  movie_name      total_of_rating
# ----------  ------------  -----------------
#          2  Frozen 2                      7
#          3  Joker                         7

題目九

Question

找出表格中連續4天或以上登入的用户帳號的id和名稱，並按id由小到大排序。
(使用OVER 與 PARTITION BY語法來完成)

========== Account Table ==========
  id  name
----  --------
   1  Wick
   7  Jonathan
=============== Account Login Table ===============
  id    account  login_date
----  ---------  ------------
   2          7  2020-05-30
   3          1  2020-05-30
   4          7  2020-05-31
   5          7  2020-06-01
   6          7  2020-06-02
   7          7  2020-06-02
   8          7  2020-06-03
   9          1  2020-06-07
  10          7  2020-06-10

Answer

  account_id  name      login_days    first_login_at
------------  --------  ------------  ----------------
           7  Jonathan  5天           2020-05-30

解法一

# Account test_case_1
# python manage.py test_uinterview --case 17
# 使用SQL

consecutive_login_days = 4
with connection.cursor() as cursor:
    cursor.execute(
        f"""
      SELECT account_id, 
          (SELECT name FROM uinterview_account WHERE id = account_id) AS name,
          CONCAT(count(sub_date),'天') AS login_days,
          first_login_at 
      FROM (
           SELECT account_id,
                  group_seq,
                  TO_CHAR(login_date,'YYYY-mm-dd') AS login_date,
                  TO_CHAR(sub_date,'YYYY-mm-dd') AS sub_date,
                  TO_CHAR(
                  CASE 
                       WHEN group_seq = 1 THEN login_date
                       WHEN sub_date = LAG(sub_date,1) OVER(PARTITION BY account_id ORDER BY login_date) 
                            THEN LAG(login_date,CAST(group_seq AS INT)-1) OVER()
                       ELSE login_date 
                       END
                  ,'YYYY-mm-dd') AS first_login_at
           FROM (
                  -- 如有連續相同sub_date代表是連續登入日期
                  -- 例如:
                  --   login_date(2022-01-02) - group_seq(1) = 2022-01-01
                  --   login_date(2022-01-03) - group_seq(2) = 2022-01-01
                  --   login_date(2022-01-04) - group_seq(3) = 2022-01-01
                  --   login_date(2022-01-10) - group_seq(4) = 2022-01-06
                  SELECT *,
                         login_date - (group_seq || 'DAY')::INTERVAL AS sub_date
                  FROM(
                      -- 以group_seq來分組，後續login_date - group_seq天來判斷是否為連續登入
                      SELECT *, 
                             ROW_NUMBER() OVER(PARTITION BY account_id ORDER BY login_date) AS group_seq
                      FROM (
                              -- 過濾掉同一天多次登入日期，只保留一筆資料
                              SELECT DISTINCT account_id ,login_date
                              FROM uinterview_accountlogin
                              ORDER BY login_date
                      ) U0
                  ) U1
           ) U2
      ) U3 
      GROUP BY account_id,sub_date,first_login_at  
      HAVING count(*) >= {consecutive_login_days}
      ORDER BY account_id,first_login_at;
      """
    )

    utils.print_cursor(cursor)

# Print result:
#  account_id  name      login_days    first_login_at
# ------------  --------  ------------  ----------------
#           7  Jonathan  5天           2020-05-30

Keyword

1 2	演算法, Algorithm, al-go-rithm

後端工程師面試-資料庫篇

前言

原始碼

指令

題目一

Question

Answer

解法一

解法二

題目二

Question

Answer

解法一

解法二

題目三

Question

Answer

解法一

解法二

題目四

Question

Answer

解法一

解法二

題目五

Question

Answer

解法一

解法二

題目六

Question

Answer

解法一

解法二

題目七

Question

Answer

解法一

解法二

題目八

Question

Answer

解法一

解法二

題目九

Question

Answer

解法一

Keyword