前言

前陣子進行環球誠信後端工程師面試,試題為資料庫操作各種問題,過程中發現滿多沒用過的語法以及過去比較少用SQL處理複雜的事務,事後再針對考題進行複盤練習。

原始碼

在phantom_mask專案的Django新增一個uinterview應用,實現考題上所有需要的測試資料以及各項考題單元測試。
Github code

指令

  • 匯入測試資料python manage.py insert_uinterview_test_data
  • 所有試題測試輸出python manage.py test_uinterview
  • 單一試題測試輸出pymanage test_uinterview --show_sql 1 --case 1
    • show_sql: 是否顯示已執行SQL,預設為0
    • case: 測試試題編號,範圍為1~17

題目一

Question

如果客户的deliver_date和order_date相同,則該訂單為real-time order,
否則都為plan order 查 詢表格中real-time order所佔的百分比,四捨五入到小數點後2位。

1
2
3
4
5
6
7
8
9
==================== Delivery Table ====================
id customer_id order_date delivery_date
---- ------------- ------------ ---------------
208 1 2019-08-01 2019-08-02
209 5 2019-08-02 2019-08-02
210 1 2019-08-11 2019-08-11
211 3 2019-08-24 2019-08-26
212 4 2019-08-21 2019-08-22
213 2 2019-08-11 2019-08-13

Answer

1
{'real_time_order_percent': 33.33}

解法一

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Delivery test_case_1
# python manage.py test_uinterview --case 1
# 使用ORM

# 先取得real_time_order的數量
# 使用F()進行其他欄位比較
real_order_count = Delivery.objects \
.filter(delivery_date=F('order_date')) \
.count()

# 透過aggregate()函數進行聚合計算總筆數
# 需*100.0,否則會被當成整數
# 使用Cast(value,type)函數將數值轉換為float
queryset = Delivery.objects.aggregate( \
real_time_order_percent=Cast(
Round(real_order_count * 100.0 / Count('*')),
FloatField()
)
)

print(queryset)

# Print result:
# {'real_time_order_percent': 33.33}

解法二

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Delivery test_case_1_2
# python manage.py test_uinterview --case 2
# 使用SQL

with connection.cursor() as cursor:
cursor.execute(
f"""
SELECT CAST(
ROUND(
(SELECT COUNT(*)
FROM uinterview_delivery
-- 需*100.0,否則會被當成int
WHERE uinterview_delivery.delivery_date = uinterview_delivery.order_date) * 100.0 / COUNT(*)
, 2) AS double precision) AS real_time_order_percent
FROM uinterview_delivery
"""
)

utils.print_cursor(cursor)

# Print result:
# real_time_order_percent
# -------------------------
# 33.33

題目二

Question

表格中有重複email的資料,當資料中有重複的email時將is_delete標記為true,保留Id最小的資料。

1
2
3
4
5
6
=============== Email Table ===============
id email is_delete
---- ----------------- -----------
96 alan@gexample.com False
95 bob@gexample.com False
94 alan@gexample.com False

Answer

1
2
3
4
5
6
=============== Email Table ===============
id email is_delete
---- ----------------- -----------
94 alan@gexample.com False
95 bob@gexample.com False
96 alan@gexample.com True

解法一

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# Email test_case_1
# python manage.py test_uinterview --case 3
# 使用ORM

# 透過distinct('email')找出不重覆Email的id,並以email,id進行降序排序
keep_email_ids = Email.objects \
.values_list('id', flat=True) \
.distinct('email') \
.order_by('email', 'id')

# 再透過exclude(id__in=keep_email_ids)排除不重覆的id,將is_delete標記為true
Email.objects.exclude(id__in=keep_email_ids).update(is_delete=True)

utils.print_queryset(Email.objects.all().order_by('id'))

# Print result:
# =============== Email Table ===============
# id email is_delete
# ---- ----------------- -----------
# 94 alan@gexample.com False
# 95 bob@gexample.com False
# 96 alan@gexample.com True


解法二

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# Email test_case_1_2
# python manage.py test_uinterview --case 4
# 使用SQL

with connection.cursor() as cursor:
cursor.execute(
f"""
UPDATE uinterview_email
SET is_delete = true
WHERE NOT (id IN (
SELECT DISTINCT ON (email) id
FROM uinterview_email
ORDER BY email ASC, id ASC)
)
RETURNING *;
"""
)

utils.print_cursor(cursor)

# Print result:
# id email is_delete
# ---- ----------------- -----------
# 96 alan@gexample.com True


題目三

Question

找出表格中員工的獎金小於750元或沒有領到獎金的員工姓名和獎金。

1
2
3
4
5
6
7
8
9
10
11
12
==================== Employee Table ====================
id name sex sex2
---- ------ ----- ------
109 John M
110 Dan F
111 Brad M
112 Tomas F
=============== Employee Bonus Table ===============
id employee bonus
---- ---------- -------
55 110 500
56 112 2000

Answer

1
2
3
name: John, bonus: []
name: Dan, bonus: [500]
name: Brad, bonus: []

解法一

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
# Employee test_case_1
# python manage.py test_uinterview --case 5
# 使用ORM

# 先找出獎金大於750元的employee
gte_750_bonus_employee_ids = EmployeeBonus.objects \
.filter(bonus__gte=750) \
.values_list('employee_id', flat=True)

# 再透過exclude(id__in=gte_750_bonus_employee_ids)排除大於750元員工
# 剩下的employee就是沒有獎金或低於750元
queryset = Employee.objects.exclude(id__in=gte_750_bonus_employee_ids)

for employee in queryset:
print(f'name: {employee.name}, bonus: {[int(item.bonus) for item in employee.bonus.all()]}')

# Print result:
# name: John, bonus: []
# name: Dan, bonus: [500]
# name: Brad, bonus: []

解法二

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# Employee test_case_1_2
# python manage.py test_uinterview --case 6
# 使用SQL

with connection.cursor() as cursor:
cursor.execute(
f"""
SELECT U0.id,
U0.name,
U0.sex,
COALESCE(
(SELECT SUM(U2.bonus)
FROM uinterview_employeebonus U2
WHERE U2.employee_id = U0.id), 0) AS bonus
FROM uinterview_employee U0
WHERE NOT (U0.id IN (
SELECT U1.employee_id
FROM uinterview_employeebonus U1
WHERE U1.bonus >= 750))
ORDER BY bonus, U0.name
"""
)

utils.print_cursor(cursor)

# Print result:
# id name sex bonus
# ---- ------ ----- -------
# 111 Brad M 0
# 109 John M 0
# 110 Dan F 500


題目四

Question

員工表格中性別輸入錯誤,請寫一條SQL更新語法,將表格中sex為’M’資料更新為’F’,sex為’F’資料更新為’M’,更新得值請寫回sex2。

1
2
3
4
5
6
7
8
==================== Employee Table ====================
id name sex sex2
---- ------ ----- ------
109 John M
110 Dan F
111 Brad M
112 Tomas F

Answer

1
2
3
4
5
6
7
8
9
sex2為題目答案。
==================== Employee Table ====================
id name sex sex2
---- ------ ----- ------
109 John M F
110 Dan F M
111 Brad M F
112 Tomas F M

解法一

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
# Employee test_case_2
# python manage.py test_uinterview --case 7
# 使用ORM

# 透過CASE WHEN THEN END更新sex2
Employee.objects.all().update(
sex2=Case(When(sex='M', then=Value('F')), default=Value('M'))
)

utils.print_queryset(Employee.objects.all())

# Print result:
# ==================== Employee Table ====================
# id name sex sex2
# ---- ------ ----- ------
# 109 John M F
# 110 Dan F M
# 111 Brad M F
# 112 Tomas F M

解法二

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# Employee test_case_2_2
# python manage.py test_uinterview --case 8
# 使用SQL

with connection.cursor() as cursor:
cursor.execute(
f"""
UPDATE uinterview_employee
SET sex2 = CASE
WHEN (sex = 'M') THEN 'F'
ELSE 'M'
END
RETURNING *;
"""
)

utils.print_cursor(cursor)

# Print result:
# id name sex sex2
# ---- ------ ----- ------
# 109 John M F
# 110 Dan F M
# 111 Brad M F
# 112 Tomas F M

題目五

Question

假設今天是2019-6-23,找出表格中找出過去一年内銷售數量少於10的產品id和名稱,並排除available_date小於一個月的產品。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
=============== Product Table ===============
id name available_date
---- -------- -------------------
1 Product1 2010-01-01 00:00:00
2 Product2 2012-05-12 00:00:00
3 Product3 2019-06-10 00:00:00
4 Product4 2019-06-01 00:00:00
5 Product5 2008-09-21 00:00:00
==================== Order Table ====================
id product qty dispatch_date
---- --------- ----- -------------------
1 1 4 2018-07-26 00:00:00
2 1 2 2018-11-05 00:00:00
3 3 1 2019-06-11 00:00:00
4 4 8 2019-06-05 00:00:00
5 4 6 2019-06-20 00:00:00
6 5 5 2009-02-02 00:00:00
7 5 9 2010-04-13 00:00:00

Answer

1
2
3
  id  name      available_date         total_of_qty
---- -------- ------------------- --------------
3 Product3 2019-06-10 00:00:00 1

解法一

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
# Product test_case_1
# python manage.py test_uinterview --case 9
# 使用ORM

now = datetime.strptime("2019-6-23", "%Y-%m-%d")

# 先使用Subquery先統計各產品一年內的銷售總數total_of_qty
order_queryset = Order.objects \
.filter(product=OuterRef('pk')) \
.filter(dispatch_date__gte=now - timedelta(days=365)) \
.annotate(total_of_qty=Coalesce(Func('qty', function='Sum'), 0)) \
.values('total_of_qty')

# 再把total_of_qty設定給產品
queryset = Product.objects \
.annotate(qty_total=Subquery(order_queryset)) \
.filter(total_of_qty__lt=10) \
.filter(available_date__gte=now - timedelta(days=30))

utils.print_queryset(queryset, ['total_of_qty'])

# Print result:
# id name available_date total_of_qty
# ---- -------- ------------------- --------------
# 3 Product3 2019-06-10 00:00:00 1


解法二

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
# Product test_case_1_2_2
# python manage.py test_uinterview --case 10
# 使用SQL

with connection.cursor() as cursor:
cursor.execute(
f"""
SELECT id,
name,
available_date,
(SELECT COALESCE(Sum(U1.qty), 0)
FROM uinterview_order U1
WHERE (U1.product_id = U0.id
AND dispatch_date >= '2018-06-23T00:00:00'::timestamp)) AS total_of_qty
FROM uinterview_product U0
WHERE ((SELECT COALESCE(Sum(U2.qty), 0)
FROM uinterview_order U2
WHERE (U2.product_id = U0.id AND U2.dispatch_date >= '2018-06-23T00:00:00'::timestamp)) < 10

AND U0.available_date >= '2019-05-24T00:00:00'::timestamp)
"""
)

utils.print_cursor(cursor)

# Print result:
# id name available_date total_of_qty
# ---- -------- ------------------- --------------
# 3 Product3 2019-06-10 00:00:00 1

題目六

Question

找出表格中所有部門的學生人數,並按學生人數由大到小、部門名稱由小到大排序。

1
2
3
4
5
6
7
8
9
10
11
12
========== Department Table ==========
id name
---- -----------
1 Engineering
2 Science
3 Law
==================== Student Table ====================
id department name gender
---- ------------ ------ --------
1 1 Jack M
2 1 Jan F
3 2 Mark M

Answer

1
2
3
4
5
  id  name           total_of_students
---- ----------- -------------------
1 Engineering 2
2 Science 1
3 Law 0

解法一

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
# Department test_case_1
# python manage.py test_uinterview --case 11
# 使用ORM

student_queryset = Student.objects \
.filter(department=OuterRef('pk')) \
.annotate(total_of_students=Coalesce(Func('id', function='Count'), 0)) \
.values('total_of_students')

department_queryset = Department.objects \
.annotate(total_of_students=Subquery(student_queryset)) \
.order_by('-total_of_students', 'name')

utils.print_queryset(department_queryset, ['total_of_students'])
# for department in department_queryset:
# department_name = str(department.name).ljust(12, " ")
# department_total_of_students = str(department.total_of_students).rjust(2, " ")
# print(f'人數: {department_total_of_students}, 部門: {department_name}')

# Print result:
# =============== Department Table ===============
# id name total_of_students
# ---- ----------- -------------------
# 1 Engineering 2
# 2 Science 1
# 3 Law 0

解法二

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# Department test_case_1_2
# python manage.py test_uinterview --case 12
# 使用SQL

with connection.cursor() as cursor:
cursor.execute(
f"""
SELECT id,
name,
(SELECT COALESCE(Count(*), 0)
FROM uinterview_student U1
WHERE U1.department_id = U0.id) AS total_of_students
FROM uinterview_department U0
ORDER BY total_of_students DESC, name ASC;
"""
)

utils.print_cursor(cursor)

# Print result:
# id name total_of_students
# ---- ----------- -------------------
# 1 Engineering 2
# 2 Science 1
# 3 Law 0

題目七

Question

找出表格中評論最多部電影的用戶,如果評論數相同,依用户名稱由小到大排序。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
========== Movie Table ==========
id name
---- --------
1 Avengers
2 Frozen 2
3 Joker
========== User Table ==========
id name
---- ------
1 Daniel
2 Monica
3 Maria
4 James
========================= Movie Rating Table =========================
id movie user rating create_date
---- ------- ------ -------- -------------
1 1 1 3 2020-01-12
2 1 2 4 2020-02-11
3 1 3 2 2020-02-12
4 1 4 1 2020-01-01
5 2 1 5 2020-02-17
6 2 2 2 2020-02-01
7 2 3 2 2020-03-01
8 3 1 3 2020-02-22
9 3 2 4 2020-02-25

Answer

1
2
3
4
5
6
7
  id  name      total_of_comment  is_top
---- ------ ------------------ --------
1 Daniel 3 True
2 Monica 3 True
3 Maria 2
4 James 1

解法一

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
# Movie test_case_1
# python manage.py test_uinterview --case 13
# 使用ORM

# 先用Subquery統計各別用戶的發表評論數total_of_comment
movie_comment_queryset = MovieRating.objects \
.filter(user=OuterRef('pk')) \
.annotate(total_of_comment=Func('id', function='Count')) \
.values('total_of_comment')

# 將total_of_comment設定給用戶
user_queryset = User.objects \
.annotate(total_of_comment=Subquery(movie_comment_queryset)) \
.order_by('-total_of_comment', 'name')

# 用第一紀錄比對total_of_comment,如果相同則設定is_top=True
user_queryset = user_queryset \
.annotate(is_top= \
Case(When(total_of_comment=user_queryset.first().total_of_comment, then=True), )
)

utils.print_queryset(user_queryset, ['total_of_comment', 'is_top'])

# Print result:
# ==================== User Table ====================
# id name total_of_comment is_top
# ---- ------ ------------------ --------
# 1 Daniel 3 True
# 2 Monica 3 True
# 3 Maria 2
# 4 James 1

解法二

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
# Movie test_case_1_2
# python manage.py test_uinterview --case 14
# 使用SQL

with connection.cursor() as cursor:
cursor.execute(
f"""
-- 透過GROUP BY user_id後在進行COUNT(*),可以得到每個user的評論次數
SELECT *
FROM (SELECT user_id,
(SELECT name FROM uinterview_user WHERE id = user_id) AS user_name,
COUNT(*) AS total_of_comment
FROM uinterview_movierating
GROUP BY user_id
ORDER BY total_of_comment DESC, user_name ASC) U0
-- 只顯示最多評論次數的用戶
WHERE total_of_comment = ( SELECT MAX(total_of_comment)
FROM (SELECT user_id,COUNT(*) AS total_of_comment
FROM uinterview_movierating
GROUP BY user_id) U1
)
"""
)

utils.print_cursor(cursor)

# Print result:
# user_id user_name total_of_comment
# --------- ----------- ------------------
# 1 Daniel 3
# 2 Monica 3

題目八

Question

並找出2020/2月平均評分最高的電影名稱,如果評分相同,依電影名稱由小到大排序。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
========== Movie Table ==========
id name
---- --------
1 Avengers
2 Frozen 2
3 Joker
========== User Table ==========
id name
---- ------
1 Daniel
2 Monica
3 Maria
4 James
========================= Movie Rating Table =========================
id movie user rating create_date
---- ------- ------ -------- -------------
1 1 1 3 2020-01-12
2 1 2 4 2020-02-11
3 1 3 2 2020-02-12
4 1 4 1 2020-01-01
5 2 1 5 2020-02-17
6 2 2 2 2020-02-01
7 2 3 2 2020-03-01
8 3 1 3 2020-02-22
9 3 2 4 2020-02-25

Answer

1
2
3
4
5
  id  name        total_of_rating  is_top
---- -------- ----------------- --------
2 Frozen 2 7 True
3 Joker 7 True
1 Avengers 6

解法一

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
# Movie test_case_1
# python manage.py test_uinterview --case 15
# 使用ORM

movie_rating_queryset = MovieRating.objects
.filter(movie=OuterRef('pk'))
.filter(create_date__year=2020, create_date__month=2)
.annotate(total_of_rating=Func('rating', function='Sum'))
.values('total_of_rating')

movie_queryset = Movie.objects
.annotate(total_of_rating=Subquery(movie_rating_queryset))
.order_by('-total_of_rating', 'name')

movie_queryset = movie_queryset
.annotate(is_most=
Case(When(total_of_rating=movie_queryset.first().total_of_rating, then=True), )
)

utils.print_queryset(movie_queryset, ['total_of_rating', 'is_most'])

# Print result:
# ==================== Movie Table ====================
# id name total_of_rating is_top
# ---- -------- ----------------- --------
# 2 Frozen 2 7 True
# 3 Joker 7 True
# 1 Avengers 6

解法二

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
# Movie test_case_1_2
# python manage.py test_uinterview --case 16
# 使用SQL

with connection.cursor() as cursor:
cursor.execute(
f"""
-- 透過GROUP BY movie_id後在進行SUM(rating),可以得到每個movie的評分總和
SELECT *
FROM (SELECT movie_id,
(SELECT name
FROM uinterview_movie
WHERE id = movie_id) AS movie_name,
SUM(rating) AS total_of_rating
FROM uinterview_movierating
WHERE create_date BETWEEN '2020-02-01' AND '2020-02-29'
GROUP BY movie_id
ORDER BY total_of_rating DESC, movie_name ASC) U0
-- 只顯示最高評分電影
WHERE total_of_rating = ( SELECT MAX(total_of_rating)
FROM (SELECT movie_id, SUM(rating) AS total_of_rating
FROM uinterview_movierating
WHERE create_date BETWEEN '2020-02-01' AND '2020-02-29'
GROUP BY movie_id) U1
)
"""
)

utils.print_cursor(cursor)

# Print result:
# movie_id movie_name total_of_rating
# ---------- ------------ -----------------
# 2 Frozen 2 7
# 3 Joker 7

題目九

Question

找出表格中連續4天或以上登入的用户帳號的id和名稱,並按id由小到大排序。
(使用OVER 與 PARTITION BY語法來完成)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
========== Account Table ==========
id name
---- --------
1 Wick
7 Jonathan
=============== Account Login Table ===============
id account login_date
---- --------- ------------
2 7 2020-05-30
3 1 2020-05-30
4 7 2020-05-31
5 7 2020-06-01
6 7 2020-06-02
7 7 2020-06-02
8 7 2020-06-03
9 1 2020-06-07
10 7 2020-06-10

Answer

1
2
3
4
  account_id  name      login_days    first_login_at
------------ -------- ------------ ----------------
7 Jonathan 5天 2020-05-30

解法一

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
# Account test_case_1
# python manage.py test_uinterview --case 17
# 使用SQL

consecutive_login_days = 4
with connection.cursor() as cursor:
cursor.execute(
f"""
SELECT account_id,
(SELECT name FROM uinterview_account WHERE id = account_id) AS name,
CONCAT(count(sub_date),'天') AS login_days,
first_login_at
FROM (
SELECT account_id,
group_seq,
TO_CHAR(login_date,'YYYY-mm-dd') AS login_date,
TO_CHAR(sub_date,'YYYY-mm-dd') AS sub_date,
TO_CHAR(
CASE
WHEN group_seq = 1 THEN login_date
WHEN sub_date = LAG(sub_date,1) OVER(PARTITION BY account_id ORDER BY login_date)
THEN LAG(login_date,CAST(group_seq AS INT)-1) OVER()
ELSE login_date
END
,'YYYY-mm-dd') AS first_login_at
FROM (
-- 如有連續相同sub_date代表是連續登入日期
-- 例如:
-- login_date(2022-01-02) - group_seq(1) = 2022-01-01
-- login_date(2022-01-03) - group_seq(2) = 2022-01-01
-- login_date(2022-01-04) - group_seq(3) = 2022-01-01
-- login_date(2022-01-10) - group_seq(4) = 2022-01-06
SELECT *,
login_date - (group_seq || 'DAY')::INTERVAL AS sub_date
FROM(
-- 以group_seq來分組,後續login_date - group_seq天來判斷是否為連續登入
SELECT *,
ROW_NUMBER() OVER(PARTITION BY account_id ORDER BY login_date) AS group_seq
FROM (
-- 過濾掉同一天多次登入日期,只保留一筆資料
SELECT DISTINCT account_id ,login_date
FROM uinterview_accountlogin
ORDER BY login_date
) U0
) U1
) U2
) U3
GROUP BY account_id,sub_date,first_login_at
HAVING count(*) >= {consecutive_login_days}
ORDER BY account_id,first_login_at;
"""
)

utils.print_cursor(cursor)

# Print result:
# account_id name login_days first_login_at
# ------------ -------- ------------ ----------------
# 7 Jonathan 5天 2020-05-30


Keyword

1
2
演算法, Algorithm, al-go-rithm