August 4, 2018

Database Query Optimization: Notes from the Field

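Two or three months ago I ran into a query that was seriously dragging down the company database, and as luck would have it, the API behind it is where most of our traffic lands. Being a corporate drone, I had no choice but to fix it. I spent a whole week staring at the db console and the MySQL manual, and in the end came up with an optimization that needed only minimal changes.

First, the architecture behind the problem. From a normalization standpoint I think the design is fine.

Recipes and categories have a classic many-to-many relationship, so there is a join table between them. Categories themselves are self-referential, with parent and child levels.

The application requirement is this: given a category, find every recipe under that category (sticky or not) and under its sub-categories (non-sticky only), with no duplicates, sorted by "sticky or not" and then by publication time.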
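To pin down what the requirement means before looking at SQL, here is a minimal in-memory sketch in plain Ruby. The `Row` struct, the sample data, and the `recipes_for_category` helper are all hypothetical; only the field names come from the post. It also assumes that a recipe matched through several join rows keeps its best-ranked position, which the post does not spell out.

```ruby
# In-memory stand-in for recipes joined with category_recipes.
# Row and the sample data are hypothetical; only the field names are from the post.
Row = Struct.new(:recipe_id, :category_id, :is_sticky, :published_at)

# Returns recipe ids for one category page, in display order.
def recipes_for_category(rows, category_id, descendant_ids)
  all_ids = [category_id] + descendant_ids

  matching = rows.select do |r|
    if r.is_sticky
      r.category_id == category_id     # sticky rows count only in the category itself
    else
      all_ids.include?(r.category_id)  # non-sticky rows count in the whole subtree
    end
  end

  # Sticky first, then newest first (published_at is an integer timestamp here).
  sorted = matching.sort_by { |r| [r.is_sticky ? 0 : 1, -r.published_at] }

  # Assumption: a recipe matched by several rows keeps its best-ranked position.
  sorted.map(&:recipe_id).uniq
end

ROWS = [
  Row.new(1, 10, false, 100),
  Row.new(2, 10, true,  50),
  Row.new(3, 11, false, 200),  # category 11 is a child of category 10
  Row.new(4, 11, true,  300),  # sticky in a child category: excluded
  Row.new(1, 11, false, 100),  # recipe 1 again, through the child category
].freeze

p recipes_for_category(ROWS, 10, [11])  # prints [2, 3, 1]
```

Recipe 2 comes first because it is sticky in category 10 itself, recipe 4 is dropped because it is only sticky in a child category, and recipe 1 appears once even though two join rows match it.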

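The original design first collects all the category IDs (the category and its descendants), and only then does a conventional join to get from category IDs to recipes. The number of categories will never be very large, so I considered this part good enough and left it alone.

The real bottleneck comes afterwards. At the time, the query looked like this: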

SELECT DISTINCT `recipes`.*
FROM `recipes`
INNER JOIN `category_recipes`
ON `category_recipes`.`recipe_id` = `recipes`.`id`
WHERE ((category_recipes.category_id IN ({category_and_descendants_ids})) AND category_recipes.is_sticky <> 1) OR (category_recipes.category_id = {category_id} AND category_recipes.is_sticky = 1)
ORDER BY category_recipes.is_sticky DESC, recipes.published_at DESC
LIMIT 10
OFFSET 0

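My first thought was that the ORDER BY is made up of columns from two tables, which according to the manual means it cannot use an index. So I tried denormalizing category_recipes: I added a published_at column to it, with the value synced over from recipe through a model callback.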

SELECT DISTINCT `recipes`.*
FROM `recipes`
INNER JOIN `category_recipes`
ON `category_recipes`.`recipe_id` = `recipes`.`id`
WHERE ((category_recipes.category_id IN ({category_and_descendants_ids})) AND category_recipes.is_sticky <> 1) OR (category_recipes.category_id = {category_id} AND category_recipes.is_sticky = 1)
ORDER BY category_recipes.is_sticky DESC, category_recipes.published_at DESC
LIMIT 10
OFFSET 0

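Unfortunately, this still did not run fast.

Then it occurred to me that DISTINCT compares every selected column, and an index cannot possibly contain all of them, so I started thinking in the direction of GROUP BY.

After all, as long as the id is unique, the results cannot contain duplicates.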
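The difference between the two ways of removing duplicates can be illustrated with an in-memory analogy. This is only a sketch with made-up rows, not a model of how MySQL executes either clause: comparing whole rows and comparing only the id give the same set of recipes, but the second needs just one column.

```ruby
# Hypothetical join output: recipe 7 is reachable through two categories.
JOINED = [
  { id: 7, title: "Curry", category_id: 10 },
  { id: 7, title: "Curry", category_id: 11 },
  { id: 8, title: "Ramen", category_id: 10 },
].freeze

# SELECT recipes.* keeps only the recipe columns.
RECIPE_ROWS = JOINED.map { |row| row.slice(:id, :title) }

BY_ALL_COLUMNS = RECIPE_ROWS.uniq                    # like DISTINCT: compares whole rows
BY_ID          = RECIPE_ROWS.uniq { |row| row[:id] } # like GROUP BY id: compares the key only

p BY_ALL_COLUMNS == BY_ID             # prints true
p BY_ID.map { |row| row[:id] }        # prints [7, 8]
```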

So the query became this:

SELECT `recipes`.*
FROM `recipes`
INNER JOIN `category_recipes`
ON `category_recipes`.`recipe_id` = `recipes`.`id`
WHERE ((category_recipes.category_id IN ({category_and_descendants_ids})) AND category_recipes.is_sticky <> 1) OR (category_recipes.category_id = {category_id} AND category_recipes.is_sticky = 1)
GROUP BY category_recipes.recipe_id DESC
ORDER BY category_recipes.is_sticky DESC, category_recipes.published_at DESC
LIMIT 10
OFFSET 0

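It felt like the right direction, but performance was still short of where I wanted it.

After another long session with the manual, I finally worked out that:
- an index is used from left to right
- GROUP BY and ORDER BY share the same index
- if any column in GROUP BY or ORDER BY sorts in a different direction, the index cannot be used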
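The three rules above can be turned into a small checker to see why grouping on recipe_id alone does not line up with the ORDER BY. This is a deliberately simplified toy model of the rules as stated in the post, not a reimplementation of MySQL's optimizer; the helper names `sort_can_use_index?` and `shared_index?` are hypothetical.

```ruby
# Toy model of the three rules; it is not MySQL's optimizer.
# index:  column names in index order
# clause: [[column, :asc or :desc], ...] as written in GROUP BY or ORDER BY
def sort_can_use_index?(index, clause)
  columns    = clause.map(&:first)
  directions = clause.map(&:last).uniq

  leftmost_prefix = index.first(columns.size) == columns  # rule 1: left to right
  one_direction   = directions.size <= 1                  # rule 3: no mixed directions

  leftmost_prefix && one_direction
end

# Rule 2: one index has to serve both clauses, so ORDER BY must be a
# prefix of GROUP BY, column and direction alike.
def shared_index?(index, group_by, order_by)
  group_by.first(order_by.size) == order_by &&
    sort_can_use_index?(index, group_by)
end

INDEX = %w[is_sticky published_at recipe_id category_id].freeze

ORDER_BY      = [["is_sticky", :desc], ["published_at", :desc]].freeze
GROUP_BY_ID   = [["recipe_id", :desc]].freeze
GROUP_BY_FULL = (ORDER_BY + [["recipe_id", :desc]]).freeze

p shared_index?(INDEX, GROUP_BY_ID, ORDER_BY)    # prints false
p shared_index?(INDEX, GROUP_BY_FULL, ORDER_BY)  # prints true
```

Under this model, grouping on recipe_id alone fails rule 2, while putting is_sticky and published_at in front of recipe_id lets one index cover both clauses.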

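So if I group by is_sticky and published_at ahead of recipe_id, shouldn't the index become usable?

Luckily the experiment worked, and both the result set and the ordering were correct.

The query finally became this: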

SELECT `recipes`.*
FROM `recipes`
INNER JOIN `category_recipes`
ON `category_recipes`.`recipe_id` = `recipes`.`id`
WHERE ((category_recipes.category_id IN ({category_and_descendants_ids})) AND category_recipes.is_sticky <> 1) OR (category_recipes.category_id = {category_id} AND category_recipes.is_sticky = 1)
GROUP BY category_recipes.is_sticky DESC, category_recipes.published_at DESC, category_recipes.recipe_id DESC
ORDER BY category_recipes.is_sticky DESC, category_recipes.published_at DESC
LIMIT 10
OFFSET 0

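As for indexes, these are the ones I arrived at by experiment. The first is the most efficient; the optimizer only picks it up to around OFFSET 330, but that is plenty.

With a larger OFFSET, the database chooses a different index and falls back to a temporary table plus filesort.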

CREATE INDEX {index_name} ON category_recipes(category_id, recipe_id, is_sticky, published_at);
CREATE INDEX {index_name} ON category_recipes(is_sticky, published_at, recipe_id, category_id);

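That finally took care of the query-time performance problem... or did you think that was the end of it?

After the change, I found that when the pagination gem (kaminari) calculates the total count, ActiveRecord builds invalid SQL. A bit of digging showed that it does not support a descending count. I was going to try fixing that, until I realized that group cannot count what I want in the first place.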

# An example
User.group('email DESC').limit(50).count

# The SQL it generates is below, and its syntax is invalid
# SELECT COUNT(*) AS count_all, email DESC AS email_desc
# FROM `users`
# WHERE `users`.`deleted_at` IS NULL
# GROUP BY email DESC
# LIMIT 50

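Why? Because what I want to count is the overall total, not a separate number for each group. So when computing the total, I still have to go back to distinct.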
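The gap between the two kinds of count is easy to see on in-memory data. The rows below are made up for illustration. A grouped count yields one number per group (in ActiveRecord, `group(...).count` likewise returns a Hash), while a pager needs a single total.

```ruby
# Hypothetical join rows: recipe 1 appears under two categories.
JOIN_ROWS = [
  { recipe_id: 1, category_id: 10 },
  { recipe_id: 1, category_id: 11 },
  { recipe_id: 2, category_id: 10 },
].freeze

# What a grouped COUNT gives back: one number per group.
GROUPED_COUNTS = JOIN_ROWS.group_by { |row| row[:recipe_id] }
                          .transform_values(&:size)

# What the pager needs: a single total of distinct recipes.
TOTAL = JOIN_ROWS.map { |row| row[:recipe_id] }.uniq.size

p GROUPED_COUNTS.class  # prints Hash
p GROUPED_COUNTS[1]     # prints 2
p TOTAL                 # prints 2
```

Here the 2 stored under recipe 1 is the number of join rows for that recipe, and only coincidentally equals the total of 2 distinct recipes.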

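There is a fairly simple idea here: ActiveRecord's count is a terminal method (the end of the line), so the switch can be made right there.

But that raises another problem. ActiveRecord calls are chained, so how do I know whether group was chained earlier? And not every group needs to be swapped back to distinct, so how do I tell them apart?

What I came up with is to pass the message through a SQL comment: insert a specific comment into the string handed to group, and let count check for it.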

# category_and_descendants_ids and category_id are assumed to be set by the caller
Recipe.published.joins(:category_recipes).
  where('((category_recipes.category_id IN (?)) AND category_recipes.is_sticky <> 1) OR (category_recipes.category_id = ? AND category_recipes.is_sticky = 1)', category_and_descendants_ids, category_id).
  group('category_recipes.is_sticky DESC, category_recipes.published_at DESC, category_recipes.recipe_id DESC /*use_distinct_instead_of_group_for_count*/').
  reorder('category_recipes.is_sticky DESC, category_recipes.published_at DESC').
  page(1).per(10)

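ActiveSupport.on_load(:active_record) do
  module ActiveRecord
    class Relation
      # Keep a handle on the stock implementation before overriding it.
      alias_method :original_count, :count

      # If the relation's SQL carries the marker comment, drop the GROUP BY
      # and count distinct rows; otherwise behave exactly as before.
      def count(column_name = nil)
        if to_sql =~ /use_distinct_instead_of_group_for_count/i
          unscope(:group).distinct.original_count(column_name)
        else
          original_count(column_name)
        end
      end
    end
  end
end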

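Only at this point was the query's performance problem really solved. Database CPU usage and response times both improved dramatically, and users finally no longer have to wait until the connection drops QQ

Update, 2019/01/10

Later I found that certain categories could not use the most efficient index when fetching data, which put unnecessary load on the system.

The fix is to use FORCE INDEX to specify which indexes the query may use, so the optimizer does not pick the wrong one.

The final SQL and indexes look like this: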

SELECT `recipes`.*
FROM `recipes`
INNER JOIN `category_recipes` FORCE INDEX (count_index_name,data_index_name)
ON `category_recipes`.`recipe_id` = `recipes`.`id`
WHERE ((category_recipes.category_id IN ({category_and_descendants_ids})) AND category_recipes.is_sticky <> 1) OR (category_recipes.category_id = {category_id} AND category_recipes.is_sticky = 1)
GROUP BY category_recipes.is_sticky DESC, category_recipes.published_at DESC, category_recipes.recipe_id DESC
ORDER BY category_recipes.is_sticky DESC, category_recipes.published_at DESC
LIMIT 10
OFFSET 0

CREATE INDEX count_index_name ON category_recipes(category_id, recipe_id, is_sticky, published_at);
CREATE INDEX data_index_name ON category_recipes(is_sticky, published_at, recipe_id, category_id);
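On the ActiveRecord side, an index hint like this cannot come from `joins(:category_recipes)`, so the join has to be written by hand. The following is a minimal sketch of one way to build that fragment; `force_index_join` is a hypothetical helper, not something from the post or from Rails, and it only assembles a string.

```ruby
# Hypothetical helper: builds the hand-written JOIN fragment with an index hint.
# Index names end up inside raw SQL, so they are validated first.
def force_index_join(index_names)
  index_names.each do |name|
    raise ArgumentError, "bad index name: #{name}" unless name.match?(/\A\w+\z/)
  end

  "INNER JOIN `category_recipes` FORCE INDEX (#{index_names.join(',')}) " \
    "ON `category_recipes`.`recipe_id` = `recipes`.`id`"
end

JOIN_SQL = force_index_join(%w[count_index_name data_index_name])
puts JOIN_SQL
```

The resulting string could then be passed to `joins` in place of the `:category_recipes` symbol, since `joins` also accepts a raw SQL fragment.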

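Update, 2020/04/10

When {category_and_descendants_ids} contains only a single number, MySQL decides that is_sticky is not an effective condition and adopts a different strategy.

As a result, the index gets used in the same order the SQL is written: first the category_id from WHERE, then the GROUP BY columns in sequence.

So the FORCE INDEX has to be switched to the following index to improve performance under this condition: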

CREATE INDEX data_index_name ON category_recipes(category_id, is_sticky, published_at, recipe_id);

ATI的軟體&網管技術誌
