重構小麥注音選字演算法：從 DAG Shortest Path 到 Viterbi

Introduction

最近偶然讀到 NLP 聖經 Speech and Language Processing，書中提到對於 HMM (Hidden Markov Model)，最標準常見的解碼演算法其實是 Viterbi 演算法。

這讓我回頭重新研究小麥注音 (McBopomofo) 的核心選字引擎：Gramambular，並用簡化後的 Viterbi 進行重構。

Gramambular Architecture

Gramambular 是小麥注音負責從注音符號轉換為對應中文字的核心模組。它的原理是基於 Unigram 語言模型，透過最大概似估計 (Maximum Likelihood Estimation, MLE) 找出機率最高的候選字詞組合。同時為了避免浮點數運算產生的 underflow 問題，是以對數機率 (log probability) 進行計算。

在深入演算法之前，我們先來看看幾個定義在 Source/Engine/gramambular2/reading_grid.h 的核心資料結構：

1. Unigram

這是最基本的單位，代表一個詞及其對應的權重。Unigram 是最單純的語言模型，基於一個核心假設：字詞之間是相互獨立的，也就是說，某個字詞出現的機率與前文沒有關係。

class LanguageModel::Unigram {
    std::string value_;     // 字詞，例如「在」
    std::string rawValue_;  // 原始值
    double score_;          // 對數機率，例如 -2.23695407
};

2. Node

Node 代表特定讀音下的候選字詞集合。例如輸入「ㄗㄞˋ」，可能對應到「在」、「再」等字。

class Node {
    std::string reading_;                           // 讀音，如「ㄗㄞˋ」
    size_t spanningLength_;                         // 跨越長度 (佔幾個注音)
    std::vector<LanguageModel::Unigram> unigrams_;  // 候選字詞列表
    OverrideType overrideType_;                     // 使用者選字覆寫狀態
};

3. Span

而 Span 則是指在 Grid 中「同一個起始位置」的所有節點集合。

class Span {
    std::array<NodePtr, kMaximumSpanLength> nodes_;  // 最多 8 個不同長度的節點
    size_t maxLength_;                               // 目前最大長度
};

4. ReadingGrid

整個 ReadingGrid 就是一個 Span 的陣列，其長度等於使用者輸入的注音符號數量。

class ReadingGrid {
    std::vector<std::string> readings_;    // 使用者輸入的注音序列
    std::vector<Span> spans_;              // 每個位置的 Span
    size_t cursor_;                        // 游標位置
    ScoreRankedLanguageModel lm_;          // 語言模型介面
};

Original Approach: DAG Shortest Path

ReadingGrid 本質上是一個有向無環圖 (DAG)。原先的解法對應到 Thomas H. Cormen 所著，資工系學生人手一本的演算法教科書 Introduction to Algorithms 上的 DAG Shortest Path 演算法 (其實我們找的是 longest path，因為對數機率越大越好)。

為了進行這個演算法，原始實作中定義了一個 Vertex 資料結構來表示 DAG 中的每個節點：

// Defines a vertex of a DAG. This is a mutable data structure used for both
// DAG construction and single-source shortest-path computation.
struct Vertex {
    ReadingGrid::NodePtr node;
    std::vector<Vertex*> edges;

    // Used during topological-sort.
    bool topologicallySorted = false;

    // Used during shortest-path computation. We are actually computing the
    // path with the *largest* weight, hence distance's initial value being
    // negative infinity.
    double distance = -std::numeric_limits<double>::infinity();
    Vertex* prev = nullptr;
};

整個演算法會經過以下幾個步驟：

1. Build Graph

假設輸入注音為「ㄕㄜˋ ㄐㄧˋ ㄔㄥˊ ㄕˋ ㄇㄚˇ」，目標是找出「設計程式碼」：

Spans	spanLen	Nodes	Unigrams	讀音	字詞	對數機率	備註
0	1	0	0	`ㄕㄜˋ`	社	-3.10
0	1	0	1	`ㄕㄜˋ`	設	-3.19	「社」更高分，所以「設」從不會被考慮
0	2	1	0	`ㄕㄜˋ ㄐㄧˋ`	設計	-3.58
1	1	0	0	`ㄐㄧˋ`	計	-3.13
2	1	0	0	`ㄔㄥˊ`	成	-2.70
2	2	1	0	`ㄔㄥˊ ㄕˋ`	城市	-3.99
2	2	1	1	`ㄔㄥˊ ㄕˋ`	程式	-4.08	「城市」更高分，所以「程式」從不會被考慮
2	3	2	0	`ㄔㄥˊ ㄕˋ ㄇㄚˇ`	程式碼	-5.10
3	1	0	0	`ㄕˋ`	是	-2.04
4	1	0	0	`ㄇㄚˇ`	馬	-3.47
4	1	0	1	`ㄇㄚˇ`	碼	-4.06	「馬」更高分，所以「碼」從不會被考慮

將每個 Node 包裝成一個獨立的 Vertex 物件，並將這些 Vertex 按照索引位置放入名為 vspans 的二維陣列中。

using VertexSpan = std::vector<Vertex>;
std::vector<VertexSpan> vspans(spans_.size(), VertexSpan());

for (size_t i = 0, len = spans_.size(); i < len; ++i) {
  const ReadingGrid::Span& span = spans_[i];
  for (size_t j = 1, maxSpanLen = span.maxLength(); j <= maxSpanLen; ++j) {
    NodePtr p = span.nodeOf(j);
    if (p != nullptr) {
      vspans[i].emplace_back(std::move(p));
    }
  }
}

`vspan`	0	1	2
0	社	設計	-
1	計	-	-
2	成	城市	程式碼
3	是	-	-
4	馬	-	-

接著，為了跑單一來源最短路徑 (Single-source Shortest Path)，加入了 _ROOT_ 與 _TERMINAL_ 這兩個虛擬節點。然後根據各字詞的跨越長度 (spanningLength)，找出在陣列中下一個能接上的位置，並把目標 Vertex 的指標加進 edges 讓它連起來。

先接 TERMINAL 與中間各段：

Vertex terminal(std::make_shared<ReadingGrid::Node>(
    "_TERMINAL_", 0, std::vector<LanguageModel::Unigram>()));

for (size_t i = 0, vspansLen = vspans.size(); i < vspansLen; ++i) {
  for (Vertex& v : vspans[i]) {
    size_t nextVertexPos = i + v.node->spanningLength();
    if (nextVertexPos == vspansLen) {
      v.edges.push_back(&terminal);
      continue;
    }

    for (Vertex& nv : vspans[nextVertexPos]) {
      v.edges.push_back(&nv);
    }
  }
}

graph LR
    %% Special Nodes
    TERMINAL((_TERM_))

    %% Regular Nodes
    V1_1((&ensp;社&ensp;))
    V1_2((設計))
    V2_1((&ensp;計&ensp;))
    V3_1((&ensp;成&ensp;))
    V3_2((城市))
    V3_3((程式碼))
    V4_1((&ensp;是&ensp;))
    V5_1((&ensp;馬&ensp;))

    %% Edges
    V1_1 --> V2_1
    V1_2 --> V3_1
    V1_2 --> V3_2
    V1_2 --> V3_3
    V2_1 --> V3_1
    V2_1 --> V3_2
    V2_1 --> V3_3
    V3_1 --> V4_1
    V3_2 --> V5_1
    V3_3 --> TERMINAL
    V4_1 --> V5_1
    V5_1 --> TERMINAL

    %% Styling
    classDef special fill:#eeeeee,stroke:#999999,stroke-width:2px;
    classDef normal fill:#e1f5fe,stroke:#01579b,stroke-width:2px;

    class ROOT,TERMINAL special;
    class V1_1,V1_2,V2_1,V3_1,V3_2,V3_3,V4_1,V5_1 normal;

Vertex root(std::make_shared<ReadingGrid::Node>(
    "_ROOT_", 0, std::vector<LanguageModel::Unigram>()));

root.distance = 0;
for (Vertex& v : vspans[0]) {
  root.edges.push_back(&v);
}

最後將 ROOT 也連接上來，我們就可以得到：

graph LR
    %% Special Nodes
    ROOT((_ROOT_))
    TERMINAL((_TERM_))

    %% Regular Nodes
    V1_1((&ensp;社&ensp;))
    V1_2((設計))
    V2_1((&ensp;計&ensp;))
    V3_1((&ensp;成&ensp;))
    V3_2((城市))
    V3_3((程式碼))
    V4_1((&ensp;是&ensp;))
    V5_1((&ensp;馬&ensp;))

    %% Edges
    ROOT --> V1_1
    ROOT --> V1_2

    V1_1 --> V2_1
    V1_2 --> V3_1
    V1_2 --> V3_2
    V1_2 --> V3_3
    V2_1 --> V3_1
    V2_1 --> V3_2
    V2_1 --> V3_3
    V3_1 --> V4_1
    V3_2 --> V5_1
    V3_3 --> TERMINAL
    V4_1 --> V5_1
    V5_1 --> TERMINAL

    %% Styling
    classDef special fill:#eeeeee,stroke:#999999,stroke-width:2px;
    classDef normal fill:#e1f5fe,stroke:#01579b,stroke-width:2px;

    class ROOT,TERMINAL special;
    class V1_1,V1_2,V2_1,V3_1,V3_2,V3_3,V4_1,V5_1 normal;

2. Topological Sort (拓撲排序)

雖然 ReadingGrid 本身從左到右有著絕對的方向性，但原先的實作仍會從 _ROOT_ 出發，透過深度優先搜尋 (DFS) 對整張圖進行標準的拓撲排序。為了加速，這裡使用 stack 與 State 來實作非遞迴版本的 DFS。

std::vector<Vertex*> TopologicalSort(Vertex* root) {
  std::vector<Vertex*> result;
  struct State {
    explicit State(Vertex* vv) : v(vv), edgeIter(v->edges.begin()) {}
    Vertex* v;
    std::vector<Vertex*>::iterator edgeIter;
  };
  std::stack<State> stack;
  stack.emplace(root);

  while (!stack.empty()) {
    State& state = stack.top();
    Vertex* v = state.v;

    // 依序存取所有的邊
    if (state.edgeIter != v->edges.end()) {
      Vertex* nv = *state.edgeIter;
      ++state.edgeIter;
      if (!nv->topologicallySorted) {
        stack.emplace(nv);
        continue;
      }
    }

    // 當此節點所有相鄰邊都存取完後，將其標記並推入結果陣列
    v->topologicallySorted = true;
    result.push_back(v);
    stack.pop();
  }

  return result; // 注意這裡回傳的是 Post-Order，需要反轉才是正常的拓撲排序
}

回傳出來的結果中，排在最後面的反而會是開頭的節點 _ROOT_，所以我們會得到一個 Post-Order 序列：

graph RL
    %% Nodes in Post-Order
    TERMINAL((_TERM_))
    V5_1((&ensp;馬&ensp;))
    V4_1((&ensp;是&ensp;))
    V3_1((&ensp;成&ensp;))
    V3_2((城市))
    V3_3((程式碼))
    V2_1((&ensp;計&ensp;))
    V1_1((&ensp;社&ensp;))
    V1_2((設計))
    ROOT((_ROOT_))

    %% Reverse Post-Order sequence flow
    ROOT --> V1_2 --> V1_1 --> V2_1 --> V3_3 --> V3_2 --> V3_1 --> V4_1 --> V5_1 --> TERMINAL

    %% Styling
    classDef special fill:#eeeeee,stroke:#999999,stroke-width:2px;
    classDef normal fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
    class TERMINAL,ROOT special;
    class V1_1,V1_2,V2_1,V3_1,V3_2,V3_3,V4_1,V5_1 normal;

3. Relax (鬆弛)

在此階段，我們會將剛才拿到的拓撲陣列以反向順序 (也就是正常的拓撲從頭到尾順序)，依序對每個節點的相鄰邊進行 Relax 的動作。只要到達相鄰節點 v 的目前距離小於「經過節點 u 的距離加上節點 v 本身的權重」，就更新節點 v 的最佳距離以及 prev。

因為我們追求的是最大的對數機率，所以 Vertex 初始化時的 distance 都是負無限大。

void Relax(Vertex* u, Vertex* v) {
  // 將目標節點的對數機率分數當作這條邊權重
  double w = v->node->score();

  // 若從節點 u 走過去可以累積更高的分數，則更新對 v 的最佳路徑
  if (v->distance < u->distance + w) {
    v->distance = u->distance + w;
    v->prev = u;
  }
}

// ...

std::vector<Vertex*> ordered = TopologicalSort(&root);
// 反向從 rbegin 到 rend，也就是用正常的拓撲排序順序來進行 Relax
for (auto it = ordered.rbegin(), rend = ordered.rend(); it != rend; ++it) {
  Vertex* u = *it;
  for (Vertex* v : u->edges) {
    Relax(u, v);
  }
}

經過 Relax 之後，路徑上 distance 與 prev 就能夠正確對應：

graph RL
    TERMINAL(("_TERM_<br>-8.68"))
    ROOT((_ROOT_<br>0))

    V1_1((社<br>-3.10))
    V1_2((設計<br>-3.58))
    V2_1((計<br>-6.23))
    V3_1((成<br>-6.28))
    V3_2((城市<br>-7.57))
    V3_3(("程式碼<br>-8.68"))
    V4_1((是<br>-8.32))
    V5_1((馬<br>-11.04))

    TERMINAL -->|prev| V3_3
    TERMINAL -.-> V5_1
    V5_1 -->|prev| V3_2
    V5_1 -.-> V4_1
    V4_1 -->|prev| V3_1
    V3_3 -->|prev| V1_2
    V3_2 -->|prev| V1_2
    V3_1 -->|prev| V1_2
    V3_1 -.-> V2_1
    V2_1 -->|prev| V1_1
    V1_2 -->|prev| ROOT
    V1_1 -->|prev| ROOT

    classDef special fill:#eeeeee,stroke:#999999,stroke-width:2px;
    classDef best fill:#ffe0b2,stroke:#e65100,stroke-width:3px;
    classDef normal fill:#e1f5fe,stroke:#01579b,stroke-width:2px;

    class ROOT best;
    class V1_1,V2_1,V3_1,V3_2,V4_1,V5_1 normal;
    class V1_2,V3_3,TERMINAL best;

    linkStyle 0,5,10 stroke:#d84315,stroke-width:4px,fill:none;

當走到終點 _TERMINAL_ 時，沿著 prev 回推可得到分數總和最高的路徑：程式碼 → 設計 (總分 $-8.68$ ，優於經過 馬 的路徑 $-11.04$ )。

4. Path Reconstruction (路徑回溯)

最後我們進行簡單的路徑回推，從 _TERMINAL_ 開始，沿著 prev 記錄的反向指標一路回頭存取，將走過的節點 (node) 存到 walked 陣列中。最後只要不收錄 _ROOT_ 與 _TERMINAL_ 並翻轉整個陣列的順序，即為預測出來的最佳注音選字結果。

std::vector<NodePtr> walked;
Vertex* it = &terminal;

// 沿著 prev 一步步走回去
while (it->prev != nullptr) {
  walked.push_back(it->prev->node);
  it = it->prev;
}

// 去除前後虛擬節點，並且翻轉輸出 (從 rbegin + 1 開始取即可略過 _ROOT_)
result.nodes = std::vector<NodePtr>(walked.rbegin() + 1, walked.rend());

graph RL
    TERMINAL((_TERM_))
    V3_3((程式碼))
    V1_2((&ensp;設計&ensp;))
    ROOT((_ROOT_))

    TERMINAL -->|prev| V3_3
    V3_3 -->|prev| V1_2
    V1_2 -->|prev| ROOT

    classDef node fill:#ffe0b2,stroke:#e65100,stroke-width:3px;
    class TERMINAL,V3_3,V1_2,ROOT node;

    %% Highlight
    linkStyle 0,1,2 stroke:#d84315,stroke-width:4px,fill:none;

這個演算法理論上的時間複雜度是 $O(|V|+|E|)$ 。然而，在效能分析的過程中發現，建圖和拓樸排序的過程帶來了明顯的瓶頸，我們花了不少時間與記憶體來從 spans_ 裡去產生 Vertex 來建圖，而這些額外的開銷會花不少時間。

Viterbi

符號意義

符號	意義
$T$	觀測序列的總長度 (時間步數)
$N$	隱藏狀態的總數量
$o_t$	在時間 $t$ 觀察到的觀測值
$\pi_s$	初始機率 (initial probability)，即系統一開始處於狀態 $s$ 的機率
$a_{s',s}$	轉移機率 (transition probability)，從前一個狀態 $s'$ 轉移到目前狀態 $s$ 的機率
$b_s(o_t)$	發射機率 (emission probability)，在狀態 $s$ 下產生觀測值 $o_t$ 的機率
$\text{viterbi}[s, t]$	在時間 $t$ 處於狀態 $s$ 且觀測到前 $t$ 個觀測值的最大機率值
$\text{backpointer}[s, t]$	記錄在時間 $t$ 到達狀態 $s$ 時，最可能的前一個狀態是誰 (用於最後回溯路徑)

1. Initialization (初始化)

對於每個狀態 $s = 1$ 到 $N$ ：

\begin{align*} &\text{viterbi}[s, 1] = \pi_s \times b_s(o_1) \\ &\text{backpointer}[s, 1] = 0 \end{align*}

在時間 $t=1$ (第一步)，我們計算每個狀態作為起始狀態的機率。

這等於「初始機率 $\pi_s$ 」乘以「在該狀態下看到第一個觀測值 $o_1$ 的機率 $b_s(o_1)$ 」。
$\text{backpointer}$ 設為 0，因為前面沒有狀態了。

2. Recursion (遞迴)

這裡的遞迴是指數學意義上的「遞迴關係」，不是程設中的函式自己呼叫自己

對於每個時間 $t = 2$ 到 $T$ ，以及每個狀態 $s = 1$ 到 $N$ ：

\begin{align*} \text{viterbi}[s, t] &= \max_{s'=1}^{N} \left( \text{viterbi}[s', t-1] \times a_{s',s} \times b_s(o_t) \right) \\ \text{backpointer}[s, t] &= \text{argmax}_{s'=1}^{N} \left(\text{viterbi}[s', t-1] \times a_{s',s} \times b_s(o_t) \right) \end{align*}

這步是演算法的核心，利用動態規劃的觀念：

為了算出在時間 $t$ 處於狀態 $s$ 的最大機率，我們檢查所有可能的前一狀態 $s'$
我們選取一個 $s'$ ，使得「前一狀態的機率 $\text{viterbi}[s', t-1]$ 」 $\times$ 「從 $s'$ 到 $s$ 的轉移機率 $a_{s',s}$ 」 $\times$ 「在狀態 $s$ 產生的觀測值機率 $b_s(o_t)$ 」達到最大
$\max$ ：計算並儲存這個最高的機率值到 $\text{viterbi}$ 裡
$\text{argmax}$ ：將「會有最高機率的前一個狀態 $s'$ 」記錄在 $\text{backpointer}$ 中，就像是在走迷宮時，在每個路口標記「我是從哪個路口過來的」

3. Termination (終止)

\begin{align*} \text{bestpathprob} &= \max_{s=1}^{N} \text{viterbi}[s, T] \\ \text{bestpathpointer} &= \text{argmax}_{s=1}^{N} \text{viterbi}[s, T] \end{align*}

當時間到達最後一步 $T$ 時：

我們在 $\text{viterbi}[s, T]$ 中找到機率最大的那個狀態，這個機率就是整條路徑的 $\text{bestpathprob}$
同時找到對應的最後一個狀態 $\text{bestpathpointer}$

4. Path Backtracking (路徑回溯)

從最後一個狀態 $\text{bestpathpointer}$ 開始，根據 $\text{backpointer}$ 矩陣中記錄的指標，一步步往回找
例如：時間 $T$ 的狀態是指向 $s_{T-1}$ ，時間 $T-1$ 的狀態又指向 $s_{T-2}$ … 直到回到時間 $1$ 。
最後將這條路徑反轉，我們就得到了最佳路徑～

看起來是不是很複雜？沒關係，我也這麼覺得XD

既然選字的邏輯只是要求解有向無環圖中的最短 (長) 路徑，於是我參考了 Vlad Niculae 教授的課程投影片，他寫的非常簡單清楚：

viterbi-dag

以下的實作就是照著這張投影片刻出來的。

The Refactoring: Viterbi Algorithm

觀察 ReadingGrid 的結構，可以發現它其實是一個線性的 lattice。每個節點從位置 i 開始，跨越 spanLen 長度，結束在 i + spanLen。因為 spanLen >= 1，邊的方向永遠是向後的 (index 變大)。

這代表：陣列的索引順序自然就是拓樸順序 (topological order)。我們其實不需要額外建立 Graph 或進行拓樸排序，可直接參考 Viterbi 演算法的概念進行重構。

1. Data Structure

Viterbi 本質上是一個動態規劃 (DP) 的問題，我們只需要一個簡單的 State 結構來紀錄 DP 狀態：

// Defines a state in the DP table.
struct State {
    size_t fromIndex = 0;
    ReadingGrid::NodePtr fromNode = nullptr;
    double maxScore = -std::numeric_limits<double>::infinity();
};

2: Accumulating Maximum Scores

演算法的核心就是計算到達每個位置的最大累積機率。

我們從位置 0 開始掃描到結尾。對於位置 i 的所有候選節點 (Node)，我們計算接上這個 Node 後的分數，並嘗試更新目標位置 i + spanLen 的最大分數。

這其實對應到剛剛 Viterbi 中的遞迴關係：

\text{viterbi}[s, t] = \max_{s'=1}^{N} \left( \text{viterbi}[s', t-1] \times a_{s',s} \times b_s(o_t) \right)

但因為我們是 Unigram 模型，沒有轉移機率 ( $a_{s',s} = 1$ )，且使用的是對數機率 (乘法變加法)，公式簡化為：

\text{viterbi}[i + spanLen] = \max\left(\text{viterbi}[i + spanLen],\; \text{viterbi}[i] + \text{score}(node)\right)

程式碼實作如下：

const size_t readingLen = readings_.size();
std::vector<State> viterbi(readingLen + 1);
viterbi[0].maxScore = 0.0;

for (size_t i = 0; i < readingLen; ++i) {
    const ReadingGrid::Span& span = spans_[i];
    const size_t maxSpanLen = span.maxLength();

    for (size_t spanLen = 1; spanLen <= maxSpanLen; ++spanLen) {
        const ReadingGrid::NodePtr& node = span.nodeOf(spanLen);
        if (node == nullptr) {
            continue;
        }

        // Relaxation: Update target state if current path is better
        double score = viterbi[i].maxScore + node->score();
        State& target = viterbi[i + spanLen];
        if (score > target.maxScore) {
            target.maxScore = score;
            target.fromNode = node;
            target.fromIndex = i;
        }
    }
}

3: Path Reconstruction (Backtracking)

當 DP 表格填滿後，最後一個位置紀錄的 maxScore 就是最佳路徑的總分。此時我們只需要沿著 fromIndex 和 fromNode 一路回溯 (backtrack) 到起點，就能還原出整條路徑 (當然最後要記得反轉一下)。

// Reconstruct the most likely path by tracing back
for (size_t curr = readingLen; curr > 0; curr = viterbi[curr].fromIndex) {
    result.nodes.emplace_back(std::move(viterbi[curr].fromNode));
}
std::reverse(result.nodes.begin(), result.nodes.end());

4. Walkthrough Example

假設輸入注音為「ㄕㄜˋ ㄐㄧˋ ㄔㄥˊ ㄕˋ ㄇㄚˇ」，目標是找出「設計程式碼」：

Spans	spanLen	Nodes	Unigrams	讀音	字詞	對數機率	備註
0	1	0	0	`ㄕㄜˋ`	社	-3.10
0	1	0	1	`ㄕㄜˋ`	設	-3.19	「社」更高分，所以「設」從不會被考慮
0	2	1	0	`ㄕㄜˋ ㄐㄧˋ`	設計	-3.58
1	1	0	0	`ㄐㄧˋ`	計	-3.13
2	1	0	0	`ㄔㄥˊ`	成	-2.70
2	2	1	0	`ㄔㄥˊ ㄕˋ`	城市	-3.99
2	2	1	1	`ㄔㄥˊ ㄕˋ`	程式	-4.08	「城市」更高分，所以「程式」從不會被考慮
2	3	2	0	`ㄔㄥˊ ㄕˋ ㄇㄚˇ`	程式碼	-5.10
3	1	0	0	`ㄕˋ`	是	-2.04
4	1	0	0	`ㄇㄚˇ`	馬	-3.47
4	1	0	1	`ㄇㄚˇ`	碼	-4.06	「馬」更高分，所以「碼」從不會被考慮

我們定義：

viterbi[i]
- maxScore：到達位置 i 的最大分數 (log probability，越大越好)
- fromIndex：最佳路徑來自哪個位置
- fromNode：到達位置 i 時來自哪個字詞

初始條件：

	`maxScore`	`fromIndex`	`fromNode`
`viterbi[0]`	0.0	-	-

viterbi[1]

候選路徑：

`i`	`spanLen`	字詞	詞頻	計算	結果
0	1	社	-3.10	0.00 + (-3.10)	-3.10

結果如圖所示，注意每一個 V 都是對應到 viterbi[i] 的狀態，邊的權重則是對應到 Node 的分數：

graph LR
  %% Nodes
  V0((V0))
  V1((V1))

  %% Edges
  V0 -->|"社 -3.10"| V1

  %% Styling
  classDef selected fill:#ffe0b2,stroke:#e65100,stroke-width:3px;

  class V0,V1 selected;

  %% Link 0: V0->V1

  %% Highlight Link 1 & 6
  linkStyle 0 stroke:#d84315,stroke-width:4px,fill:none;

更新：

	`maxScore`	`fromIndex`	`fromNode`
`viterbi[0]`	0.0	-	-
`viterbi[1]`	-3.10	0	社

viterbi[2]

候選路徑：

`i`	`spanLen`	字詞	詞頻	計算	結果
0	2	設計	-3.58	0.00 + (-3.58)	-3.58
1	1	計	-3.13	-3.10 + (-3.13)	-6.23

比較：

-3.58 > -6.23
選擇「設計」

結果如圖所示，注意每一個 V 都是對應到 viterbi[i] 的狀態，邊的權重則是對應到 Node 的分數：

graph LR
  %% Nodes
  V0((V0))
  V1((V1))
  V2((V2))

  %% Edges
  V0 -->|"社 -3.10"| V1
  V0 -->|"設計 -3.58"| V2
  V1 -->|"計 -3.13"| V2

  %% Styling
  classDef state fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
  classDef selected fill:#ffe0b2,stroke:#e65100,stroke-width:3px;

  class V0,V2 selected;
  class V1 state;

  %% Link 0: V0->V1
  %% Link 1: V0->V2
  %% Link 2: V1->V2

  %% Highlight Link 1 & 6
  linkStyle 1 stroke:#d84315,stroke-width:4px,fill:none;

更新：

	`maxScore`	`fromIndex`	`fromNode`
`viterbi[0]`	0.0	-	-
`viterbi[1]`	-3.10	0	社
`viterbi[2]`	-3.58	0	設計

viterbi[3]

候選路徑：

`i`	`spanLen`	字詞	詞頻	計算	結果
2	1	成	-2.70	-3.58 + (-2.70)	-6.28

結果如圖所示，注意每一個 V 都是對應到 viterbi[i] 的狀態，邊的權重則是對應到 Node 的分數：

graph LR
  %% Nodes
  V0((V0))
  V1((V1))
  V2((V2))
  V3((V3))

  %% Edges
  V0 -->|"社 -3.10"| V1
  V0 -->|"設計 -3.58"| V2
  V1 -->|"計 -3.13"| V2
  V2 -->|"成 -2.70"| V3

  %% Styling
  classDef state fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
  classDef selected fill:#ffe0b2,stroke:#e65100,stroke-width:3px;

  class V0,V2,V3 selected;
  class V1 state;

  %% Link 0: V0->V1
  %% Link 1: V0->V2
  %% Link 2: V1->V2
  %% Link 3: V2->V3

  %% Highlight Link 1
  linkStyle 1,3 stroke:#d84315,stroke-width:4px,fill:none;

更新：

	`maxScore`	`fromIndex`	`fromNode`
`viterbi[0]`	0.0	-	-
`viterbi[1]`	-3.10	0	社
`viterbi[2]`	-3.58	0	設計
`viterbi[3]`	-6.28	2	成

viterbi[4]

候選路徑：

`i`	`spanLen`	字詞	詞頻	計算	結果
2	2	城市	-3.99	-3.58 + (-3.99)	-7.57
3	1	是	-2.04	-6.28 + (-2.04)	-8.32

比較：

-7.57 > -8.32
選擇「城市」

結果如圖所示，注意每一個 V 都是對應到 viterbi[i] 的狀態，邊的權重則是對應到 Node 的分數：

graph LR
  %% Nodes
  V0((V0))
  V1((V1))
  V2((V2))
  V3((V3))
  V4((V4))

  %% Edges
  V0 -->|"社 -3.10"| V1
  V0 -->|"設計 -3.58"| V2
  V1 -->|"計 -3.13"| V2
  V2 -->|"成 -2.70"| V3
  V2 -->|"城市 -3.99"| V4
  V3 -->|"是 -2.04"| V4

  %% Styling
  classDef state fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
  classDef selected fill:#ffe0b2,stroke:#e65100,stroke-width:3px;

  class V0,V2,V4 selected;
  class V1,V3 state;

  %% Link 0: V0->V1
  %% Link 1: V0->V2
  %% Link 2: V1->V2
  %% Link 3: V2->V3
  %% Link 4: V2->V4
  %% Link 5: V3->V4

  %% Highlight Link 1 & 6
  linkStyle 1,4 stroke:#d84315,stroke-width:4px,fill:none;

更新：

	`maxScore`	`fromIndex`	`fromNode`
`viterbi[0]`	0.0	-	-
`viterbi[1]`	-3.10	0	社
`viterbi[2]`	-3.58	0	設計
`viterbi[3]`	-6.28	2	成
`viterbi[4]`	-7.57	2	城市

viterbi[5]

候選路徑：

`i`	`spanLen`	字詞	詞頻	計算	結果
2	3	程式碼	-5.10	-3.58 + (-5.10)	-8.68
4	1	馬	-3.47	-7.57 + (-3.47)	-11.04

比較：

-8.68 > -11.04
選擇「程式碼」

結果如圖所示，注意每一個 V 都是對應到 viterbi[i] 的狀態，邊的權重則是對應到 Node 的分數：

graph LR
  %% Nodes
  V0((V0))
  V1((V1))
  V2((V2))
  V3((V3))
  V4((V4))
  V5((V5))

  %% Edges
  V0 -->|"社 -3.10"| V1
  V0 -->|"設計 -3.58"| V2
  V1 -->|"計 -3.13"| V2
  V2 -->|"成 -2.70"| V3
  V2 -->|"城市 -3.99"| V4
  V3 -->|"是 -2.04"| V4
  V2 -->|"程式碼 -5.10"| V5
  V4 -->|"馬 -3.47"| V5

  %% Styling
  classDef state fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
  classDef selected fill:#ffe0b2,stroke:#e65100,stroke-width:3px;

  class V0,V2,V5 selected;
  class V1,V3,V4 state;

  %% Link 0: V0->V1
  %% Link 1: V0->V2
  %% Link 2: V1->V2
  %% Link 3: V2->V3
  %% Link 4: V2->V4
  %% Link 5: V3->V4
  %% Link 6: V2->V5
  %% Link 7: V4->V5

  %% Highlight Link 1 & 6
  linkStyle 1,6 stroke:#d84315,stroke-width:4px,fill:none;

更新：

	`maxScore`	`fromIndex`	`fromNode`
`viterbi[0]`	0.0	-	-
`viterbi[1]`	-3.10	0	社
`viterbi[2]`	-3.58	0	設計
`viterbi[3]`	-6.28	2	成
`viterbi[4]`	-7.57	2	城市
`viterbi[5]`	-8.68	2	程式碼

Path Reconstruction (Backtracking)

從終點開始回推：

`curr`	`fromIndex`	`fromNode`
5	2	程式碼
2	0	設計
0	-	-

順序反轉後得到最終路徑：設計 + 程式碼

graph LR
  %% Nodes
  V0((V0))
  V2((V2))
  V5((V5))

  %% Edges
  V0 -->|"設計 -3.58"| V2
  V2 -->|"程式碼 -5.10"| V5

  %% Styling
  classDef selected fill:#ffe0b2,stroke:#e65100,stroke-width:3px;

  class V0,V2,V5 selected;

  %% Link 0: V0->V2
  %% Link 1: V2->V5

  %% Highlight Link 0 & 1
  linkStyle 0,1 stroke:#d84315,stroke-width:4px,fill:none;

Performance

這次重構移除了舊有的 Vertex 結構與拓撲排序步驟，直接在陣列上操作，順帶改善了 cache locality 並減少了記憶體使用量。

使用 reading_grid_test.cpp 中的壓力測試：

TEST(ReadingGridTest, StressTest) {
  constexpr char kStressData[] = R"(
ㄧ 一 -2.08170692
ㄧ-ㄧ 一一 -4.38468400
)";

  ReadingGrid grid(std::make_shared<SimpleLM>(kStressData));
  for (int i = 0; i < 8001; i++) {
    grid.insertReading("ㄧ");
  }
  ReadingGrid::WalkResult result = grid.walk();
  std::cout << "stress test elapsed: " << result.elapsedMicroseconds
            << " microseconds, vertices: " << result.vertices
            << ", edges: " << result.edges << "\n";
}

在 MacBook Air M2 / 16 GB 上、以 -DCMAKE_BUILD_TYPE=Release 編譯測試結果如下：

1. DAG Shortest Path

stress test elapsed: 951 microseconds, vertices: 16001, edges: 31996

2. Viterbi Algorithm

stress test elapsed: 202 microseconds, vertices: 8001, edges: 16001

我們可以發現，演算法複雜度一樣是 $O(|V|+|E|)$ ，但是 $V$ 與 $E$ 的意義改變了！

$V$ 對應到注音的個數，也就是 Viterbi 的 states
$E$ 則是候選字詞的轉移 (candidate word transitions)

	Algorithm	Round 1	Round 2	Round 3
Before	DAG Shortest Path	951 us	956 us	1014 us
After	Viterbi Algorithm	202 us	213 us	219 us

效能提升了好幾倍！雖然實際使用應該是感覺不出來～但這說明在演算法複雜度相同為 $O(|V|+|E|)$ 的情況下，減少常數項開銷 (constant overhead) 與資料結構的簡化，仍能帶來巨大的效能差異。

Then Notes 隨筆

重構小麥注音選字演算法：從 DAG Shortest Path 到 Viterbi

Introduction

Gramambular Architecture

1. Unigram

2. Node

3. Span

4. ReadingGrid

Original Approach: DAG Shortest Path

1. Build Graph

2. Topological Sort (拓撲排序)

3. Relax (鬆弛)

4. Path Reconstruction (路徑回溯)

Viterbi

符號意義

1. Initialization (初始化)

2. Recursion (遞迴)

3. Termination (終止)

4. Path Backtracking (路徑回溯)

The Refactoring: Viterbi Algorithm

1. Data Structure

2: Accumulating Maximum Scores

3: Path Reconstruction (Backtracking)

4. Walkthrough Example

viterbi[1]

viterbi[2]

viterbi[3]

viterbi[4]

viterbi[5]

Path Reconstruction (Backtracking)

Performance

1. DAG Shortest Path

2. Viterbi Algorithm

Reference

# Introduction

# Gramambular Architecture

# 1. Unigram

# 2. Node

# 3. Span

# 4. ReadingGrid

# Original Approach: DAG Shortest Path

# 1. Build Graph

# 2. Topological Sort (拓撲排序)

# 3. Relax (鬆弛)

# 4. Path Reconstruction (路徑回溯)

# Viterbi

# 符號意義

# 1. Initialization (初始化)

# 2. Recursion (遞迴)

# 3. Termination (終止)

# 4. Path Backtracking (路徑回溯)

# The Refactoring: Viterbi Algorithm

# 1. Data Structure

# 2: Accumulating Maximum Scores

# 3: Path Reconstruction (Backtracking)

# 4. Walkthrough Example

# viterbi[1]

# viterbi[2]

# viterbi[3]

# viterbi[4]

# viterbi[5]

# Path Reconstruction (Backtracking)

# Performance

# 1. DAG Shortest Path

# 2. Viterbi Algorithm

# Reference

Introduction

Gramambular Architecture

1. Unigram

2. Node

3. Span

4. ReadingGrid

Original Approach: DAG Shortest Path

1. Build Graph

2. Topological Sort (拓撲排序)

3. Relax (鬆弛)

4. Path Reconstruction (路徑回溯)

Viterbi

符號意義

1. Initialization (初始化)

2. Recursion (遞迴)

3. Termination (終止)

4. Path Backtracking (路徑回溯)

The Refactoring: Viterbi Algorithm

1. Data Structure

2: Accumulating Maximum Scores

3: Path Reconstruction (Backtracking)

4. Walkthrough Example

viterbi[1]

viterbi[2]

viterbi[3]

viterbi[4]

viterbi[5]

Path Reconstruction (Backtracking)

Performance

1. DAG Shortest Path

2. Viterbi Algorithm

Reference