前缀树详解：用树来存字典的结构

把一本字典装进一棵树，查单词比翻书还快。

写在前面

刷 LeetCode 的时候，遇到「单词搜索」「自动补全」「统计前缀」这类题，很多人第一反应是用 unordered_set 或者暴力遍历。这在数据量小的时候没问题，但一旦单词库变大，或者需要「前缀匹配」而不是「精确匹配」，哈希表就力不从心了。

前缀树（Trie，又叫字典树）就是为这类需求量身定制的数据结构。它本质上是一棵多叉树，把字符串的每个字符拆开，一层一层地挂在树上。听起来简单，但用起来很灵活。

前缀树是什么

用新华字典来类比

查新华字典的时候，你会先翻到对应的拼音首字母，再找第二个字母，以此类推。前缀树的逻辑完全一样：

根节点是空的，不存任何字符
每一层存一个字符
从根到某个节点的路径，就代表一个字符串的前缀
如果某个节点被标记为「终止」，说明从根到这里构成了一个完整的单词

举个例子，把 ["apple", "app", "april", "bat"] 插入前缀树：

        root
       /    \
      a      b
      |      |
      p      a
     / \     |
    p   r    t  ← 终止节点（"bat"）
    |   |
    l   i
    |   |
    e   l  ← 终止节点（"april"）
    |
   (end) ← 终止节点（"apple"）
(end) ← 终止节点（"app"）

可以看到 app 和 apple 共享了 a → p → p 这条路径，这就是前缀树节省空间的核心：公共前缀只存一份。

节点的定义

每个节点需要记录两件事：

子节点：最多 26 个（小写字母），用数组或哈希表存
是否是单词结尾：布尔标记 is_end

struct TrieNode {
    TrieNode* children[26];
    bool is_end;

    TrieNode() : is_end(false) {
        fill(begin(children), end(children), nullptr);
    }
};

用数组下标 ch - 'a' 来映射字母，访问时间是 O(1)。

核心操作

插入（insert）

从根节点出发，对单词的每个字符：

如果对应子节点不存在，新建一个
走到下一层
单词遍历完后，把最后一个节点标记为 is_end = true

void insert(const string& word) {
    TrieNode* node = root;
    for (char ch : word) {
        int idx = ch - 'a';
        if (!node->children[idx]) {
            node->children[idx] = new TrieNode();  // 没有就新建
        }
        node = node->children[idx];  // 往下走一层
    }
    node->is_end = true;  // 标记单词结尾
}

查找（search）

查找某个单词是否存在，和插入类似，但不新建节点，遇到空指针就返回 false。最终还要检查 is_end，因为 app 存在不代表 ap 也是单词。

bool search(const string& word) {
    TrieNode* node = root;
    for (char ch : word) {
        int idx = ch - 'a';
        if (!node->children[idx]) return false;  // 路断了
        node = node->children[idx];
    }
    return node->is_end;  // 路到了，但要确认是单词结尾
}

前缀查询（startsWith）

判断是否有单词以某个字符串为前缀，比 search 更宽松，只要路径存在就行，不需要检查 is_end。

bool startsWith(const string& prefix) {
    TrieNode* node = root;
    for (char ch : prefix) {
        int idx = ch - 'a';
        if (!node->children[idx]) return false;
        node = node->children[idx];
    }
    return true;  // 路径存在就说明有这个前缀
}

完整实现

#include <iostream>
#include <string>
#include <array>
using namespace std;

struct TrieNode {
    TrieNode* children[26];
    bool is_end;

    TrieNode() : is_end(false) {
        fill(begin(children), end(children), nullptr);
    }
};

class Trie {
    TrieNode* root;

public:
    Trie() { root = new TrieNode(); }

    void insert(const string& word) {
        TrieNode* node = root;
        for (char ch : word) {
            int idx = ch - 'a';
            if (!node->children[idx])
                node->children[idx] = new TrieNode();
            node = node->children[idx];
        }
        node->is_end = true;
    }

    bool search(const string& word) {
        TrieNode* node = root;
        for (char ch : word) {
            int idx = ch - 'a';
            if (!node->children[idx]) return false;
            node = node->children[idx];
        }
        return node->is_end;
    }

    bool startsWith(const string& prefix) {
        TrieNode* node = root;
        for (char ch : prefix) {
            int idx = ch - 'a';
            if (!node->children[idx]) return false;
            node = node->children[idx];
        }
        return true;
    }
};

例题讲解

例题一：实现前缀树（LeetCode 208）

实现一个 Trie 类，支持 insert、search、startsWith 三个操作。

这道题就是前缀树的基础实现，上面的代码直接可用。我们着重看一下执行过程。

操作序列：

insert("apple")
insert("app")
search("app")     → true
search("ap")      → false
startsWith("ap")  → true

执行过程：

初始：root (空)

insert("apple"):
  root → [a] → [p] → [p] → [l] → [e]*
                                       ↑ is_end = true

insert("app"):
  root → [a] → [p] → [p]*
                       ↑ is_end = true（共享前缀，只加标记）

search("app"):
  走到第 3 层的 [p] 节点，is_end = true → 返回 true

search("ap"):
  走到第 2 层的 [p] 节点，is_end = false → 返回 false

startsWith("ap"):
  走到第 2 层的 [p] 节点，路径存在 → 返回 true

复杂度：

时间：O(L)，L 为字符串长度，每个操作只遍历一遍字符串
空间：O(N × L × 26)，N 个单词，每个节点最多 26 个子指针

例题二：统计前缀出现次数（LeetCode 2185）

给你一个字符串数组 words 和一个字符串 pref，返回 words 中以 pref 为前缀的字符串数量。

这道题用 startsWith 逻辑即可，但可以在插入时顺便在每个节点记录「经过次数」，一次遍历就能回答所有前缀查询。

struct TrieNode {
    TrieNode* children[26];
    int pass_count;   // 经过该节点的单词数
    bool is_end;

    TrieNode() : pass_count(0), is_end(false) {
        fill(begin(children), end(children), nullptr);
    }
};

class Trie {
    TrieNode* root;

public:
    Trie() { root = new TrieNode(); }

    void insert(const string& word) {
        TrieNode* node = root;
        for (char ch : word) {
            int idx = ch - 'a';
            if (!node->children[idx])
                node->children[idx] = new TrieNode();
            node = node->children[idx];
            node->pass_count++;  // 每经过一个节点就 +1
        }
        node->is_end = true;
    }

    // 查询有多少单词以 prefix 为前缀
    int countPrefix(const string& prefix) {
        TrieNode* node = root;
        for (char ch : prefix) {
            int idx = ch - 'a';
            if (!node->children[idx]) return 0;
            node = node->children[idx];
        }
        return node->pass_count;  // 该节点被经过的次数就是答案
    }
};

执行过程：

插入 ["apple", "app", "april", "banana"]

插入 "apple": a(1) → p(1) → p(1) → l(1) → e(1)*
插入 "app":   a(2) → p(2) → p(2)*
插入 "april": a(3) → p(3) → r(1) → i(1) → l(1)*
插入 "banana": b(1) → a(1) → n(1) → a(1) → n(1) → a(1)*

countPrefix("ap"):
  走到 p 节点（第二层），pass_count = 3
  → 返回 3（"apple", "app", "april" 都以 "ap" 开头）

countPrefix("app"):
  走到 p 节点（第三层），pass_count = 2
  → 返回 2（"apple", "app"）

例题三：单词搜索 II（LeetCode 212）

给定一个 m×n 的字符网格和一个单词列表，找出所有存在于网格中的单词。

这是前缀树的经典复合题，把「回溯搜索网格」和「前缀剪枝」结合起来。

问题分析：

朴素做法：对每个单词做一次 DFS，时间 O(W × m × n × 4^L)，W 是单词数，L 是单词长度，会超时
优化：把所有单词插入前缀树，DFS 时沿着 Trie 边走，走不下去就剪枝

思路：

把所有单词插入 Trie
从网格每个格子出发做 DFS
DFS 时同步在 Trie 上移动，如果当前字符在 Trie 中不存在，直接剪枝
如果走到 node->word 非空的节点，收集答案

class Solution {
    struct TrieNode {
        TrieNode* children[26];
        string word;  // 把完整单词存在终止节点，方便收集答案

        TrieNode() : word("") {
            fill(begin(children), end(children), nullptr);
        }
    };

    TrieNode* root;

    void insert(const string& w) {
        TrieNode* node = root;
        for (char ch : w) {
            int idx = ch - 'a';
            if (!node->children[idx])
                node->children[idx] = new TrieNode();
            node = node->children[idx];
        }
        node->word = w;  // 在终止节点存完整单词
    }

    void dfs(vector<vector<char>>& board, int i, int j,
             TrieNode* node, vector<string>& res) {
        char ch = board[i][j];
        if (ch == '#' || !node->children[ch - 'a']) return;  // 剪枝

        node = node->children[ch - 'a'];

        if (!node->word.empty()) {       // 找到一个单词
            res.push_back(node->word);
            node->word = "";             // 去重：清空避免重复收集
        }

        board[i][j] = '#';               // 标记已访问
        int dirs[4][2] = {{0,1},{0,-1},{1,0},{-1,0}};
        for (auto& d : dirs) {
            int ni = i + d[0], nj = j + d[1];
            if (ni >= 0 && ni < (int)board.size() &&
                nj >= 0 && nj < (int)board[0].size()) {
                dfs(board, ni, nj, node, res);
            }
        }
        board[i][j] = ch;               // 恢复现场
    }

public:
    vector<string> findWords(vector<vector<char>>& board,
                             vector<string>& words) {
        root = new TrieNode();
        for (auto& w : words) insert(w);

        vector<string> res;
        int m = board.size(), n = board[0].size();
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                dfs(board, i, j, root, res);

        return res;
    }
};

执行过程（小示例）：

words = ["eat", "rain"]
board:
e  a  t
r  a  i
x  x  n

建 Trie：
root → e → a → t*("eat")
root → r → a → i → n*("rain")

从 (0,0) 出发，board[0][0]='e'，Trie 有 'e' 节点，进入
  走到 (0,1)，board[0][1]='a'，Trie 有 'e'→'a'，进入
    走到 (0,2)，board[0][2]='t'，Trie 有 'e'→'a'→'t'，node->word 非空
    → 收集 "eat"，清空 word
  回溯...

从 (2,0) 出发，board[2][0]='x'，Trie 无 'x' 节点，直接剪枝

从 (1,0) 出发，board[1][0]='r'，Trie 有 'r' 节点，进入
  走到 (1,1)，board[1][1]='a'，Trie 有 'r'→'a'，进入
    走到 (1,2)，board[1][2]='i'，Trie 有 'r'→'a'→'i'，进入
      走到 (2,2)，board[2][2]='n'，Trie 有 'r'→'a'→'i'→'n'
      → 收集 "rain"，清空 word

...（省略）最终找到 "eat" 和 "rain"

复杂度：

时间：O(m × n × 4^L)，但被 Trie 剪枝大幅减少实际开销
空间：O(W × L) 建树，O(L) 递归栈

容易踩的坑

1. `search` 和 `startsWith` 区别

search("app") 返回 true，意味着 "app" 是一个完整单词。

startsWith("app") 返回 true，只说明有单词以 "app" 开头，"app" 本身可能不在字典里。

两者都要走到路径末尾，区别只在于最后有没有检查 is_end。

插入 "apple"，没有插入 "app"

search("app")     → false（路径存在，但 is_end = false）
startsWith("app") → true（路径存在即可）

2. 节点是字符，不是边

初学时容易把字符想成存在「边」上，但在代码实现里，字符是通过子节点的下标（ch - 'a'）来表示的，节点本身不显式存字符。

3. 内存释放

用 new 建节点，要记得析构时递归 delete。LeetCode 上一般不用手动释放，但在生产代码里需要注意。

void destroy(TrieNode* node) {
    for (auto* child : node->children)
        if (child) destroy(child);
    delete node;
}
~Trie() { destroy(root); }

4. 单词搜索中的去重

LeetCode 212 里，同一个单词可能从多个位置找到。用「找到后清空 node->word」来去重，比往结果集里用 set 去重更高效。

总结

mindmap
  root((前缀树))
    核心结构
      TrieNode
        children数组 26个子指针
        is_end 是否单词结尾
        pass_count 可选 前缀计数
    基本操作
      insert O(L)
      search O(L)
      startsWith O(L)
    典型应用
      统计前缀出现次数
      单词搜索与DFS剪枝
      自动补全
      多模式匹配 AC自动机基础
    常见坑
      search vs startsWith
      字符在下标不在节点
      多结果去重

核心要点：

操作	时间复杂度	关键点
插入	O(L)	沿路径新建节点，末尾标 is_end
精确查找	O(L)	路径存在且 is_end = true
前缀查找	O(L)	路径存在即可，不检查 is_end
前缀计数	O(L)	插入时维护 pass_count

前缀树的空间换时间思路很直接：把字符串分解到树里，代价是每个节点 26 个指针的空间，换来的是所有前缀操作 O(L) 的时间。数据量一大，这笔账很划算。

推荐阅读：

常见算法-KMP（字符串匹配的另一条路，Trie + KMP = AC自动机）
LeetCode 208 · 实现 Trie
LeetCode 212 · 单词搜索 II