哈希表（Hash Table）完全详解¶

重要性：⭐⭐⭐⭐⭐ 难度：⭐⭐⭐ 学习时间：3-4天 前置知识：数组、链表

📚 目录¶

什么是哈希表
哈希函数
哈希冲突
哈希表实现
哈希表操作
经典面试题
实战应用

什么是哈希表？¶

基本概念¶

哈希表是一种通过哈希函数将键（key）映射到存储位置的数据结构，实现了O(1)时间的查找、插入和删除。

生活中的例子¶

想象一个图书馆：

Text Only

传统查找（数组/链表）：
┌─────────────────────────────────┐
│ 书架1 │ 书架2 │ 书架3 │ 书架4 │ ...
└─────────────────────────────────┘
要找"算法导论"，需要逐个书架查找 → O(n)

哈希表查找：
┌─────────────────────────────────┐
│ A区 │ B区 │ C区 │ D区 │ ... │ Z区 │
└─────────────────────────────────┘
"算法导论"在S区，直接去S区找 → O(1)

为什么学哈希表？¶

✅ 超快查找： - 平均时间复杂度：O(1) - 比二分搜索O(log n)更快 - 比线性搜索O(n)快得多

✅ 应用广泛： - Python字典（dict） - JavaScript对象（Object） - Java HashMap - 数据库索引 - 缓存系统（Redis） - 去重、计数

✅ 面试常考： - 两数之和 - 字母异位词分组 - 最长连续序列 - LRU缓存

核心概念¶

Text Only

哈希表 = 哈希函数 + 数组 + 处理冲突的方法

┌─────────────┐
│   Key       │
│  "apple"    │
└─────┬───────┘
      │
      ▼
┌─────────────┐
│ 哈希函数     │
│ hash(key)   │
└─────┬───────┘
      │
      ▼ 返回索引
┌─────────────────────────┐
│ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ ... │ 9 │
├───┼───┼───┼───┼───┼───┼─────┼───┤
│   │   │   │   │   │   │ ... │   │
└─────────────────────────┘
      ▲
      │ 存储Value
      │
┌─────┴───────┐
│   Value     │
│     5       │
└─────────────┘

哈希函数¶

什么是哈希函数？¶

哈希函数将任意大小的输入映射到固定大小的输出（索引）。

理想的哈希函数¶

✅ 特点： 1. 确定性：相同输入总是产生相同输出 2. 高效性：计算速度快 3. 均匀性：均匀分布，减少冲突 4. 雪崩效应：输入微小变化导致输出巨大变化

常见哈希函数¶

1. 除法哈希（最常用）¶

Python

def hash_division(key, size):
    """
    除法哈希
    公式：hash(key) = key % size

    优点：简单、高效
    缺点：需要选择合适的size
    """
    return key % size

# 示例
size = 10
keys = [15, 25, 35, 45]

print(f"哈希表大小: {size}")
print("除法哈希演示:")
for key in keys:
    hash_value = hash_division(key, size)
    print(f"  key={key} → hash={key} % {size} = {hash_value}")

# 输出：
# 哈希表大小: 10
# 除法哈希演示:
#   key=15 → hash=15 % 10 = 5
#   key=25 → hash=25 % 10 = 5
#   key=35 → hash=35 % 10 = 5
#   key=45 → hash=45 % 10 = 5
#
# ⚠️ 冲突！所有key都映射到索引5

问题：如果size是10，所有以0或5结尾的数都会冲突！

解决：选择质数作为size

Python

def hash_division_optimized(key, size):
    """
    优化的除法哈希
    size选择质数，减少冲突
    """
    return key % size

# 使用质数
size = 13  # 质数
keys = [15, 25, 35, 45]

print(f"\n优化后（size={size}）:")
for key in keys:
    hash_value = hash_division_optimized(key, size)
    print(f"  key={key} → hash={key} % {size} = {hash_value}")

# 输出：
# 优化后（size=13）:
#   key=15 → hash=15 % 13 = 2
#   key=25 → hash=25 % 13 = 12
#   key=35 → hash=35 % 13 = 9
#   key=45 → hash=45 % 13 = 6
#
# ✓ 分布更均匀！

2. 乘法哈希¶

Python

def hash_multiplication(key, size):
    """
    乘法哈希
    公式：hash(key) = floor(size * (key * A % 1))

    其中A = (√5 - 1) / 2 ≈ 0.618（黄金比例）
    """
    A = 0.6180339887  # 黄金比例
    hash_value = int(size * ((key * A) % 1))
    return hash_value

# 示例
size = 10
keys = [15, 25, 35, 45]

print(f"\n乘法哈希演示（size={size}）:")
for key in keys:
    hash_value = hash_multiplication(key, size)
    print(f"  key={key} → hash={hash_value}")

# 输出：
# 乘法哈希演示（size=10）:
#   key=15 → hash=2
#   key=25 → hash=4
#   key=35 → hash=6
#   key=45 → hash=8

3. 字符串哈希¶

Python

def hash_string_division(key, size):
    """
    字符串哈希 - 除法
    公式：hash = (sum of ASCII values) % size
    """
    hash_value = 0
    for char in key:
        hash_value += ord(char)
    return hash_value % size

def hash_string_polynomial(key, size):
    """
    字符串哈希 - 多项式滚动（更均匀）
    公式：hash = (s[0]*p^(n-1) + s[1]*p^(n-2) + ... + s[n-1]) % size

    通常取 p = 31 或 37（质数）
    """
    p = 31  # 质数
    hash_value = 0
    p_power = 1

    for char in key:
        hash_value = (hash_value + ord(char) * p_power) % size
        p_power = (p_power * p) % size

    return hash_value

# 示例
keys = ["apple", "banana", "cherry", "date"]
size = 13

print(f"字符串哈希演示（size={size}）:")
print("\n除法哈希:")
for key in keys:
    hash_value = hash_string_division(key, size)
    print(f"  '{key}' → hash={hash_value}")

print("\n多项式哈希:")
for key in keys:
    hash_value = hash_string_polynomial(key, size)
    print(f"  '{key}' → hash={hash_value}")

# 输出：
# 字符串哈希演示（size=13）:
#
# 除法哈希:
#   'apple' → hash=4
#   'banana' → hash=4
#   'cherry' → hash=5
#   'date' → hash=1
# ⚠️ "apple"和"banana"冲突！
#
# 多项式哈希:
#   'apple' → hash=7
#   'banana' → hash=8
#   'cherry' → hash=11
#   'date' → hash=10
# ✓ 分布更均匀！

Python内置的hash()¶

Python

# Python内置hash函数
print("Python内置hash():")
keys = ["apple", "banana", "cherry"]
for key in keys:
    hash_value = hash(key)
    print(f"  hash('{key}') = {hash_value}")

# 输出：
# Python内置hash():
#   hash('apple') = -5846075893608226612
#   hash('banana') = 6953615479166595733
#   hash('cherry') = -2868163865635627193

# 注意：Python的hash()每次运行结果不同（随机化，防止攻击）
# 但在同一次运行中，相同输入总是产生相同输出

哈希冲突¶

什么是冲突？¶

冲突：两个不同的key映射到同一个索引。

Text Only

例子：
hash("apple") = 5
hash("cream") = 5  ← 冲突！

冲突的可视化¶

Text Only

哈希表（size=10）:

索引  0    1    2    3    4    5    6    7    8    9
      ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
      │    │    │    │    │    │ 15 │ 25 │    │    │    │
      └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
                                  ↑
                              冲突！
                          15和25都映射到索引5

处理冲突的方法¶

方法1：链地址法（Separate Chaining）⭐¶

原理：每个位置维护一个链表，冲突的元素添加到链表中。

Python

class HashTableChaining:
    """
    使用链地址法的哈希表
    时间：
    - 平均：O(1)
    - 最坏：O(n)（所有元素都在一个链表）
    空间：O(n)
    """

    def __init__(self, size=10):
        self.size = size
        # 使用list作为桶，每个桶是一个链表
        self.buckets = [[] for _ in range(size)]
        self.count = 0  # 元素数量

    def _hash(self, key):
        """哈希函数"""
        return hash(key) % self.size

    def put(self, key, value):
        """插入键值对"""
        index = self._hash(key)
        bucket = self.buckets[index]

        # 检查key是否已存在
        for i, (k, v) in enumerate(bucket):
            if k == key:
                # key已存在，更新value
                bucket[i] = (key, value)
                print(f"更新: key='{key}', value={value}, index={index}")
                return

        # key不存在，添加到链表
        bucket.append((key, value))
        self.count += 1
        print(f"插入: key='{key}', value={value}, index={index}, "
              f"桶大小={len(bucket)}")

    def get(self, key):
        """获取value"""
        index = self._hash(key)
        bucket = self.buckets[index]

        for k, v in bucket:
            if k == key:
                print(f"查找: key='{key}', index={index}, value={v}")
                return v

        print(f"查找: key='{key}' 未找到")
        raise KeyError(key)

    def remove(self, key):
        """删除key"""
        index = self._hash(key)
        bucket = self.buckets[index]

        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket.pop(i)
                self.count -= 1
                print(f"删除: key='{key}', index={index}")
                return

        print(f"删除: key='{key}' 不存在")

    def __str__(self):
        result = []
        for i, bucket in enumerate(self.buckets):
            if bucket:
                result.append(f"[{i}]: {bucket}")
        return "\n".join(result)

# 测试
ht = HashTableChaining(size=5)

print("=" * 60)
print("链地址法测试")
print("=" * 60)

# 插入元素
ht.put("apple", 5)
ht.put("banana", 3)
ht.put("cherry", 8)
ht.put("date", 2)
ht.put("elderberry", 6)  # 可能冲突

print("\n当前哈希表:")
print(ht)

# 查找
print(f"\n查找 'banana': {ht.get('banana')}")

# 冲突演示
print("\n" + "=" * 60)
print("冲突演示:")
ht2 = HashTableChaining(size=3)
ht2.put("key1", 1)
ht2.put("key2", 2)
ht2.put("key3", 3)  # 可能冲突
ht2.put("key4", 4)  # 可能冲突
print("\n最终哈希表:")
print(ht2)

# 输出：
# ============================================================
# 链地址法测试
# ============================================================
# 插入: key='apple', value=5, index=2, 桶大小=1
# 插入: key='banana', value=3, index=1, 桶大小=1
# 插入: key='cherry', value=8, index=4, 桶大小=1
# 插入: key='date', value=2, index=2, 桶大小=2
# 插入: key='elderberry', value=6, index=1, 桶大小=2
#
# 当前哈希表:
# [1]: [('banana', 3), ('elderberry', 6)]
# [2]: [('apple', 5), ('date', 2)]
# [4]: [('cherry', 8)]
#
# 查找: key='banana', index=1, value=3
#
# ============================================================
# 冲突演示:
# 插入: key='key1', value=1, index=1, 桶大小=1
# 插入: key='key2', value=2, index=2, 桶大小=1
# 插入: key='key3', value=3, index=0, 桶大小=1
# 插入: key='key4', value=4, index=1, 桶大小=2  ← 冲突！
#
# 最终哈希表:
# [0]: [('key3', 3)]
# [1]: [('key1', 1), ('key4', 4)]
# [2]: [('key2', 2)]

链地址法图解：

Text Only

哈希表（size=5）:

索引0: []
索引1: [('banana',3)] [('elderberry',6)]
       第1个元素           第2个元素（链表）

索引2: [('apple',5)] [('date',2)]
索引3: []
索引4: [('cherry',8)]

查找'elderberry':
1. hash('elderberry') = 1
2. 去索引1的链表查找
3. 遍历链表：banana → elderberry ✓ 找到！

方法2：开放寻址法（Open Addressing）⭐¶

原理：如果发生冲突，就寻找下一个可用的位置。

三种探测方式：

A. 线性探测（Linear Probing）¶

Python

class HashTableLinearProbing:
    """
    线性探测法
    如果位置被占用，就检查下一个位置：index+1, index+2, ...
    """

    def __init__(self, size=10):
        self.size = size
        self.keys = [None] * size
        self.values = [None] * size
        self.count = 0

    def _hash(self, key):
        return hash(key) % self.size

    def _probe(self, index):
        """
        线性探测：找到下一个可用位置
        index, index+1, index+2, ...
        """
        while self.keys[index] is not None:
            index = (index + 1) % self.size
        return index

    def put(self, key, value):
        if self.count == self.size:
            raise Exception("哈希表已满")

        index = self._hash(key)
        original_index = index

        # 线性探测可用位置
        while self.keys[index] is not None:
            if self.keys[index] == key:
                # key已存在，更新
                self.values[index] = value
                print(f"更新: key='{key}', value={value}, index={index}")
                return
            index = (index + 1) % self.size

            # 防止无限循环
            if index == original_index:
                raise Exception("哈希表已满")

        # 找到可用位置
        self.keys[index] = key
        self.values[index] = value
        self.count += 1
        print(f"插入: key='{key}', value={value}, index={index} (探测{index - original_index}步)")

    def get(self, key):
        index = self._hash(key)
        original_index = index

        # 线性探测
        while self.keys[index] is not None:
            if self.keys[index] == key:
                print(f"查找: key='{key}', index={index}, value={self.values[index]}")
                return self.values[index]
            index = (index + 1) % self.size

            if index == original_index:
                break

        print(f"查找: key='{key}' 未找到")
        raise KeyError(key)

# 测试
ht = HashTableLinearProbing(size=5)

print("=" * 60)
print("线性探测测试")
print("=" * 60)

ht.put("key1", 1)
ht.put("key2", 2)
ht.put("key3", 3)
ht.put("key4", 4)  # 可能冲突

print(f"\nkeys: {ht.keys}")
print(f"values: {ht.values}")

print(f"\n查找 key3: {ht.get('key3')}")

# 输出：
# ============================================================
# 线性探测测试
# ============================================================
# 插入: key='key1', value=1, index=2 (探测0步)
# 插入: key='key2', value=2, index=3 (探测0步)
# 插入: key='key3', value=3, index=4 (探测0步)
# 插入: key='key4', value=4, index=0 (探测0步)
#
# keys: ['key4', None, 'key1', 'key2', 'key3']
# values: [4, None, 1, 2, 3]
#
# 查找: key='key3', index=4, value=3

线性探测图解：

Text Only

初始: [_, _, _, _, _]

插入A (hash=2):
[_, _, A, _, _]

插入B (hash=2):
[_, _, A, B, _]  → B探测到位置3

插入C (hash=2):
[_, _, A, B, C]  → C探测到位置4

插入D (hash=2):
[D, _, A, B, C]  → D探测到位置0（循环）

查找C (hash=2):
从索引2开始：
2 → A ≠ C
3 → B ≠ C
4 → C ✓ 找到！

B. 二次探测（Quadratic Probing）¶

Python

def quadratic_probe(index, i, size):
    """
    二次探测：index, index+1², index+2², index+3², ...
    """
    return (index + i * i) % size

# 示例
# 原始索引: 2
# 第1次探测: 2 + 1² = 3
# 第2次探测: 2 + 2² = 6
# 第3次探测: 2 + 3² = 11 → 11 % 10 = 1

C. 双重哈希（Double Hashing）¶

Python

def double_hash(key1, key2, size):
    """
    双重哈希：使用两个哈希函数
    hash = (hash1(key) + i * hash2(key)) % size
    """
    hash1 = key1 % size
    hash2 = 1 + (key2 % (size - 1))  # 保证不为0
    return (hash1 + hash2) % size

方法3：再哈希（Rehashing）¶

原理：当哈希表太满时，创建一个更大的哈希表，重新插入所有元素。

Python

class HashTableRehashing:
    """
    带再哈希的哈希表
    负载因子 = 元素数量 / 桶数量
    当负载因子 > 阈值时，进行再哈希
    """

    def __init__(self, initial_size=10, load_factor=0.75):
        self.size = initial_size
        self.load_factor = load_factor
        self.buckets = [[] for _ in range(initial_size)]
        self.count = 0

    def _hash(self, key):
        return hash(key) % self.size

    def _rehash(self, new_size):
        """再哈希：扩容并重新插入所有元素"""
        print(f"\n{'='*60}")
        print(f"再哈希：{self.size} → {new_size}")
        print(f"负载因子: {self.count / self.size:.2f}")

        old_buckets = self.buckets
        self.size = new_size
        self.buckets = [[] for _ in range(new_size)]
        self.count = 0

        # 重新插入所有元素
        for bucket in old_buckets:
            for key, value in bucket:
                self.put(key, value, check_rehash=False)

        print(f"再哈希完成！新大小: {self.size}")

    def put(self, key, value, check_rehash=True):
        """插入键值对"""
        # 检查是否需要再哈希
        if check_rehash and self.count / self.size >= self.load_factor:
            self._rehash(self.size * 2)

        index = self._hash(key)
        bucket = self.buckets[index]

        # 检查key是否已存在
        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return

        # 插入
        bucket.append((key, value))
        self.count += 1

# 测试
ht = HashTableRehashing(initial_size=5, load_factor=0.75)

print("=" * 60)
print("再哈希测试")
print("=" * 60)

# 插入元素，触发再哈希
for i in range(10):
    ht.put(f"key{i}", i)
    print(f"插入key{i}后: count={ht.count}, size={ht.size}, "
          f"负载因子={ht.count/ht.size:.2f}")

# 输出：
# ============================================================
# 再哈希测试
# ============================================================
# 插入key0后: count=1, size=5, 负载因子=0.20
# 插入key1后: count=2, size=5, 负载因子=0.40
# 插入key2后: count=3, size=5, 负载因子=0.60
# 插入key3后: count=4, size=5, 负载因子=0.80
#
# ============================================================
# 再哈希：5 → 10
# 负载因子: 0.80
# 再哈希完成！新大小: 10
#
# 插入key4后: count=5, size=10, 负载因子=0.50
# ...（继续）

再哈希图解：

Text Only

原始哈希表（size=5）:
[0] [1] [2] [3] [4]
 ↓   ↓   ↓   ↓   ↓
 k1  k2  k3  k4  k5
              ↑
         负载因子 = 5/5 = 1.0 > 0.75
         需要再哈希！

再哈希过程：
1. 创建新表（size=10）
2. 重新计算所有元素的位置
3. 插入到新表

新哈希表（size=10）:
[0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
 ↓   ↓   ↓   ↓   ↓   ↓   ↓   ↓   ↓   ↓
k1  k3  k5  k7  k9  k2  k4  k6  k8 k10

负载因子 = 10/10 = 1.0

哈希表实现¶

Python dict内部实现¶

Python的dict使用： - 哈希函数：hash(key) - 冲突解决：开放寻址法 - 负载因子：约⅔（Python 3.6+） - 再哈希：自动扩容

Python

# Python dict演示
d = {}

print("Python dict演示:")
print("=" * 60)

# 插入
d["apple"] = 5
print(f"插入 'apple': 5")
print(f"dict: {d}")
print(f"hash('apple'): {hash('apple')}")
print()

d["banana"] = 3
print(f"插入 'banana': 3")
print(f"dict: {d}")
print(f"hash('banana'): {hash('banana')}")
print()

# 查找
print(f"查找 'apple': {d.get('apple')}")
print(f"查找 'cherry': {d.get('cherry', '不存在')}")
print()

# 删除
del d["apple"]
print(f"删除 'apple' 后: {d}")
print()

# 常用操作
print("常用操作:")
d["key1"] = 1
d["key2"] = 2
print(f"keys: {list(d.keys())}")
print(f"values: {list(d.values())}")
print(f"items: {list(d.items())}")
print(f"'key1' in dict: {'key1' in d}")
print(f"长度: {len(d)}")

# 输出：
# Python dict演示:
# ============================================================
# 插入 'apple': 5
# dict: {'apple': 5}
# hash('apple'): 3713081631934420656
#
# 插入 'banana': 3
# dict: {'apple': 5, 'banana': 3}
# hash('banana'): -5086728256087480496
#
# 查找 'apple': 5
# 查找 'cherry': 不存在
#
# 删除 'apple' 后: {'banana': 3}
#
# 常用操作:
# keys: ['key1', 'key2']
# values: [1, 2]
# items: [('key1', 1), ('key2', 2)]
# 'key1' in dict: True
# 长度: 2

哈希表操作¶

操作1：插入（Insert）¶

Python

def insert_demonstration():
    """详细演示插入过程"""
    print("=" * 60)
    print("插入操作演示（链地址法）")
    print("=" * 60)

    ht = HashTableChaining(size=5)

    items = [
        ("apple", 5),
        ("banana", 3),
        ("cherry", 8),
        ("date", 2),
    ]

    for key, value in items:
        ht.put(key, value)

    print("\n最终哈希表:")
    print(ht)

insert_demonstration()

操作2：查找（Search）¶

Python

def search_demonstration():
    """详细演示查找过程"""
    print("=" * 60)
    print("查找操作演示")
    print("=" * 60)

    ht = HashTableChaining(size=5)
    ht.put("apple", 5)
    ht.put("banana", 3)
    ht.put("cherry", 8)

    # 查找存在的key
    try:  # try/except捕获异常
        result = ht.get("banana")
    except KeyError:
        pass

    # 查找不存在的key
    try:
        result = ht.get("date")
    except KeyError:
        pass

search_demonstration()

操作3：删除（Delete）¶

Python

def delete_demonstration():
    """详细演示删除过程"""
    print("=" * 60)
    print("删除操作演示")
    print("=" * 60)

    ht = HashTableChaining(size=5)
    ht.put("apple", 5)
    ht.put("banana", 3)
    ht.put("cherry", 8)

    print("\n删除前:")
    print(ht)

    ht.remove("banana")

    print("\n删除后:")
    print(ht)

delete_demonstration()

经典面试题¶

题目1：两数之和 ⭐⭐⭐⭐⭐¶

问题：给定一个整数数组 nums 和一个目标值 target，请你在该数组中找出和为目标值的那两个整数，并返回它们的数组下标。

示例：

Text Only

输入：nums = [2, 7, 11, 15], target = 9
输出：[0, 1]
解释：因为 nums[0] + nums[1] = 2 + 7 = 9

解法1：暴力法（O(n²)）

Python

def two_sum_brute(nums, target):
    """
    暴力解法
    时间：O(n²)
    空间：O(1)
    """
    n = len(nums)
    for i in range(n):
        for j in range(i + 1, n):
            if nums[i] + nums[j] == target:
                return [i, j]
    return []

解法2：哈希表（O(n)）⭐

Python

def two_sum_hash(nums, target):
    """
    哈希表解法
    时间：O(n)
    空间：O(n)
    """
    hash_map = {}  # 存储值到索引的映射

    for i, num in enumerate(nums):
        complement = target - num

        print(f"i={i}, num={num}, complement={complement}")
        print(f"  哈希表: {hash_map}")

        if complement in hash_map:
            print(f"  ✓ 找到！{num} + {complement} = {target}")
            print(f"  返回: [{hash_map[complement]}, {i}]")
            return [hash_map[complement], i]

        hash_map[num] = i
        print(f"  存储 {num}: {i} 到哈希表")

    return []

# 测试
nums = [2, 7, 11, 15]
target = 9
print("=" * 60)
print(f"两数之和: nums = {nums}, target = {target}")
print("=" * 60)
result = two_sum_hash(nums, target)
print(f"\n最终结果: {result}")

# 输出：
# ============================================================
# 两数之和: nums = [2, 7, 11, 15], target = 9
# ============================================================
# i=0, num=2, complement=7
#   哈希表: {}
#   存储 2: 0 到哈希表
# i=1, num=7, complement=2
#   哈希表: {2: 0}
#   ✓ 找到！7 + 2 = 9
#   返回: [0, 1]
#
# 最终结果: [0, 1]

两数之和图解：

Text Only

nums = [2, 7, 11, 15], target = 9

第1步: i=0, num=2
  complement = 9 - 2 = 7
  哈希表: {}
  7不在哈希表中
  存储 {2: 0}
  哈希表: {2: 0}

第2步: i=1, num=7
  complement = 9 - 7 = 2
  哈希表: {2: 0}
  2在哈希表中！
  找到：[hash_map[2], 1] = [0, 1] ✓

返回: [0, 1]

题目2：字母异位词分组 ⭐⭐⭐⭐¶

问题：给定一个字符串数组 strs，将字母异位词组合在一起。字母异位词是由重新排列源单词的字母得到的单词。

示例：

Text Only

输入：strs = ["eat", "tea", "tan", "ate", "nat", "bat"]
输出：[["bat"], ["nat", "tan"], ["ate", "eat", "tea"]]

解法：哈希表

Python

from collections import defaultdict

def group_anagrams(strs):
    """
    字母异位词分组
    时间：O(n * k log k)，n是单词数量，k是单词长度
    空间：O(n * k)
    """
    # 哈希表：key是排序后的单词，value是异位词列表
    anagram_map = defaultdict(list)

    for word in strs:
        # 将单词排序作为key
        sorted_word = ''.join(sorted(word))
        anagram_map[sorted_word].append(word)

        print(f"word: '{word}', sorted: '{sorted_word}'")

    # 输出分组结果
    result = list(anagram_map.values())

    print("\n分组结果:")
    for i, group in enumerate(result):
        print(f"  组{i+1}: {group}")

    return result

# 测试
strs = ["eat", "tea", "tan", "ate", "nat", "bat"]
print("=" * 60)
print("字母异位词分组")
print("=" * 60)
group_anagrams(strs)

# 输出：
# ============================================================
# 字母异位词分组
# ============================================================
# word: 'eat', sorted: 'aet'
# word: 'tea', sorted: 'aet'
# word: 'tan', sorted: 'ant'
# word: 'ate', sorted: 'aet'
# word: 'nat', sorted: 'ant'
# word: 'bat', sorted: 'abt'
#
# 分组结果:
#   组1: ['eat', 'tea', 'ate']
#   组2: ['tan', 'nat']
#   组3: ['bat']

字母异位词分组图解：

Text Only

strs = ["eat", "tea", "tan", "ate", "nat", "bat"]

处理每个单词：

"eat" → 排序 "aet"
  哈希表: {"aet": ["eat"]}

"tea" → 排序 "aet"
  哈希表: {"aet": ["eat", "tea"]}

"tan" → 排序 "ant"
  哈希表: {"aet": ["eat", "tea"], "ant": ["tan"]}

"ate" → 排序 "aet"
  哈希表: {"aet": ["eat", "tea", "ate"], "ant": ["tan"]}

"nat" → 排序 "ant"
  哈希表: {"aet": ["eat", "tea", "ate"], "ant": ["tan", "nat"]}

"bat" → 排序 "abt"
  哈希表: {"aet": ["eat", "tea", "ate"], "ant": ["tan", "nat"], "abt": ["bat"]}

最终分组:
[["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]

题目3：最长连续序列 ⭐⭐⭐⭐¶

问题：给定一个未排序的整数数组 nums，找出数字连续序列的最长长度。

示例：

Text Only

输入：nums = [100, 4, 200, 1, 3, 2]
输出：4
解释：最长数字连续序列是 [1, 2, 3, 4]

解法：哈希表

Python

def longest_consecutive(nums):
    """
    最长连续序列
    时间：O(n)
    空间：O(n)
    """
    if not nums:
        return 0

    # 将所有数字存入哈希表
    num_set = set(nums)
    max_length = 0

    for num in num_set:
        # 只从序列的起始数字开始计算
        if num - 1 not in num_set:
            current_num = num
            current_length = 1

            print(f"\n起始数字: {current_num}")
            print(f"  序列: [{current_num}", end="")

            # 向后查找连续数字
            while current_num + 1 in num_set:
                current_num += 1
                current_length += 1
                print(f", {current_num}", end="")

            print(f"]")
            print(f"  长度: {current_length}")

            max_length = max(max_length, current_length)

    return max_length

# 测试
nums = [100, 4, 200, 1, 3, 2]
print("=" * 60)
print(f"最长连续序列: nums = {nums}")
print("=" * 60)
result = longest_consecutive(nums)
print(f"\n最长长度: {result}")

# 输出：
# ============================================================
# 最长连续序列: nums = [100, 4, 200, 1, 3, 2]
# ============================================================
#
# 起始数字: 100
#   序列: [100]
#   长度: 1
#
# 起始数字: 4
#   序列: [4]
#   长度: 1
#
# 起始数字: 200
#   序列: [200]
#   长度: 1
#
# 起始数字: 1
#   序列: [1, 2, 3, 4]
#   长度: 4
#
# 最长长度: 4

最长连续序列图解：

Text Only

nums = [100, 4, 200, 1, 3, 2]

哈希表: {100, 4, 200, 1, 3, 2}

遍历哈希表:

100: 99不在哈希表中 → 100是起始数字
     序列: [100]
     长度: 1

4:   3不在哈希表中 → 4是起始数字
     序列: [4]
     长度: 1

200: 199不在哈希表中 → 200是起始数字
     序列: [200]
     长度: 1

1:   0不在哈希表中 → 1是起始数字
     序列: [1, 2, 3, 4]
     长度: 4 ✓ 最长

3:   2在哈希表中 → 3不是起始数字，跳过

2:   1在哈希表中 → 2不是起始数字，跳过

最长序列: [1, 2, 3, 4]
最长长度: 4

实战应用¶

应用1：LRU缓存 ⭐⭐⭐⭐⭐¶

问题：设计LRU（最近最少使用）缓存。

解法：哈希表 + 双向链表

Python

from collections import OrderedDict

class LRUCache:
    """
    LRU缓存
    时间：O(1) 所有操作
    空间：O(capacity)
    """

    def __init__(self, capacity: int):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key: int) -> int:
        """获取key的值"""
        if key not in self.cache:
            print(f"GET {key}: 未找到")
            return -1

        # 访问key，移动到末尾（标记为最近使用）
        self.cache.move_to_end(key)
        value = self.cache[key]
        print(f"GET {key}: {value} (移动到末尾)")
        print(f"  缓存: {list(self.cache.items())}")
        return value

    def put(self, key: int, value: int) -> None:
        """插入或更新key"""
        if key in self.cache:
            # key已存在，更新并移动到末尾
            self.cache.move_to_end(key)
            print(f"PUT {key}: {value} (更新)")
        else:
            # key不存在，插入
            print(f"PUT {key}: {value} (插入)")
            self.cache[key] = value

        # 检查容量
        if len(self.cache) > self.capacity:
            # 删除最久未使用的项（第一个）
            removed_key, removed_value = self.cache.popitem(last=False)
            print(f"  容量超限！删除最久未使用: {removed_key}")

        print(f"  缓存: {list(self.cache.items())}")

# 测试
cache = LRUCache(capacity=2)

print("=" * 60)
print("LRU缓存测试")
print("=" * 60)

cache.put(1, 1)
cache.put(2, 2)
cache.get(1)       # 返回 1
cache.put(3, 3)     # 该操作会使得key=2作废
cache.get(2)       # 返回 -1（未找到）

# 输出：
# ============================================================
# LRU缓存测试
# ============================================================
# PUT 1: 1 (插入)
#   缓存: [(1, 1)]
# PUT 2: 2 (插入)
#   缓存: [(1, 1), (2, 2)]
# GET 1: 1 (移动到末尾)
#   缓存: [(2, 2), (1, 1)]
# PUT 3: 3 (插入)
#   容量超限！删除最久未使用: 2
#   缓存: [(1, 1), (3, 3)]
# GET 2: 未找到

LRU缓存图解：

Text Only

容量: 2

操作1: put(1, 1)
缓存: [1]
    ↑ 最近使用

操作2: put(2, 2)
缓存: [1, 2]
        ↑ 最近使用

操作3: get(1) → 返回1
缓存: [2, 1] ← 1移动到末尾
    ↑      ↑ 最近使用
   最久未使用

操作4: put(3, 3)
缓存: [1, 3] ← 删除2（最久未使用）
        ↑ 最近使用

操作5: get(2) → -1（2已被删除）

应用2：设计推特 ⭐⭐⭐⭐¶

问题：设计一个简化版的推特，支持关注用户、发布推文、获取新闻推送。

解法：哈希表 + 链表

Python

from collections import defaultdict, deque  # defaultdict带默认值的字典，避免KeyError

class Twitter:
    """
    简化版推特
    时间：
    - postTweet: O(1)
    - follow: O(1)
    - unfollow: O(1)
    - getNewsFeed: O(n * m)，n是关注的人，m是推文数
    """

    def __init__(self):
        # 用户ID到推文列表的映射
        self.tweets = defaultdict(deque)
        # 用户ID到关注列表的映射
        self.following = defaultdict(set)
        # 时间戳
        self.timestamp = 0

    def postTweet(self, userId: int, tweetId: int) -> None:
        """发布推文"""
        self.tweets[userId].appendleft((self.timestamp, tweetId))
        self.timestamp += 1
        # 只保留最近的10条推文
        if len(self.tweets[userId]) > 10:
            self.tweets[userId].pop()

        print(f"用户{userId}发布推文{tweetId}")

    def follow(self, followerId: int, followeeId: int) -> None:
        """关注"""
        if followerId != followeeId:
            self.following[followerId].add(followeeId)
            print(f"用户{followerId}关注用户{followeeId}")

    def unfollow(self, followerId: int, followeeId: int) -> None:
        """取消关注"""
        if followeeId in self.following[followerId]:
            self.following[followerId].remove(followeeId)
            print(f"用户{followerId}取消关注用户{followeeId}")

    def getNewsFeed(self, userId: int):
        """获取新闻推送"""
        # 收集自己和关注的人的推文
        tweet_list = []
        tweet_list.extend(self.tweets[userId])

        for followee in self.following[userId]:
            tweet_list.extend(self.tweets[followee])

        # 按时间戳排序
        tweet_list.sort(reverse=True)

        # 返回最近的10条推文ID
        result = [tweetId for _, tweetId in tweet_list[:10]]  # 切片操作：[start:end:step]提取子序列

        print(f"\n用户{userId}的新闻推送:")
        for i, tweetId in enumerate(result):  # enumerate同时获取索引和值
            print(f"  {i+1}. 推文{tweetId}")

        return result

# 测试
twitter = Twitter()

print("=" * 60)
print("推特系统测试")
print("=" * 60)

twitter.postTweet(1, 5)
twitter.postTweet(1, 3)
twitter.postTweet(1, 101)
twitter.postTweet(2, 13)
twitter.postTweet(2, 10)

twitter.follow(1, 2)
twitter.getNewsFeed(1)

# 输出：
# ============================================================
# 推特系统测试
# ============================================================
# 用户1发布推文5
# 用户1发布推文3
# 用户1发布推文101
# 用户2发布推文13
# 用户2发布推文10
# 用户1关注用户2
#
# 用户1的新闻推送:
#   1. 推文101
#   2. 推文10
#   3. 推文13
#   4. 推文5
#   5. 推文3

📝 总结¶

核心要点¶

✅ 哈希表特点： - 平均O(1)查找、插入、删除 - 通过哈希函数映射key到索引 - 需要处理冲突

✅ 哈希函数： - 除法哈希（常用） - 乘法哈希 - 字符串哈希 - Python内置hash()

✅ 冲突解决： - 链地址法（最常用） - 开放寻址法（线性探测、二次探测、双重哈希） - 再哈希（扩容）

✅ 应用场景： - 字典/映射 - 去重 - 计数 - 缓存 - 索引

下一步¶

继续学习： - 树基础 - 二叉树、BST、AVL树 - 图算法 - Dijkstra、Floyd、MST

继续查看树的超级详解...