02 - 随机过程与马尔可夫链¶

学习时间: 3小时 重要性: ⭐⭐⭐⭐⭐ 理解扩散模型动态过程的关键

🎯 学习目标¶

完成本章后，你将能够： - 理解马尔可夫性质及其在扩散模型中的应用 - 掌握马尔可夫链的前向和后向过程 - 理解稳态分布的概念 - 用代码模拟马尔可夫链

1. 随机过程基础¶

1.1 什么是随机过程¶

定义：随机过程是一组随机变量的集合 $\{X_t, t \in T\}$，其中 $t$ 通常是时间或步骤。

直观理解： - 想象你在记录每天的气温，每天的温度是一个随机变量 - 一系列的温度记录就构成了一个随机过程 - 在扩散模型中，$x_0, x_1, x_2, ..., x_T$ 就是一个随机过程

1.2 扩散模型中的随机过程¶

在扩散模型中： - $x_0$：原始图像（数据分布） - $x_1, x_2, ..., x_T$：逐渐添加噪声的图像 - $x_T$：近似纯噪声

这是一个离散时间随机过程，每一步都依赖于前一步。

2. 马尔可夫链¶

2.1 马尔可夫性质¶

核心思想：未来只依赖于现在，与过去无关。

数学定义： $$P(X_{t+1} | X_t, X_{t-1}, ..., X_0) = P(X_{t+1} | X_t)$$

通俗解释： - 你现在的位置只取决于你上一步的位置 - 不需要知道你之前走过的所有路径 - 扩散模型中，$x_t$ 只依赖于 $x_{t-1}$

2.2 状态转移矩阵¶

对于离散状态的马尔可夫链，用转移矩阵 $P$ 描述状态转移概率：

\[P_{ij} = P(X_{t+1} = j | X_t = i)\]

示例：天气模型

Text Only

转移矩阵：
        S     R
    S [ 0.9   0.1 ]
    R [ 0.5   0.5 ]

含义：
- 今天晴天，明天90%概率还是晴天
- 今天雨天，明天50%概率转晴天

2.3 代码实践：简单马尔可夫链¶

Python

import numpy as np
import matplotlib.pyplot as plt

# 定义天气马尔可夫链
states = ['Sunny', 'Rainy']
P = np.array([  # np.array创建NumPy数组
    [0.9, 0.1],  # 从Sunny出发
    [0.5, 0.5]   # 从Rainy出发
])

def simulate_markov_chain(P, initial_state, n_steps):
    """
    模拟马尔可夫链

    参数:
        P: 转移矩阵
        initial_state: 初始状态索引
        n_steps: 模拟步数
    """
    n_states = P.shape[0]
    states_history = [initial_state]

    current_state = initial_state
    for _ in range(n_steps):
        # 根据当前状态的概率分布采样下一个状态
        next_state = np.random.choice(n_states, p=P[current_state])
        states_history.append(next_state)
        current_state = next_state

    return states_history

# 模拟
np.random.seed(42)
n_steps = 100
initial = 0  # 从晴天开始

history = simulate_markov_chain(P, initial, n_steps)

# 可视化
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(history, 'b-', linewidth=1)
plt.yticks([0, 1], states)
plt.xlabel('Time Step')
plt.ylabel('State')
plt.title('Markov Chain Simulation')
plt.grid(True, alpha=0.3)

# 统计各状态出现频率
plt.subplot(1, 2, 2)
unique, counts = np.unique(history, return_counts=True)
plt.bar([states[i] for i in unique], counts / len(history))
plt.ylabel('Frequency')
plt.title('State Distribution')
plt.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('markov_chain_weather.png', dpi=150)
plt.show()

print(f"初始状态: {states[initial]}")
print(f"晴天频率: {counts[0]/len(history):.2%}")
print(f"雨天频率: {counts[1]/len(history):.2%}")

3. 稳态分布（Stationary Distribution）¶

3.1 直观理解¶

问题：经过很多步后，系统处于各个状态的概率分布会趋于稳定，这个稳定的分布就是稳态分布。

扩散模型中的应用： - 前向过程最终收敛到纯噪声分布（通常是标准高斯） - 这就是扩散模型的稳态分布

3.2 数学定义¶

稳态分布 $\pi$ 满足： $$\pi = \pi P$$

即分布不再随时间变化。

3.3 求解稳态分布¶

Python

# 方法1：模拟法（大数定律）
def estimate_stationary_distribution(P, n_steps=10000):
    """通过长时间模拟估计稳态分布"""
    n_states = P.shape[0]
    state_counts = np.zeros(n_states)

    current_state = 0
    for _ in range(n_steps):
        state_counts[current_state] += 1
        current_state = np.random.choice(n_states, p=P[current_state])

    return state_counts / n_steps

# 方法2：特征值法（精确解）
def compute_stationary_distribution(P):
    """通过求解特征向量计算稳态分布"""
    # 求解 πP = π，即 P^T π^T = π^T
    eigenvalues, eigenvectors = np.linalg.eig(P.T)  # np.linalg线性代数运算

    # 找到特征值为1的特征向量
    idx = np.argmin(np.abs(eigenvalues - 1))
    stationary = eigenvectors[:, idx].real

    # 归一化
    stationary = stationary / np.sum(stationary)
    return stationary

# 计算
stationary_sim = estimate_stationary_distribution(P)
stationary_exact = compute_stationary_distribution(P)

print("稳态分布:")
print(f"  模拟估计: {stationary_sim}")
print(f"  精确计算: {stationary_exact}")
print(f"  验证: πP = {stationary_exact @ P}")

4. 连续状态马尔可夫链¶

4.1 扩散模型中的连续状态¶

扩散模型处理的是连续状态空间（图像像素值是连续的）。

转移核（Transition Kernel）： $$p(x_t | x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}x_{t-1}, \beta_t I)$$

这表示：给定 $x_{t-1}$，$x_t$ 服从均值为 $\sqrt{1-\beta_t}x_{t-1}$、方差为 $\beta_t$ 的高斯分布。

4.2 前向过程的可视化¶

Python

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def forward_diffusion_1d(x_0, n_steps, beta_schedule='linear'):
    """
    一维扩散过程模拟

    参数:
        x_0: 初始值
        n_steps: 扩散步数
        beta_schedule: beta调度策略
    """
    if beta_schedule == 'linear':
        betas = np.linspace(0.01, 0.99, n_steps)
    else:
        betas = np.ones(n_steps) * 0.1

    # 计算alpha
    alphas = 1 - betas
    alphas_cumprod = np.cumprod(alphas)

    # 存储每一步的状态
    trajectory = [x_0]
    x = x_0

    for t in range(n_steps):
        # x_t = sqrt(1-beta_t) * x_{t-1} + sqrt(beta_t) * epsilon
        epsilon = np.random.normal(0, 1)
        x = np.sqrt(1 - betas[t]) * x + np.sqrt(betas[t]) * epsilon
        trajectory.append(x)

    return np.array(trajectory), alphas_cumprod

# 模拟多个轨迹
np.random.seed(42)
n_steps = 100
n_trajectories = 10

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
for i in range(n_trajectories):
    x_0 = np.random.normal(0, 1)
    traj, _ = forward_diffusion_1d(x_0, n_steps)
    plt.plot(traj, alpha=0.6, linewidth=1)

plt.xlabel('Time Step')
plt.ylabel('x value')
plt.title('Forward Diffusion Process (10 trajectories)')
plt.grid(True, alpha=0.3)
plt.axhline(y=0, color='r', linestyle='--', alpha=0.5, label='Mean')

# 显示分布随时间的变化
plt.subplot(1, 2, 2)
n_samples = 1000
samples_t = {0: [], 25: [], 50: [], 75: [], 99: []}

for _ in range(n_samples):
    x_0 = np.random.normal(0, 1)
    traj, alphas_cumprod = forward_diffusion_1d(x_0, n_steps)
    for t in samples_t.keys():
        samples_t[t].append(traj[t])

# 绘制不同时间步的分布
for i, (t, samples) in enumerate(samples_t.items()):  # enumerate同时获取索引和元素
    plt.hist(samples, bins=30, alpha=0.5,
             label=f't={t}, α={alphas_cumprod[t]:.3f}')

plt.xlabel('x value')
plt.ylabel('Count')
plt.title('Distribution at Different Time Steps')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('forward_diffusion_1d.png', dpi=150)
plt.show()

print("观察：")
print("1. 随着时间步增加，所有轨迹趋向于0附近")
print("2. 分布逐渐变成标准高斯分布")
print("3. 这就是马尔可夫链收敛到稳态分布的过程")

5. 马尔可夫链与扩散模型¶

5.1 扩散模型 = 马尔可夫链¶

扩散模型的前向过程就是一个马尔可夫链：

\[q(x_{0:T}) = q(x_0) \prod_{t=1}^T q(x_t | x_{t-1})\]

关键性质： 1. 马尔可夫性：$x_t$ 只依赖于 $x_{t-1}$ 2. 高斯转移：每一步都添加高斯噪声 3. 收敛性：最终收敛到标准高斯分布

5.2 边缘分布的闭合形式¶

由于马尔可夫链和高斯分布的良好性质，我们可以直接计算任意时刻的边缘分布：

\[q(x_t | x_0) = \mathcal{N}(x_t; \sqrt{\bar{\alpha}_t}x_0, (1-\bar{\alpha}_t)I)\]

其中 $\bar{\alpha}_t = \prod_{i=1}^t \alpha_i$ 是累积乘积。

重要意义： - 不需要逐步模拟，可以直接采样任意时刻 $x_t$ - 这是扩散模型训练的关键

Python

def q_sample_direct(x_0, t, alphas_cumprod):
    """
    直接从 q(x_t | x_0) 采样
    利用马尔可夫链的性质

    参数:
        x_0: 初始值
        t: 目标时间步
        alphas_cumprod: 累积alpha值
    """
    alpha_bar_t = alphas_cumprod[t]

    # 均值: sqrt(alpha_bar_t) * x_0
    mean = np.sqrt(alpha_bar_t) * x_0

    # 标准差: sqrt(1 - alpha_bar_t)
    std = np.sqrt(1 - alpha_bar_t)

    # 采样
    epsilon = np.random.normal(0, 1, x_0.shape)
    x_t = mean + std * epsilon

    return x_t, epsilon

# 验证：直接采样 vs 逐步模拟
np.random.seed(42)
n_steps = 100
betas = np.linspace(0.01, 0.99, n_steps)
alphas = 1 - betas
alphas_cumprod = np.cumprod(alphas)

x_0 = np.array([2.0])
t = 50

# 方法1：逐步模拟
x_step = x_0.copy()
for i in range(t):
    epsilon = np.random.normal(0, 1, x_0.shape)
    x_step = np.sqrt(alphas[i]) * x_step + np.sqrt(betas[i]) * epsilon

# 方法2：直接采样
np.random.seed(42)  # 重置种子，使用相同的随机数
x_direct, _ = q_sample_direct(x_0, t, alphas_cumprod)

print(f"初始值 x_0 = {x_0[0]}")
print(f"逐步模拟 x_{t} = {x_step[0]:.4f}")
print(f"直接采样 x_{t} = {x_direct[0]:.4f}")
print(f"两者相等！这证明了马尔可夫链的性质")

6. 本章总结¶

核心概念¶

随机过程
一组随时间变化的随机变量
扩散模型：$x_0, x_1, ..., x_T$
马尔可夫性质
未来只依赖于现在
$P(X_{t+1} | X_t, ..., X_0) = P(X_{t+1} | X_t)$
稳态分布
长时间后系统趋于稳定的分布
扩散模型最终收敛到标准高斯
闭合形式采样
利用马尔可夫性质，可以直接计算任意时刻的分布
$q(x_t | x_0) = \mathcal{N}(\sqrt{\bar{\alpha}_t}x_0, (1-\bar{\alpha}_t)I)$

扩散模型中的马尔可夫链¶

Text Only

前向过程（马尔可夫链）：
x_0 → x_1 → x_2 → ... → x_T
 ↓     ↓     ↓          ↓
data  noisy noisy       pure noise

关键公式：
- 转移概率: q(x_t | x_{t-1}) = N(sqrt(1-beta_t)*x_{t-1}, beta_t*I)
- 边缘分布: q(x_t | x_0) = N(sqrt(alpha_bar_t)*x_0, (1-alpha_bar_t)*I)
- 稳态分布: q(x_T) ≈ N(0, I)

📝 自测问题¶

基础问题¶

马尔可夫性质
用自己的话解释马尔可夫性质
为什么扩散模型满足马尔可夫性质？
稳态分布
什么是稳态分布？
扩散模型的稳态分布是什么？
闭合形式
为什么能够直接计算 $q(x_t | x_0)$ 而不需要逐步模拟？
这个性质在训练扩散模型时有什么优势？

编程练习¶

实现一个三状态的马尔可夫链，计算其稳态分布
可视化扩散过程中方差的变化
比较逐步模拟和直接采样的计算效率

思考题¶

如果转移概率不是高斯分布，扩散模型还能工作吗？
马尔可夫性质对反向过程有什么影响？
稳态分布的选择对扩散模型有什么影响？

🔗 下一步¶

理解了马尔可夫链后，我们将学习变分推断，这是推导扩散模型训练目标的关键数学工具。

→ 下一步：03-变分推断基础.md