Transformer Test Cases¶
⚠️ Timeliness note: this chapter touches on frontier models, pricing, leaderboards, and similar information that can change quickly between versions; defer to the original papers, official release pages, and API documentation.
Test goal: verify the core functionality and performance of the Transformer model · Test types: unit tests, integration tests, performance tests · Components covered: attention mechanism, encoder, decoder, positional encoding
📋 Test Overview¶
Test Goals¶
- Functional testing: verify the correctness of each Transformer component
- Performance testing: measure the model's compute efficiency and memory usage
- Robustness testing: check the model's stability across different inputs
- Compatibility testing: ensure the code runs across different environments
Test Environment¶
- Python version: 3.8+
- PyTorch version: 2.0+
- Test framework: pytest (a minimal shared-fixture sketch follows this list)
- Coverage tool: pytest-cov
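Random tensors appear throughout the cases below, so it helps to pin the RNG once for the whole suite. Below is a minimal sketch of a shared conftest.py; the autouse fixture and the seed value are illustrative assumptions, not part of the original suite.

# conftest.py — minimal shared-fixture sketch (fixture name and seed are illustrative)
import pytest
import torch

@pytest.fixture(autouse=True)
def fixed_seed():
    """Seed torch before every test so random inputs are reproducible."""
    torch.manual_seed(42)
    yield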
🧪 Test Case List¶
1. Attention Mechanism Tests¶
Test Case 1.1: Multi-Head Attention Output Shape¶
Test goal: verify that multi-head attention produces the correct output shape
Test code:
import torch
import pytest
from torch.nn import MultiheadAttention

def test_multihead_attention_output_shape():
    """Check the output shape of multi-head attention."""
    # Parameters
    batch_size = 4
    seq_length = 10
    d_model = 512
    num_heads = 8
    # Build the multi-head attention layer
    attention = MultiheadAttention(
        embed_dim=d_model,
        num_heads=num_heads,
        batch_first=True
    )
    # Build the inputs
    query = torch.randn(batch_size, seq_length, d_model)
    key = torch.randn(batch_size, seq_length, d_model)
    value = torch.randn(batch_size, seq_length, d_model)
    # Forward pass
    output, _ = attention(query, key, value)
    # Check the output shape
    assert output.shape == (batch_size, seq_length, d_model)  # assert raises AssertionError when the condition is False
    print(f"✓ Output shape is correct: {output.shape}")
Expected result: output shape is (batch_size, seq_length, d_model)
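Since the environment above assumes PyTorch 2.0+, the same shape property can also be checked against the functional attention kernel. A hedged companion sketch (the test name and sizes are our own):

import torch
import torch.nn.functional as F

def test_scaled_dot_product_attention_shape():
    """Shape check for the functional kernel behind MultiheadAttention (PyTorch >= 2.0)."""
    batch_size, num_heads, seq_length, head_dim = 4, 8, 10, 64
    q = torch.randn(batch_size, num_heads, seq_length, head_dim)
    k = torch.randn(batch_size, num_heads, seq_length, head_dim)
    v = torch.randn(batch_size, num_heads, seq_length, head_dim)
    # Computes softmax(QK^T / sqrt(head_dim)) V per head
    out = F.scaled_dot_product_attention(q, k, v)
    assert out.shape == (batch_size, num_heads, seq_length, head_dim)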
Test Case 1.2: Self-Attention Weight Normalization¶
Test goal: verify that self-attention weights sum to 1
Test code:
def test_self_attention_weights_normalization():
    """Check that self-attention weights are normalized."""
    batch_size = 2
    seq_length = 5
    d_model = 128
    num_heads = 4
    attention = MultiheadAttention(
        embed_dim=d_model,
        num_heads=num_heads,
        batch_first=True
    )
    query = torch.randn(batch_size, seq_length, d_model)
    key = torch.randn(batch_size, seq_length, d_model)
    value = torch.randn(batch_size, seq_length, d_model)
    # Get the attention weights
    output, attention_weights = attention(
        query, key, value,
        need_weights=True,
        average_attn_weights=False
    )
    # Check that the weights are normalized
    assert torch.allclose(
        attention_weights.sum(dim=-1),
        torch.ones_like(attention_weights.sum(dim=-1)),
        atol=1e-6
    )
    print("✓ Attention weights are correctly normalized")
Expected result: attention weights sum to 1 along the last dimension
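This property is not an implementation detail but follows directly from the softmax in the attention definition: each row of weights is a probability distribution over key positions.

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V,\qquad \sum_{j}\alpha_{ij}=\sum_{j}\frac{e^{s_{ij}}}{\sum_{j'}e^{s_{ij'}}}=1$$

where $s_{ij}$ is the scaled dot-product score between query $i$ and key $j$.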
Test Case 1.3: Causal Mask Effectiveness¶
Test goal: verify that the causal mask prevents information leaking from future positions
Test code:
def test_causal_mask_effectiveness():
    """Check that the causal mask is effective."""
    batch_size = 2
    seq_length = 10
    d_model = 256
    num_heads = 4
    attention = MultiheadAttention(
        embed_dim=d_model,
        num_heads=num_heads,
        batch_first=True
    )
    query = torch.randn(batch_size, seq_length, d_model)
    key = torch.randn(batch_size, seq_length, d_model)
    value = torch.randn(batch_size, seq_length, d_model)
    # Build the causal mask: -inf above the diagonal, 0 elsewhere
    causal_mask = torch.triu(
        torch.ones(seq_length, seq_length) * float('-inf'),
        diagonal=1
    )
    # Apply the mask
    output, attention_weights = attention(
        query, key, value,
        attn_mask=causal_mask,
        need_weights=True,
        average_attn_weights=False
    )
    # Check the mask's effect: position i may only attend to positions j <= i.
    # attention_weights[0, :, i, j] is a per-head vector, so reduce it with torch.all.
    for i in range(seq_length):
        for j in range(i + 1, seq_length):
            assert torch.all(attention_weights[0, :, i, j] == 0)
    print("✓ Causal mask is effective")
Expected result: attention weights for future positions are 0
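PyTorch also ships a helper that builds this same mask; a small hedged sketch comparing it against the manual construction (on recent versions both should be a float mask with -inf above the diagonal and 0 elsewhere):

import torch

def test_builtin_causal_mask_matches_manual():
    """The built-in helper should match the manual -inf/0 triu mask."""
    seq_length = 10
    manual = torch.triu(
        torch.ones(seq_length, seq_length) * float('-inf'),
        diagonal=1
    )
    builtin = torch.nn.Transformer.generate_square_subsequent_mask(seq_length)
    assert torch.equal(manual, builtin)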
2. Positional Encoding Tests¶
Test Case 2.1: Sinusoidal Positional Encoding Shape¶
Test goal: verify that the positional encoding has the correct shape
Test code:
import math

class PositionalEncoding(torch.nn.Module):
    """Sinusoidal positional encoding."""
    def __init__(self, d_model, max_len=5000):
        super().__init__()  # super() calls the parent class's method
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)  # unsqueeze adds a dimension
        div_term = torch.exp(
            torch.arange(0, d_model, 2).float() *
            (-math.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0)
        self.register_buffer('pe', pe)

def test_positional_encoding_shape():
    """Check the shape of the positional encoding."""
    batch_size = 4
    seq_length = 10
    d_model = 512
    pos_encoder = PositionalEncoding(d_model)
    # Build the input
    x = torch.randn(batch_size, seq_length, d_model)
    # Add the positional encoding
    x = x + pos_encoder.pe[:, :seq_length, :]
    # Check the shape
    assert x.shape == (batch_size, seq_length, d_model)
    print(f"✓ Positional encoding shape is correct: {x.shape}")
Expected result: output shape is (batch_size, seq_length, d_model)
Test Case 2.2: Positional Encoding Uniqueness¶
Test goal: verify that different positions receive different encodings
Test code:
def test_positional_encoding_uniqueness():
    """Check that positional encodings are unique per position."""
    d_model = 512
    max_len = 100
    pos_encoder = PositionalEncoding(d_model, max_len)
    # Compare encodings at different positions
    pe = pos_encoder.pe.squeeze(0)  # (max_len, d_model)
    # Check that all position encodings differ
    for i in range(max_len):
        for j in range(i + 1, max_len):
            assert not torch.allclose(pe[i], pe[j])
    print("✓ Encodings differ across positions")
Expected result: every position's encoding is distinct
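The shape and uniqueness checks above use fixed sizes; pytest's parametrization runs the same property across several model widths. A sketch, assuming the PositionalEncoding class from case 2.1 is in scope:

import pytest

@pytest.mark.parametrize("d_model", [64, 128, 256, 512])
def test_positional_encoding_shape_parametrized(d_model):
    """The PE buffer should track d_model across a range of model widths."""
    pos_encoder = PositionalEncoding(d_model, max_len=100)
    assert pos_encoder.pe.shape == (1, 100, d_model)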
3. Encoder Layer Tests¶
Test Case 3.1: Encoder Layer Forward Pass¶
Test goal: verify that the encoder layer's forward pass is correct
Test code:
from torch.nn import TransformerEncoderLayer

def test_encoder_layer_forward():
    """Check the encoder layer's forward pass."""
    batch_size = 4
    seq_length = 10
    d_model = 512
    nhead = 8
    dim_feedforward = 2048
    encoder_layer = TransformerEncoderLayer(
        d_model=d_model,
        nhead=nhead,
        dim_feedforward=dim_feedforward,
        batch_first=True
    )
    # Build the input
    src = torch.randn(batch_size, seq_length, d_model)
    # Forward pass
    output = encoder_layer(src)
    # Check the output shape
    assert output.shape == (batch_size, seq_length, d_model)
    print(f"✓ Encoder layer output shape is correct: {output.shape}")
Expected result: output shape is (batch_size, seq_length, d_model)
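A single layer is rarely used alone; nn.TransformerEncoder stacks N copies of the layer. A hedged sketch extending the same shape check to the full stack:

import torch
from torch.nn import TransformerEncoder, TransformerEncoderLayer

def test_encoder_stack_forward():
    """A stack of identical layers should preserve the (batch, seq, d_model) shape."""
    batch_size, seq_length, d_model = 4, 10, 512
    encoder_layer = TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
    encoder = TransformerEncoder(encoder_layer, num_layers=6)
    src = torch.randn(batch_size, seq_length, d_model)
    output = encoder(src)
    assert output.shape == (batch_size, seq_length, d_model)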
Test Case 3.2: Encoder Layer Residual Connection¶
Test goal: verify that the encoder layer's residual connections are effective
Test code:
def test_encoder_layer_residual_connection():
    """Check the encoder layer's residual connections."""
    batch_size = 2
    seq_length = 5
    d_model = 128
    nhead = 4
    encoder_layer = TransformerEncoderLayer(
        d_model=d_model,
        nhead=nhead,
        batch_first=True
    )
    src = torch.randn(batch_size, seq_length, d_model)
    # Forward pass
    output = encoder_layer(src)
    # Check that the output is not all zeros
    assert not torch.allclose(output, torch.zeros_like(output))
    # Check that gradients can propagate
    loss = output.sum()
    loss.backward()
    # Check that the gradient exists
    assert encoder_layer.self_attn.in_proj_weight.grad is not None
    print("✓ Encoder layer residual connections are effective")
Expected result: output is nonzero and gradients propagate
4. Decoder Layer Tests¶
Test Case 4.1: Decoder Layer Forward Pass¶
Test goal: verify that the decoder layer's forward pass is correct
Test code:
from torch.nn import TransformerDecoderLayer

def test_decoder_layer_forward():
    """Check the decoder layer's forward pass."""
    batch_size = 4
    seq_length = 10
    d_model = 512
    nhead = 8
    dim_feedforward = 2048
    decoder_layer = TransformerDecoderLayer(
        d_model=d_model,
        nhead=nhead,
        dim_feedforward=dim_feedforward,
        batch_first=True
    )
    # Build the inputs
    tgt = torch.randn(batch_size, seq_length, d_model)
    memory = torch.randn(batch_size, seq_length, d_model)
    # Forward pass
    output = decoder_layer(tgt, memory)
    # Check the output shape
    assert output.shape == (batch_size, seq_length, d_model)
    print(f"✓ Decoder layer output shape is correct: {output.shape}")
Expected result: output shape is (batch_size, seq_length, d_model)
Test Case 4.2: Decoder Layer Cross-Attention¶
Test goal: verify that the decoder layer's cross-attention is effective
Test code:
def test_decoder_layer_cross_attention():
    """Check the decoder layer's cross-attention."""
    batch_size = 2
    seq_length = 5
    d_model = 128
    nhead = 4
    decoder_layer = TransformerDecoderLayer(
        d_model=d_model,
        nhead=nhead,
        batch_first=True
    )
    decoder_layer.eval()  # disable dropout so output differences come only from memory
    tgt = torch.randn(batch_size, seq_length, d_model)
    memory = torch.randn(batch_size, seq_length, d_model)
    # Forward pass
    output = decoder_layer(tgt, memory)
    # Check that the output depends on memory:
    # a different memory should produce a different output
    memory2 = torch.randn(batch_size, seq_length, d_model)
    output2 = decoder_layer(tgt, memory2)
    assert not torch.allclose(output, output2)
    print("✓ Decoder layer cross-attention is effective")
Expected result: different memory inputs produce different outputs
5. Full Transformer Tests¶
Test Case 5.1: End-to-End Forward Pass¶
Test goal: verify the full Transformer's forward pass
Test code:
from torch.nn import Transformer  # PyTorch's built-in Transformer model

def test_transformer_forward():
    """Check the full Transformer's forward pass."""
    batch_size = 4
    src_seq_length = 10
    tgt_seq_length = 8
    d_model = 512
    nhead = 8
    num_encoder_layers = 6
    num_decoder_layers = 6
    transformer = Transformer(
        d_model=d_model,
        nhead=nhead,
        num_encoder_layers=num_encoder_layers,
        num_decoder_layers=num_decoder_layers,
        batch_first=True
    )
    # Build the inputs
    src = torch.randn(batch_size, src_seq_length, d_model)
    tgt = torch.randn(batch_size, tgt_seq_length, d_model)
    # Forward pass
    output = transformer(src, tgt)
    # Check the output shape
    assert output.shape == (batch_size, tgt_seq_length, d_model)
    print(f"✓ Transformer output shape is correct: {output.shape}")
Expected result: output shape is (batch_size, tgt_seq_length, d_model)
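Case 5.1 calls the model without masks; in real decoding the target side carries the causal mask from case 1.3. A sketch of the same shape check with a causal tgt_mask (default layer counts assumed):

import torch
from torch.nn import Transformer

def test_transformer_forward_with_causal_tgt_mask():
    """Shape check with the causal target mask a real decoder would use."""
    batch_size, src_len, tgt_len, d_model = 4, 10, 8, 512
    transformer = Transformer(d_model=d_model, nhead=8, batch_first=True)
    src = torch.randn(batch_size, src_len, d_model)
    tgt = torch.randn(batch_size, tgt_len, d_model)
    tgt_mask = Transformer.generate_square_subsequent_mask(tgt_len)
    output = transformer(src, tgt, tgt_mask=tgt_mask)
    assert output.shape == (batch_size, tgt_len, d_model)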
Test Case 5.2: Sequence-to-Sequence Task¶
Test goal: verify the Transformer on a Seq2Seq setup
Test code:
def test_transformer_seq2seq():
    """Check the Transformer on a Seq2Seq setup."""
    batch_size = 2
    src_vocab_size = 1000
    tgt_vocab_size = 1000
    d_model = 256
    nhead = 4
    num_encoder_layers = 3
    num_decoder_layers = 3
    max_seq_length = 20
    # Build the embedding layers
    src_embedding = torch.nn.Embedding(src_vocab_size, d_model)
    tgt_embedding = torch.nn.Embedding(tgt_vocab_size, d_model)
    # Build the Transformer
    transformer = Transformer(
        d_model=d_model,
        nhead=nhead,
        num_encoder_layers=num_encoder_layers,
        num_decoder_layers=num_decoder_layers,
        batch_first=True
    )
    # Build the output projection
    output_layer = torch.nn.Linear(d_model, tgt_vocab_size)
    # Build the inputs
    src = torch.randint(0, src_vocab_size, (batch_size, max_seq_length))
    tgt = torch.randint(0, tgt_vocab_size, (batch_size, max_seq_length))
    # Embed
    src_embedded = src_embedding(src)
    tgt_embedded = tgt_embedding(tgt)
    # Transformer
    output = transformer(src_embedded, tgt_embedded)
    # Project to vocabulary logits
    logits = output_layer(output)
    # Check the output shape
    assert logits.shape == (batch_size, max_seq_length, tgt_vocab_size)
    print(f"✓ Seq2Seq output shape is correct: {logits.shape}")
Expected result: output shape is (batch_size, max_seq_length, tgt_vocab_size)
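The Seq2Seq check above stops at logits; one training step closes the loop from logits to loss. A minimal sketch assuming teacher forcing and a single shared embedding (names and hyperparameters are illustrative, not part of the original suite):

import torch

def test_seq2seq_training_step():
    """One optimizer step on random token data should produce a finite loss."""
    batch_size, seq_len, vocab_size, d_model = 2, 20, 1000, 256
    embedding = torch.nn.Embedding(vocab_size, d_model)
    transformer = torch.nn.Transformer(
        d_model=d_model, nhead=4,
        num_encoder_layers=3, num_decoder_layers=3,
        batch_first=True,
    )
    output_layer = torch.nn.Linear(d_model, vocab_size)
    params = (list(embedding.parameters())
              + list(transformer.parameters())
              + list(output_layer.parameters()))
    optimizer = torch.optim.Adam(params, lr=1e-4)
    src = torch.randint(0, vocab_size, (batch_size, seq_len))
    tgt = torch.randint(0, vocab_size, (batch_size, seq_len))
    # Teacher forcing: predict tgt[:, 1:] from tgt[:, :-1]
    logits = output_layer(transformer(embedding(src), embedding(tgt[:, :-1])))
    loss = torch.nn.functional.cross_entropy(
        logits.reshape(-1, vocab_size), tgt[:, 1:].reshape(-1)
    )
    loss.backward()
    optimizer.step()
    assert torch.isfinite(loss)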
6. Performance Tests¶
Test Case 6.1: Inference Speed¶
Test goal: measure Transformer inference speed
Test code:
import time

def test_inference_speed():
    """Measure inference speed."""
    batch_size = 8
    seq_length = 128
    d_model = 512
    nhead = 8
    num_layers = 6
    transformer = Transformer(
        d_model=d_model,
        nhead=nhead,
        num_encoder_layers=num_layers,
        num_decoder_layers=num_layers,
        batch_first=True
    )
    transformer.eval()  # disable dropout for a stable measurement
    src = torch.randn(batch_size, seq_length, d_model)
    tgt = torch.randn(batch_size, seq_length, d_model)
    with torch.no_grad():  # skip autograd bookkeeping during inference
        # Warmup
        for _ in range(10):
            _ = transformer(src, tgt)
        # Measure
        num_iterations = 100
        start_time = time.time()
        for _ in range(num_iterations):
            _ = transformer(src, tgt)
        end_time = time.time()
    avg_time = (end_time - start_time) / num_iterations
    print(f"✓ Average inference time: {avg_time * 1000:.2f}ms")
    print(f"✓ Throughput: {batch_size / avg_time:.2f} samples/s")
Expected result: inference time falls within a reasonable range
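For a less hand-rolled measurement, torch.utils.benchmark handles warmup and synchronization itself. A hedged alternative sketch to the timing loop above:

import torch
import torch.utils.benchmark as benchmark

def test_inference_speed_with_benchmark_timer():
    """Time the forward pass with torch.utils.benchmark instead of time.time()."""
    transformer = torch.nn.Transformer(d_model=512, nhead=8, batch_first=True).eval()
    src = torch.randn(8, 128, 512)
    tgt = torch.randn(8, 128, 512)
    timer = benchmark.Timer(
        stmt="transformer(src, tgt)",
        globals={"transformer": transformer, "src": src, "tgt": tgt},
    )
    with torch.no_grad():  # grad mode is thread-local, so this covers the timed calls
        result = timer.timeit(10)  # runs the statement 10 times and reports the mean
    print(f"✓ Mean time per call: {result.mean * 1000:.2f} ms")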
Test Case 6.2: Memory Usage¶
Test goal: measure Transformer memory usage
Test code:
import torch

def test_memory_usage():
    """Measure memory usage."""
    if not torch.cuda.is_available():
        print("⚠ CUDA unavailable, skipping memory test")
        return
    batch_size = 16
    seq_length = 256
    d_model = 512
    nhead = 8
    num_layers = 6
    transformer = Transformer(
        d_model=d_model,
        nhead=nhead,
        num_encoder_layers=num_layers,
        num_decoder_layers=num_layers,
        batch_first=True
    ).cuda()
    src = torch.randn(batch_size, seq_length, d_model).cuda()
    tgt = torch.randn(batch_size, seq_length, d_model).cuda()
    # Record the baseline; reset the peak counter so it reflects only this forward pass
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    initial_memory = torch.cuda.memory_allocated()
    # Forward pass
    output = transformer(src, tgt)
    # Record the peak memory
    peak_memory = torch.cuda.max_memory_allocated()
    memory_used = (peak_memory - initial_memory) / 1024**2  # MB
    print(f"✓ Memory used: {memory_used:.2f} MB")
    # Check that memory use is within a reasonable bound
    assert memory_used < 2000  # less than ~2GB
Expected result: memory usage falls within a reasonable range
7. Robustness Tests¶
Test Case 7.1: Minimal-Length Input Handling¶
Test goal: check how the model handles minimal-length input
Test code:
def test_empty_input_handling():
    """Check handling of minimal-length (single-token) input."""
    batch_size = 1
    seq_length = 1
    d_model = 128
    nhead = 4
    transformer = Transformer(
        d_model=d_model,
        nhead=nhead,
        batch_first=True
    )
    # Minimal-length input
    src = torch.randn(batch_size, seq_length, d_model)
    tgt = torch.randn(batch_size, seq_length, d_model)
    # Forward pass
    output = transformer(src, tgt)
    # Check the output
    assert output.shape == (batch_size, seq_length, d_model)
    assert not torch.isnan(output).any()  # any() returns True if any element is True
    print("✓ Minimal-length input handled correctly")
Expected result: the model handles minimal-length input without errors
Test Case 7.2: Outlier Inputs¶
Test goal: check the model's robustness to outlier input values
Test code:
def test_outlier_input():
    """Check handling of outlier input values."""
    batch_size = 2
    seq_length = 5
    d_model = 128
    nhead = 4
    transformer = Transformer(
        d_model=d_model,
        nhead=nhead,
        batch_first=True
    )
    # Input containing an outlier
    src = torch.randn(batch_size, seq_length, d_model)
    src[0, 0, 0] = 1000  # outlier value
    tgt = torch.randn(batch_size, seq_length, d_model)
    # Forward pass
    output = transformer(src, tgt)
    # Check that the output contains no NaN or Inf
    assert not torch.isnan(output).any()
    assert not torch.isinf(output).any()
    print("✓ Outlier input handled correctly")
Expected result: output contains no NaN or Inf
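Real batches are padded, so a related robustness check is that masked padding positions do not destabilize the output. A sketch using the src_key_padding_mask argument (True marks positions to ignore; the test name and sizes are our own):

import torch

def test_padding_mask_robustness():
    """Padded positions masked via src_key_padding_mask should not produce NaN."""
    batch_size, seq_length, d_model = 2, 5, 128
    transformer = torch.nn.Transformer(d_model=d_model, nhead=4, batch_first=True)
    src = torch.randn(batch_size, seq_length, d_model)
    tgt = torch.randn(batch_size, seq_length, d_model)
    # True marks key positions the encoder should ignore (tail padding here)
    src_key_padding_mask = torch.zeros(batch_size, seq_length, dtype=torch.bool)
    src_key_padding_mask[:, -2:] = True
    output = transformer(src, tgt, src_key_padding_mask=src_key_padding_mask)
    assert not torch.isnan(output).any()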
📊 Running the Tests¶
Run All Tests¶
# Run all tests
pytest tests/ -v
# Run a specific test file
pytest tests/test_transformer.py -v
# Run a specific test case
pytest tests/test_transformer.py::test_multihead_attention_output_shape -v
# Generate a coverage report
pytest tests/ --cov=transformer --cov-report=html
Coverage Targets¶
- Line coverage: ≥ 80%
- Branch coverage: ≥ 70%
- Critical-path coverage: 100%
✅ Verification Methods¶
1. Automated Verification¶
Run all test cases with pytest and confirm that every test passes.
2. Manual Verification¶
Inspect the test output and confirm:
- all assertions pass
- output shapes are correct
- values fall within reasonable ranges
- no errors or warnings appear
3. Performance Baselines¶
Establish performance baselines and ensure:
- inference speed does not fall below the baseline
- memory usage stays within its limit
- training converges at a reasonable rate
📝 Test Report¶
The test report should include:
- Test overview
  - total number of test cases
  - pass/fail counts
  - code coverage
- Detailed results
  - the outcome of each test case
  - error messages for failed tests
- Performance metrics
- Issue analysis
  - root-cause analysis of failures
  - improvement suggestions
  - follow-up plan
🔧 Test Maintenance¶
Routine Maintenance¶
- run the tests after every code change
- update test cases regularly
- monitor test coverage
Continuous Improvement¶
- add new test cases
- optimize test performance
- improve test tooling
Completion criteria: all test cases pass and code coverage is ≥ 80% · Recommended frequency: every commit · Maintenance cycle: weekly
Last updated: 2026-02-12 · Applies to: LLM Learning Tutorial v2026