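Chapter 8: Image Segmentation
⚠️ Timeliness note: The model performance figures in this chapter (mIoU, inference speed, and so on) may change as new versions are released; treat the original papers and official release pages as authoritative. As of February 2026, this chapter is based on the latest publicly available data.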
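📚 Chapter Overview
This chapter introduces the core algorithms of image segmentation, including semantic segmentation and instance segmentation. Image segmentation is a pixel-level classification task that is widely used in medical imaging, autonomous driving, and industrial inspection.
Study time: 5-7 days
Difficulty: ⭐⭐⭐⭐⭐
Prerequisites: Chapters 5-7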
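🎯 Learning Objectives
After completing this chapter, you will be able to:
- Understand the task and challenges of image segmentation
- Work with segmentation networks such as FCN and U-Net
- Describe more advanced methods such as DeepLab and PSPNet
- Implement an image segmentation application
- Complete an image segmentation project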
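8.1 Overview of Image Segmentation
8.1.1 Task Definition
Semantic segmentation: pixel-level classification
- Different instances of the same class share one label (and are drawn in the same color)
Instance segmentation: separates different instances of the same class
- Each instance is labeled individually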
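To make the distinction concrete, the following minimal sketch encodes one small scene both ways. The 4x6 arrays, the class id 1 for "person", and the instance ids are made up for illustration.

```python
import numpy as np

# Hypothetical 4x6 scene: 0 = background, 1 = "person".
# Two separate people appear, one on the left and one on the right.
semantic = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# Instance map for the same scene: 0 = background, 1 and 2 = instance ids.
instance = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Semantic segmentation: both people share class label 1.
person_pixels = int((semantic == 1).sum())

# Instance segmentation: each person has a separate mask.
instance_ids = [int(i) for i in np.unique(instance) if i != 0]
instance_areas = {i: int((instance == i).sum()) for i in instance_ids}

# Collapsing the instance ids recovers the semantic mask for this class.
recovered = (instance > 0).astype(semantic.dtype)

print(person_pixels, instance_areas)
```

The semantic map can say only that 8 pixels belong to "person", while the instance map can also say that they form two objects of 4 pixels each.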
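8.1.2 Evaluation Metrics
IoU (Intersection over Union) is the number of pixels where prediction and ground truth both contain a class, divided by the number of pixels where either one does. mIoU is the mean of the per-class IoU values: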
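import numpy as np

def calculate_iou(pred, target, num_classes):
    """Compute mIoU for two integer label maps of the same shape."""
    ious = []
    for cls in range(num_classes):
        pred_mask = (pred == cls)
        target_mask = (target == cls)
        intersection = (pred_mask & target_mask).sum()
        union = (pred_mask | target_mask).sum()
        if union == 0:
            # The class appears in neither map: skip it so that
            # absent classes do not inflate the mean.
            continue
        ious.append(intersection / union)
    # NaN if none of the classes appears in either map
    return float(np.mean(ious)) if ious else float("nan")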
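When mIoU has to be accumulated over many images, a common alternative is to build a confusion matrix once and read the per-class IoU off it. The sketch below is one way to do that with NumPy; the helper names `confusion_matrix` and `per_class_iou` and the toy label maps are assumptions made for this example.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Rows index the ground-truth class, columns the predicted class."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    index = target * num_classes + pred
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def per_class_iou(cm):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c); NaN for classes that never occur."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as c, but labeled otherwise
    fn = cm.sum(axis=1) - tp          # labeled c, but predicted otherwise
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)

# Toy example with 3 classes, of which class 2 never appears.
target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
pred = np.array([[0, 0, 0, 1],
                 [0, 0, 1, 1]])

cm = confusion_matrix(pred, target, num_classes=3)
ious = per_class_iou(cm)
miou = float(np.nanmean(ious))       # mean over the classes that occur
print(cm)
print(ious, miou)
```

Here class 0 scores 4/5, class 1 scores 3/4, and the absent class 2 is ignored, so the mean is 0.775. Confusion matrices from several images can be summed before calling `per_class_iou`.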
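8.2 FCN (Fully Convolutional Networks)
Core ideas:
- Replace fully connected layers with convolutional layers
- Output a prediction for every pixel
- Upsample to restore spatial resolution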
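The first idea can be checked numerically: a fully connected layer over a C x K x K feature map is the same computation as a convolution whose kernel covers the whole map. The NumPy sketch below uses made-up sizes and a hand-written `conv_valid` helper, both of which are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
C, K, num_classes = 3, 4, 5          # hypothetical sizes for illustration

# A fully connected classifier over a flattened C x K x K feature map.
W_fc = rng.standard_normal((num_classes, C * K * K))
b = rng.standard_normal(num_classes)

# The same weights viewed as num_classes convolution kernels of size C x K x K.
W_conv = W_fc.reshape(num_classes, C, K, K)

def conv_valid(x, weight, bias):
    """Plain 'valid' cross-correlation, stride 1. x: (C, H, W)."""
    out_ch, _, k, _ = weight.shape
    H, W = x.shape[1:]
    out = np.empty((out_ch, H - k + 1, W - k + 1))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            window = x[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(weight, window, axes=3) + bias
    return out

# On a K x K input the two layers agree.
x_small = rng.standard_normal((C, K, K))
fc_scores = W_fc @ x_small.ravel() + b
conv_scores = conv_valid(x_small, W_conv, b)[:, 0, 0]

# On a larger input the convolution yields a spatial map of class scores,
# whereas the fully connected layer only accepts the fixed K x K size.
x_large = rng.standard_normal((C, 6, 7))
score_map = conv_valid(x_large, W_conv, b)
print(np.allclose(fc_scores, conv_scores), score_map.shape)
```

Each position in `score_map` equals the fully connected classifier applied to the corresponding K x K window, which is why a convolutionalized classifier accepts inputs of any size.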
import torch
import torch.nn as nn

class FCN(nn.Module):  # subclass nn.Module to define the network
    def __init__(self, num_classes=21):
        super(FCN, self).__init__()
        # Feature extractor (VGG-style)
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            # ... more layers (elided; they must end with 512 channels at
            # 1/32 of the input resolution for the layers below to fit)
        )
        # Per-pixel classifier (1x1 convolution)
        self.score_fr = nn.Conv2d(512, num_classes, 1)
        # 32x upsampling; padding=16 makes the output exactly 32x the score map
        self.upscore = nn.ConvTranspose2d(num_classes, num_classes, 64, stride=32, padding=16, bias=False)

    def forward(self, x):
        features = self.features(x)
        score = self.score_fr(features)
        upsampled = self.upscore(score)
        return upsampled
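The `upscore` layer uses a 64-wide kernel with stride 32. A quick size check shows how padding decides whether the upsampled map lines up with the input. The helper below is a small sketch that restates the output-size formula given in the PyTorch `ConvTranspose2d` documentation; the 224-pixel input is an assumed example.

```python
def conv_transpose_out_size(in_size, kernel, stride, padding=0,
                            output_padding=0, dilation=1):
    """Output length of a transposed convolution along one spatial axis."""
    return ((in_size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)

# A 224-pixel input reduced 32x gives a 7-pixel score map.
score_size = 224 // 32

no_padding = conv_transpose_out_size(score_size, kernel=64, stride=32)
with_padding = conv_transpose_out_size(score_size, kernel=64, stride=32, padding=16)
print(no_padding, with_padding)  # 256 224
```

Without padding the output is 32 pixels larger than the input and has to be cropped; with `padding=16` it is exactly 32 times the size of the score map.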
8.3 U-Net
Architecture highlights:
- Encoder-decoder structure
- Skip connections
- Works well on small datasets
class UNet(nn.Module):
    # Input height and width should be divisible by 16 (four 2x poolings),
    # otherwise the upsampled maps will not match the skip connections.
    def __init__(self, in_channels=3, out_channels=1):
        super(UNet, self).__init__()
        # Encoder
        self.enc1 = self._block(in_channels, 64)
        self.enc2 = self._block(64, 128)
        self.enc3 = self._block(128, 256)
        self.enc4 = self._block(256, 512)
        # Bottleneck
        self.bottleneck = self._block(512, 1024)
        # Decoder (input channels = upsampled channels + skip channels)
        self.dec4 = self._block(512 + 512, 512)
        self.dec3 = self._block(256 + 256, 256)
        self.dec2 = self._block(128 + 128, 128)
        self.dec1 = self._block(64 + 64, 64)
        # Final 1x1 convolution
        self.final = nn.Conv2d(64, out_channels, 1)
        # Pooling
        self.pool = nn.MaxPool2d(2, 2)
        # Upsampling (transposed convolutions that also halve the channel count)
        self.up4 = nn.ConvTranspose2d(1024, 512, 2, 2)
        self.up3 = nn.ConvTranspose2d(512, 256, 2, 2)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, 2)
        self.up1 = nn.ConvTranspose2d(128, 64, 2, 2)

    def _block(self, in_channels, out_channels):
        # Two 3x3 convolutions, each followed by BatchNorm and ReLU
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        # Encoder
        enc1 = self.enc1(x)
        enc2 = self.enc2(self.pool(enc1))
        enc3 = self.enc3(self.pool(enc2))
        enc4 = self.enc4(self.pool(enc3))
        # Bottleneck
        bottleneck = self.bottleneck(self.pool(enc4))
        # Decoder (transposed-convolution upsampling + skip-connection concatenation)
        dec4 = self.dec4(torch.cat([self.up4(bottleneck), enc4], dim=1))  # torch.cat joins tensors along an existing dimension (channels here)
        dec3 = self.dec3(torch.cat([self.up3(dec4), enc3], dim=1))
        dec2 = self.dec2(torch.cat([self.up2(dec3), enc2], dim=1))
        dec1 = self.dec1(torch.cat([self.up1(dec2), enc1], dim=1))
        # Output (raw logits; no activation is applied here)
        return self.final(dec1)
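Because `torch.cat` needs matching spatial sizes, this U-Net only works when the input height and width survive four halvings without rounding. The pure-Python sketch below traces one spatial dimension through the network to show when the skip connections line up; `unet_sizes`, `skips_match`, and `pad_to_multiple` are hypothetical helper names introduced for this example.

```python
def unet_sizes(size, depth=4):
    """Trace one spatial dimension through `depth` poolings and upsamplings."""
    encoder = [size]
    for _ in range(depth):
        encoder.append(encoder[-1] // 2)      # MaxPool2d(2, 2) floors odd sizes
    decoder = []
    current = encoder[-1]                     # bottleneck size
    for _ in range(depth):
        current *= 2                          # ConvTranspose2d(kernel=2, stride=2)
        decoder.append(current)
    return encoder, decoder

def skips_match(size, depth=4):
    """True if every upsampled map has the size of its skip connection."""
    encoder, decoder = unet_sizes(size, depth)
    return all(d == e for d, e in zip(decoder, reversed(encoder[:-1])))

def pad_to_multiple(size, multiple=16):
    """Smallest size >= `size` that is divisible by `multiple`."""
    return -(-size // multiple) * multiple

print(unet_sizes(256))       # sizes line up at every level
print(unet_sizes(100))       # 25 is floored to 12, so 24 != 25 on the way up
print(pad_to_multiple(100))  # 112
```

One common remedy, under these assumptions, is to pad the image up to the next multiple of 16 before the forward pass and crop the prediction back afterwards.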
8.4 DeepLab
Key innovations:
- Atrous (dilated) convolution
- ASPP (Atrous Spatial Pyramid Pooling)
- CRF post-processing (used in DeepLab v1 and v2; dropped from v3 onward)
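A little arithmetic shows what the dilation rate buys. For a kernel of size k and rate d, the span covered along one axis is k + (k - 1)(d - 1), while the number of weights stays at k. The sketch below evaluates this for a 3x3 kernel at the rates 6, 12, and 18 used in the ASPP module; the helper names are assumptions for illustration.

```python
def effective_kernel(kernel, dilation):
    """Span, in input pixels, covered by a dilated kernel along one axis."""
    return kernel + (kernel - 1) * (dilation - 1)

def same_padding(kernel, dilation):
    """Padding that keeps the spatial size unchanged at stride 1 (odd kernels)."""
    return dilation * (kernel - 1) // 2

for rate in (1, 6, 12, 18):
    print(rate, effective_kernel(3, rate), same_padding(3, rate))
```

For a 3x3 kernel the size-preserving padding equals the dilation rate, which is why each ASPP branch pairs `padding=r` with `dilation=r`. The spans of 13, 25, and 37 pixels let the parallel branches see context at several scales.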
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(ASPP, self).__init__()
        # Parallel convolutions with different dilation rates
        self.conv1 = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.conv2 = nn.Conv2d(in_channels, out_channels, 3, padding=6, dilation=6, bias=False)
        self.conv3 = nn.Conv2d(in_channels, out_channels, 3, padding=12, dilation=12, bias=False)
        self.conv4 = nn.Conv2d(in_channels, out_channels, 3, padding=18, dilation=18, bias=False)
        # Global average pooling branch (image-level context)
        self.global_avg_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, out_channels, 1, bias=False)
        )
        # Fusion of the five branches
        self.conv_out = nn.Conv2d(out_channels * 5, out_channels, 1, bias=False)

    def forward(self, x):
        size = x.shape[-2:]
        feat1 = self.conv1(x)
        feat2 = self.conv2(x)
        feat3 = self.conv3(x)
        feat4 = self.conv4(x)
        # F is PyTorch's functional API; upsample the pooled 1x1 feature back to the input size
        feat5 = F.interpolate(self.global_avg_pool(x), size=size, mode='bilinear', align_corners=True)
        out = torch.cat([feat1, feat2, feat3, feat4, feat5], dim=1)
        out = self.conv_out(out)
        return out
8.5 Instance Segmentation: Mask R-CNN
Architecture:
- Faster R-CNN plus a mask branch
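class MaskRCNN(nn.Module):
    # Structural sketch only: the `...` placeholders must be replaced
    # with real modules before this class can run.
    def __init__(self, num_classes):
        super(MaskRCNN, self).__init__()
        # Faster R-CNN backbone
        self.backbone = ...
        # RPN (region proposal network)
        self.rpn = ...
        # RoI Align
        self.roi_align = ...
        # Classification and box regression heads
        self.cls_head = ...
        self.bbox_head = ...
        # Mask branch
        self.mask_head = nn.Sequential(
            nn.Conv2d(256, 256, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1),
            nn.ReLU(inplace=True),
            # 2x2 transposed convolution with stride 2 doubles the mask resolution
            nn.ConvTranspose2d(256, 256, 2, stride=2),
            nn.ReLU(inplace=True),
            # 1x1 convolution predicts one mask per class
            nn.Conv2d(256, num_classes, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        # Feature extraction
        features = self.backbone(x)
        # Region proposals
        proposals = self.rpn(features)
        # RoI Align
        roi_features = self.roi_align(features, proposals)
        # Classification and box regression
        cls_pred, bbox_pred = self.cls_head(roi_features), self.bbox_head(roi_features)
        # Mask prediction
        mask_pred = self.mask_head(roi_features)
        return cls_pred, bbox_pred, mask_pred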
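The mask head outputs one sigmoid mask per class for every RoI. In the Mask R-CNN paper the mask loss is a per-pixel binary cross-entropy evaluated only on the channel of the RoI's ground-truth class, so classes do not compete with each other. The NumPy sketch below illustrates that selection step; the function name, shapes, and random data are assumptions for this example, not part of any library.

```python
import numpy as np

def mask_loss(mask_prob, gt_masks, gt_labels, eps=1e-7):
    """Mean binary cross-entropy on the mask channel of each RoI's true class.

    mask_prob: (N, num_classes, H, W) per-class sigmoid outputs in (0, 1)
    gt_masks:  (N, H, W) binary target masks
    gt_labels: (N,) integer class of each RoI
    """
    n = mask_prob.shape[0]
    selected = mask_prob[np.arange(n), gt_labels]          # (N, H, W)
    selected = np.clip(selected, eps, 1.0 - eps)           # avoid log(0)
    bce = -(gt_masks * np.log(selected)
            + (1 - gt_masks) * np.log(1 - selected))
    return float(bce.mean())

rng = np.random.default_rng(0)
num_rois, num_classes, size = 2, 3, 4
gt_labels = np.array([2, 0])
gt_masks = (rng.random((num_rois, size, size)) > 0.5).astype(float)

mask_prob = rng.uniform(0.05, 0.95, (num_rois, num_classes, size, size))
loss_before = mask_loss(mask_prob, gt_masks, gt_labels)

# Changing a channel that is not the ground-truth class leaves the loss unchanged.
altered = mask_prob.copy()
altered[0, 1] = 0.5
loss_after = mask_loss(altered, gt_masks, gt_labels)
print(loss_before, loss_after)
```

Only the selected channel contributes to the loss, which is what lets the network predict a mask for every class independently and leave the choice of class to the classification head.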
8.6 Hands-On Case Study: Medical Image Segmentation
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Loss function
class DiceLoss(nn.Module):
    def __init__(self):
        super(DiceLoss, self).__init__()

    def forward(self, pred, target):
        # `pred` is assumed to be raw logits (as returned by the UNet above);
        # convert to probabilities in [0, 1] before measuring overlap.
        pred = torch.sigmoid(pred)
        smooth = 1.0  # avoids division by zero when both masks are empty
        pred_flat = pred.reshape(-1)  # flatten to 1-D
        target_flat = target.reshape(-1).float()
        intersection = (pred_flat * target_flat).sum()
        dice = (2. * intersection + smooth) / (pred_flat.sum() + target_flat.sum() + smooth)
        return 1 - dice

# Training
def train_segmentation(model, dataloader, criterion, optimizer, epochs=100):
    model.train()  # switch to training mode
    for epoch in range(epochs):
        total_loss = 0.0
        for images, masks in dataloader:
            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, masks)
            # Backward pass
            optimizer.zero_grad()  # clear accumulated gradients
            loss.backward()        # compute gradients
            optimizer.step()       # update parameters
            total_loss += loss.item()  # convert the one-element tensor to a Python number
        print(f'Epoch {epoch+1}, Loss: {total_loss/len(dataloader):.4f}')
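A few hand-checkable cases help build intuition for the Dice loss. The NumPy sketch below applies the same formula as `DiceLoss.forward` to values that are already probabilities; the function name `dice_loss` and the toy masks are assumptions for this example.

```python
import numpy as np

def dice_loss(prob, target, smooth=1.0):
    """Soft Dice loss on probabilities in [0, 1]."""
    prob = np.asarray(prob, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = (prob * target).sum()
    dice = (2.0 * intersection + smooth) / (prob.sum() + target.sum() + smooth)
    return 1.0 - dice

target = np.array([[0, 1, 1, 0],
                   [0, 1, 1, 0]])
half_pred = np.array([[0, 1, 0, 0],
                      [0, 1, 0, 0]])

perfect = dice_loss(target, target)               # 1 - (2*4 + 1) / (4 + 4 + 1)
half = dice_loss(half_pred, target)               # 1 - (2*2 + 1) / (2 + 4 + 1)
empty = dice_loss(np.zeros_like(target), target)  # 1 - (0 + 1) / (0 + 4 + 1)

# Padding both maps with extra background leaves the loss unchanged.
padded = dice_loss(np.pad(half_pred, 10), np.pad(target, 10))
print(perfect, half, empty, padded)
```

True negatives appear nowhere in the formula, so large background regions do not dilute the loss. This is one reason Dice is popular for medical images, where the structure of interest often covers only a small fraction of the pixels.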
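8.7 Exercises
Basic
- Short-answer questions:
  - What is the difference between semantic segmentation and instance segmentation?
    Semantic segmentation assigns a class label to every pixel but does not distinguish individuals of the same class (every person is labeled "person"). Instance segmentation also separates different instances of the same class ("person 1", "person 2"), so it combines per-pixel classification with instance discrimination.
  - What do the skip connections in U-Net do?
    Skip connections concatenate the feature maps from each encoder level onto the corresponding decoder level. This preserves high-resolution detail such as edges and texture, compensates for the spatial information lost during downsampling, and yields more precise segmentation boundaries.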
Advanced
- Programming exercises:
  - Implement a simple segmentation network.
  - Compute the mIoU metric for a segmentation result.
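8.8 Interview Preparation
Interview Questions from Major Tech Companies
Q1: What is atrous (dilated) convolution, and what is it for?
Reference answer:
- Definition: gaps (holes) are inserted between the elements of the convolution kernel
- Purpose: enlarges the receptive field without adding parameters
- Applications: DeepLab, multi-scale feature extraction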
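In an interview it can help to show the operation itself. The sketch below is a minimal 1-D dilated convolution in NumPy (written as cross-correlation, as deep learning frameworks do); the function name and the toy signal are assumptions for illustration.

```python
import numpy as np

def dilated_conv1d(x, weights, dilation=1):
    """'Valid' 1-D cross-correlation with `dilation - 1` skipped samples between taps."""
    k = len(weights)
    span = (k - 1) * dilation + 1            # input samples covered by one output
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        taps = x[i:i + span:dilation]        # every `dilation`-th sample
        out[i] = np.dot(weights, taps)
    return out

x = np.arange(10, dtype=float)               # 0, 1, ..., 9
w = np.array([1.0, 1.0, 1.0])                # three weights in both cases

dense = dilated_conv1d(x, w, dilation=1)     # each output sums x[i], x[i+1], x[i+2]
atrous = dilated_conv1d(x, w, dilation=3)    # each output sums x[i], x[i+3], x[i+6]
print(dense)
print(atrous)
```

Both calls use the same three weights, yet with `dilation=3` each output depends on a window of 7 input samples instead of 3. That is the sense in which the receptive field grows without adding parameters.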
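Q2: Why is U-Net well suited to medical image segmentation?
Reference answer:
- Encoder-decoder structure
- Skip connections preserve fine detail
- Works well on small datasets
- Trained end to end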
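8.9 Chapter Summary
Key Points
- Semantic segmentation: pixel-level classification
- Instance segmentation: distinguishes individual instances
- FCN: fully convolutional network
- U-Net: skip connections
- DeepLab: atrous convolution, ASPP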
Next Steps
Next chapter: 09-视频分析与理解.md (Video Analysis and Understanding)
Congratulations on completing Chapter 8! 🎉