Project 2: Object Detection in Practice
Difficulty: ⭐⭐⭐⭐ Hard · Time: 15-20 hours · Topics: object detection, YOLO, Faster R-CNN, mAP evaluation
📖 Project Overview
Background
Object detection is one of the core tasks of computer vision: beyond recognizing which object categories appear in an image, it must also localize where each object is. It is widely used in autonomous driving, security surveillance, industrial inspection, and other fields. This project walks you through building a complete object detection system.
Goals
Build a complete object detection system that can:
- Detect multiple objects in an image
- Regress bounding boxes
- Classify object categories
- Run detection in real time
- Visualize detection results
- Evaluate detection performance
Tech Stack
- Deep learning framework: PyTorch
- Detection frameworks: torchvision, ultralytics
- Data formats: COCO, VOC
- Visualization: matplotlib, OpenCV
- Evaluation metrics: mAP, IoU
🏗️ Project Structure

```text
object-detection/
├── data/                  # Data directory
│   ├── images/            # Image files
│   ├── annotations/       # Annotation files
│   └── splits/            # Train/val splits
├── models/                # Model directory
│   ├── __init__.py
│   ├── faster_rcnn.py     # Faster R-CNN
│   ├── yolo.py            # YOLO
│   └── ssd.py             # SSD
├── utils/                 # Utility functions
│   ├── __init__.py
│   ├── data_loader.py     # Data loading
│   ├── augmentation.py    # Data augmentation
│   ├── metrics.py         # Evaluation metrics
│   └── visualization.py   # Visualization
├── train.py               # Training script
├── evaluate.py            # Evaluation script
├── inference.py           # Inference script
├── app.py                 # Web app
├── config.py              # Configuration
└── requirements.txt       # Dependencies
```
🎯 Core Features
1. Data Processing
- Data loading: read COCO/VOC-format datasets
- Data augmentation: Mosaic, Mixup, etc.
- Annotation conversion: convert between annotation formats
- Data splitting: training and validation sets
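As a sketch of one augmentation from the list above: Mixup blends two training images with a weight drawn from a Beta distribution (for detection, the boxes of both images are usually kept and only the pixels are mixed). The function name and `alpha` value here are illustrative, not the project's actual API:

```python
import numpy as np

def mixup(image_a, image_b, alpha=1.5):
    """Blend two same-sized images with a Beta-distributed weight."""
    lam = np.random.beta(alpha, alpha)  # mixing weight in [0, 1]
    mixed = lam * image_a.astype(np.float32) + (1 - lam) * image_b.astype(np.float32)
    return mixed.astype(image_a.dtype), lam

# Toy example: blend a white and a black image
img_a = np.full((4, 4, 3), 255, dtype=np.uint8)
img_b = np.zeros((4, 4, 3), dtype=np.uint8)
mixed, lam = mixup(img_a, img_b)
```

The project's `utils/augmentation.py` would additionally merge the two images' box lists into one target.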
2. Model Architectures
- Faster R-CNN: two-stage detector
- YOLO: single-stage detector
- SSD: single-stage multi-box detector
- Transfer learning: start from pretrained weights
3. Training Optimization
- Loss function: classification loss + box-regression loss
- Anchor design: anchor box generation
- NMS: non-maximum suppression
- Learning-rate schedule: cosine annealing
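The NMS step above can be sketched in plain Python: keep the highest-scoring box, drop every remaining box that overlaps it beyond the IoU threshold, and repeat. This is a minimal class-agnostic version; in practice `torchvision.ops.nms` provides a fast batched implementation:

```python
def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: returns indices of the boxes to keep."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)            # highest-scoring remaining box
        keep.append(best)
        order = [i for i in order      # drop heavily overlapping boxes
                 if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
keep = nms(boxes, scores)  # box 1 overlaps box 0 (IoU 0.81) and is suppressed
```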
4. Evaluation and Analysis
- mAP: mean average precision
- IoU: intersection over union
- PR curve: precision-recall curve
- Visualization: rendering detection results
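To make the PR-curve bullet concrete: once detections are ranked by confidence and marked as true or false positives against the ground truth, precision and recall at each rank are just cumulative counts. A minimal sketch (the project's `utils/metrics.py` derives the TP/FP flags from IoU matching):

```python
def pr_curve(tp_flags, num_gt):
    """Precision/recall after each ranked detection.

    tp_flags: 1 = true positive, 0 = false positive, in descending
    confidence order. num_gt: number of ground-truth boxes.
    """
    precisions, recalls = [], []
    tp = fp = 0
    for flag in tp_flags:
        tp += flag
        fp += 1 - flag
        precisions.append(tp / (tp + fp))  # correct among predictions so far
        recalls.append(tp / num_gt)        # ground truth recovered so far
    return precisions, recalls

prec, rec = pr_curve([1, 1, 0, 1], num_gt=4)
```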
5. Real-Time Detection
- Video detection: real-time detection on video streams
- Batch detection: batch processing of images
- Multi-scale detection: support for different input sizes
- Speed optimization: model acceleration
💻 Code Implementation
1. Configuration (config.py)

```python
"""
Object detection configuration.
"""
import torch
from dataclasses import dataclass


@dataclass  # @dataclass auto-generates __init__ and friends
class Config:
    """Configuration."""
    # Data
    data_dir: str = "./data"
    batch_size: int = 8
    num_workers: int = 4
    image_size: int = 640

    # Model
    model_type: str = "fasterrcnn_resnet50_fpn"  # fasterrcnn, yolo, ssd
    num_classes: int = 91  # COCO category ids run to 90 (80 real classes + background, with gaps)
    pretrained: bool = True

    # Training
    num_epochs: int = 50
    learning_rate: float = 0.001
    momentum: float = 0.9
    weight_decay: float = 0.0005

    # Anchors
    anchor_sizes: tuple = ((32,), (64,), (128,), (256,), (512,))
    aspect_ratios: tuple = ((0.5, 1.0, 2.0),) * 5

    # NMS
    nms_threshold: float = 0.5
    score_threshold: float = 0.05

    # Device
    device: str = "cuda" if torch.cuda.is_available() else "cpu"

    # Output paths
    checkpoint_dir: str = "./checkpoints"
    log_dir: str = "./logs"


config = Config()
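One benefit of the dataclass pattern is deriving per-experiment configs without editing the file, via `dataclasses.replace`. A self-contained sketch with a trimmed-down stand-in for the `Config` above (fields reduced so it runs without torch):

```python
from dataclasses import dataclass, replace

@dataclass
class Config:
    batch_size: int = 8
    image_size: int = 640
    learning_rate: float = 0.001

base = Config()
# Derive an experiment config; only the named fields change
experiment = replace(base, batch_size=16, learning_rate=0.0005)
```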
2. Dataset (utils/data_loader.py)

```python
"""
Object detection dataset.
"""
import json
import os

import torch
from PIL import Image
from torch.utils.data import Dataset


class COCODataset(Dataset):
    """COCO-format dataset."""

    def __init__(self, root_dir, annotation_file, transforms=None):
        """
        Args:
            root_dir: data root directory
            annotation_file: path to the annotation JSON
            transforms: joint image/target transform
        """
        self.root_dir = root_dir
        self.transforms = transforms

        # Load the annotations
        with open(annotation_file, 'r') as f:
            self.coco = json.load(f)

        # Index images by id (dict comprehension)
        self.images = {img['id']: img for img in self.coco['images']}

        # Index annotations by image id
        self.annotations = {}
        for ann in self.coco['annotations']:
            img_id = ann['image_id']
            if img_id not in self.annotations:
                self.annotations[img_id] = []
            self.annotations[img_id].append(ann)

        # List of image ids
        self.img_ids = list(self.images.keys())

    def __len__(self):  # defines the behavior of len()
        return len(self.img_ids)

    def __getitem__(self, idx):  # defines indexing behavior
        """Fetch one (image, target) pair."""
        img_id = self.img_ids[idx]

        # Load the image
        img_info = self.images[img_id]
        img_path = os.path.join(self.root_dir, img_info['file_name'])
        image = Image.open(img_path).convert('RGB')

        # Load the annotations
        annotations = self.annotations.get(img_id, [])
        boxes = []
        labels = []
        areas = []
        iscrowd = []
        for ann in annotations:
            # COCO format [x, y, width, height] -> [x1, y1, x2, y2]
            x, y, w, h = ann['bbox']
            boxes.append([x, y, x + w, y + h])
            labels.append(ann['category_id'])
            areas.append(ann['area'])
            iscrowd.append(ann.get('iscrowd', 0))

        # reshape(-1, 4) keeps the (N, 4) shape even when there are no boxes
        boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
        labels = torch.as_tensor(labels, dtype=torch.int64)
        areas = torch.as_tensor(areas, dtype=torch.float32)
        iscrowd = torch.as_tensor(iscrowd, dtype=torch.int64)
        image_id = torch.tensor([img_id])

        target = {
            'boxes': boxes,
            'labels': labels,
            'image_id': image_id,
            'area': areas,
            'iscrowd': iscrowd,
        }

        # Apply the transforms
        if self.transforms:
            image, target = self.transforms(image, target)
        return image, target


class DetectionTransform:
    """Detection transform (applied to image and target together)."""

    def __init__(self, train=True):
        self.train = train

    def __call__(self, image, target):  # makes instances callable like functions
        from torchvision import transforms as T
        from torchvision.transforms import functional as F

        image = T.ToTensor()(image)
        if self.train and torch.rand(1) < 0.5:
            image = F.hflip(image)
            # Flip the bounding boxes to match
            boxes = target['boxes']
            w = image.shape[-1]  # last dimension is the width
            boxes[:, [0, 2]] = w - boxes[:, [2, 0]]
            target['boxes'] = boxes
        return image, target


def get_transform(train=True):
    """Build the data transform."""
    return DetectionTransform(train=train)
```
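Detection batches cannot be stacked into one tensor, because each image carries a different number of boxes. That is why the project's `collate_fn` simply zips the batch into parallel tuples instead of stacking. The core idea, demonstrated with plain Python stand-ins for images and targets:

```python
# Each sample is (image, target); the targets have different box counts
batch = [
    ("img0", {"boxes": [[0, 0, 10, 10]]}),
    ("img1", {"boxes": [[5, 5, 20, 20], [1, 1, 4, 4]]}),
]

def collate_fn(batch):
    """Turn a list of (image, target) pairs into (images, targets) tuples."""
    return tuple(zip(*batch))

images, targets = collate_fn(batch)
```

This is the function passed as `collate_fn=` when building the `DataLoader` for this dataset.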
3. Faster R-CNN Model (models/faster_rcnn.py)

```python
"""
Faster R-CNN model.
"""
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor


def get_model_instance_segmentation(num_classes, pretrained=True):
    """
    Build a Faster R-CNN model.

    Args:
        num_classes: number of classes (including background)
        pretrained: whether to load pretrained weights

    Returns:
        the model
    """
    # Load the pretrained detector
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
        weights="DEFAULT" if pretrained else None
    )

    # Number of input features of the classifier head
    in_features = model.roi_heads.box_predictor.cls_score.in_features

    # Replace the pretrained head
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model


def collate_fn(batch):
    """Custom collate function: zip pairs the batch element-wise."""
    return tuple(zip(*batch))


def train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10):
    """Train for one epoch."""
    model.train()  # switch to training mode
    metric_logger = {
        'loss': 0,
        'loss_classifier': 0,
        'loss_box_reg': 0,
        'loss_objectness': 0,
        'loss_rpn_box_reg': 0,
    }

    for images, targets in data_loader:
        images = [image.to(device) for image in images]  # move data to GPU/CPU
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # Forward pass: in training mode the model returns a loss dict
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())

        # Backward pass
        optimizer.zero_grad()
        losses.backward()  # compute gradients
        optimizer.step()   # update parameters

        # Accumulate losses (.item() converts a 1-element tensor to a Python number)
        for k, v in loss_dict.items():
            metric_logger[k] += v.item()
        metric_logger['loss'] += losses.item()

    # Average over batches
    num_batches = len(data_loader)
    for k in metric_logger:
        metric_logger[k] /= num_batches
    return metric_logger
```
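`train_one_epoch` is typically driven by an SGD optimizer paired with the cosine-annealing schedule mentioned earlier. The schedule itself is a closed-form curve; this sketch computes the same values `torch.optim.lr_scheduler.CosineAnnealingLR` would produce (with `eta_min=0`), without needing torch:

```python
import math

def cosine_annealing_lr(epoch, total_epochs, lr_max=0.001, lr_min=0.0):
    """Learning rate at a given epoch under cosine annealing."""
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * epoch / total_epochs)
    )

# LR decays smoothly from lr_max at epoch 0 to lr_min at the final epoch
lrs = [cosine_annealing_lr(e, total_epochs=50) for e in range(51)]
```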
4. Evaluation Metrics (utils/metrics.py)

```python
"""
Evaluation metrics.
"""
import numpy as np
from collections import defaultdict


def calculate_iou(box1, box2):
    """
    IoU of two bounding boxes.

    Args:
        box1: [x1, y1, x2, y2]
        box2: [x1, y1, x2, y2]

    Returns:
        the IoU value
    """
    # Intersection rectangle
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)

    # Union area
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection

    return intersection / union if union > 0 else 0


def calculate_ap(recall, precision):
    """
    Average precision from a PR curve (all-point interpolation).

    Args:
        recall: list of recall values
        precision: list of precision values

    Returns:
        the AP value
    """
    # Add boundary points
    recall = np.concatenate(([0.], recall, [1.]))
    precision = np.concatenate(([0.], precision, [0.]))

    # Make precision monotonically non-increasing
    for i in range(len(precision) - 1, 0, -1):
        precision[i - 1] = np.maximum(precision[i - 1], precision[i])

    # Integrate the precision envelope over recall
    ap = np.sum((recall[1:] - recall[:-1]) * precision[1:])
    return ap


def calculate_map(predictions, targets, iou_threshold=0.5):
    """
    Mean average precision over classes.

    Args:
        predictions: list of prediction dicts
        targets: list of ground-truth dicts
        iou_threshold: IoU threshold for a match

    Returns:
        the mAP value
    """
    # Group predictions and targets by class
    class_predictions = defaultdict(list)
    class_targets = defaultdict(list)

    for pred, target in zip(predictions, targets):
        pred_boxes = pred['boxes'].cpu().numpy()
        pred_labels = pred['labels'].cpu().numpy()
        pred_scores = pred['scores'].cpu().numpy()
        target_boxes = target['boxes'].cpu().numpy()
        target_labels = target['labels'].cpu().numpy()

        for label in np.unique(np.concatenate([pred_labels, target_labels])):
            class_predictions[label].append({
                'boxes': pred_boxes[pred_labels == label],
                'scores': pred_scores[pred_labels == label],
            })
            class_targets[label].append({
                'boxes': target_boxes[target_labels == label],
            })

    # Per-class AP
    aps = []
    for label in class_predictions.keys():
        # Gather all predictions and targets for this class
        all_pred_boxes = []
        all_pred_scores = []
        all_target_boxes = []
        for pred, target in zip(class_predictions[label], class_targets[label]):
            all_pred_boxes.extend(pred['boxes'])
            all_pred_scores.extend(pred['scores'])
            all_target_boxes.extend(target['boxes'])

        if len(all_target_boxes) == 0:
            # No ground truth for this class: every prediction is a false positive
            aps.append(0.0)
            continue

        # Sort predictions by descending score
        sorted_indices = np.argsort(all_pred_scores)[::-1]
        all_pred_boxes = [all_pred_boxes[i] for i in sorted_indices]
        all_pred_scores = [all_pred_scores[i] for i in sorted_indices]

        # Mark each prediction TP or FP
        tp = np.zeros(len(all_pred_boxes))
        fp = np.zeros(len(all_pred_boxes))
        matched_targets = set()
        for i, pred_box in enumerate(all_pred_boxes):
            # Find the best-matching target not yet claimed by a higher-scoring prediction
            best_iou = 0
            best_target_idx = -1
            for j, target_box in enumerate(all_target_boxes):
                if j in matched_targets:
                    continue
                iou = calculate_iou(pred_box, target_box)
                if iou > best_iou:
                    best_iou = iou
                    best_target_idx = j
            if best_iou >= iou_threshold:
                tp[i] = 1
                matched_targets.add(best_target_idx)
            else:
                fp[i] = 1

        # Precision and recall at each rank
        tp_cumsum = np.cumsum(tp)
        fp_cumsum = np.cumsum(fp)
        recall = tp_cumsum / len(all_target_boxes)
        precision = tp_cumsum / (tp_cumsum + fp_cumsum + 1e-6)

        aps.append(calculate_ap(recall, precision))

    return np.mean(aps) if aps else 0
```
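A worked check of the all-point AP computation: the envelope step makes precision monotonically non-increasing before integrating over recall. This standalone function mirrors `calculate_ap`, applied to assumed PR points (recall [0.5, 1.0] with precision [1.0, 0.5]):

```python
import numpy as np

def average_precision(recall, precision):
    """All-point interpolated AP, mirroring calculate_ap above."""
    recall = np.concatenate(([0.0], recall, [1.0]))
    precision = np.concatenate(([0.0], precision, [0.0]))
    # Monotone envelope: each point becomes the max of everything to its right
    for i in range(len(precision) - 1, 0, -1):
        precision[i - 1] = max(precision[i - 1], precision[i])
    # Sum of (recall step) x (envelope precision)
    return float(np.sum((recall[1:] - recall[:-1]) * precision[1:]))

# Envelope: [1.0, 1.0, 0.5, 0.0]; AP = 0.5 * 1.0 + 0.5 * 0.5 = 0.75
ap = average_precision([0.5, 1.0], [1.0, 0.5])
```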
5. Visualization Utilities (utils/visualization.py)

```python
"""
Visualization utilities.
"""
import cv2
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle


def _label_text(label, class_names):
    """Resolve a label id to a display name, falling back to the raw id."""
    if class_names is not None and label < len(class_names):
        return str(class_names[label])
    return str(label)


def visualize_detections(
    image,
    boxes,
    labels,
    scores=None,
    class_names=None,
    threshold=0.5,
):
    """
    Draw detections on an image with OpenCV.

    Args:
        image: image (numpy array)
        boxes: bounding boxes [N, 4]
        labels: labels [N]
        scores: confidences [N]
        class_names: sequence of class names indexed by label id
        threshold: confidence threshold

    Returns:
        the annotated image
    """
    image = image.copy()
    for i, (box, label) in enumerate(zip(boxes, labels)):
        # Skip low-confidence detections
        if scores is not None and scores[i] < threshold:
            continue

        # Box coordinates
        x1, y1, x2, y2 = box.astype(int)

        # Draw the box in a random color
        color = np.random.randint(0, 255, 3).tolist()
        cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)

        # Build the label text
        label_text = _label_text(label, class_names)
        if scores is not None:
            label_text += f": {scores[i]:.2f}"

        # Draw the label above the box
        cv2.putText(
            image,
            label_text,
            (x1, y1 - 10),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.5,
            color,
            2,
        )
    return image


def plot_detections(
    image,
    boxes,
    labels,
    scores=None,
    class_names=None,
    threshold=0.5,
):
    """
    Plot detections with matplotlib.

    Args:
        image: image (PIL Image or numpy array)
        boxes: bounding boxes [N, 4]
        labels: labels [N]
        scores: confidences [N]
        class_names: sequence of class names indexed by label id
        threshold: confidence threshold
    """
    fig, ax = plt.subplots(1, figsize=(12, 8))
    ax.imshow(image)

    for i, (box, label) in enumerate(zip(boxes, labels)):
        # Skip low-confidence detections
        if scores is not None and scores[i] < threshold:
            continue

        # Draw the box
        x1, y1, x2, y2 = box
        rect = Rectangle(
            (x1, y1),
            x2 - x1,
            y2 - y1,
            linewidth=2,
            edgecolor='r',
            facecolor='none',
        )
        ax.add_patch(rect)

        # Build and draw the label text
        label_text = _label_text(label, class_names)
        if scores is not None:
            label_text += f": {scores[i]:.2f}"
        ax.text(
            x1,
            y1 - 10,
            label_text,
            bbox=dict(facecolor='red', alpha=0.5),
            fontsize=10,
            color='white',
        )

    ax.axis('off')
    plt.tight_layout()
    plt.show()
```

Note: the original `class_names.get(...)` call assumed a dict, but the app passes a list of class names; `_label_text` handles a plain sequence indexed by label id.
6. Streamlit App (app.py)

```python
"""
Object detection web app.
"""
import streamlit as st
import torch
import numpy as np
from PIL import Image
import torchvision.transforms as T

from config import config
from models.faster_rcnn import get_model_instance_segmentation
from utils.visualization import visualize_detections

# Page setup
st.set_page_config(
    page_title="Object Detection System",
    page_icon="🎯",
    layout="wide"
)

# Title
st.title("🎯 Object Detection System")
st.markdown("---")

# Sidebar
st.sidebar.header("Model Settings")

# Model selection
model_type = st.sidebar.selectbox(
    "Model",
    ["fasterrcnn_resnet50_fpn"],
)

# Confidence threshold
score_threshold = st.sidebar.slider(
    "Confidence threshold",
    min_value=0.0,
    max_value=1.0,
    value=0.5,
    step=0.05,
)

# Device
device = "cuda" if torch.cuda.is_available() else "cpu"
st.sidebar.text(f"Device: {device}")

# Model loading (cached across Streamlit reruns)
@st.cache_resource
def load_model(num_classes):
    """Load the model."""
    model = get_model_instance_segmentation(num_classes, pretrained=True)
    model = model.to(device)
    model.eval()  # evaluation mode (disables Dropout, etc.)
    return model

# COCO class names, indexed by category id. The pretrained torchvision model
# emits the original COCO ids (1-90, with gaps), so this list has 91 entries
# with 'N/A' placeholders for the unused ids.
COCO_CLASSES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
    'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A',
    'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
    'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack',
    'umbrella', 'N/A', 'N/A', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
    'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
    'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass',
    'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
    'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake',
    'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A',
    'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
    'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator',
    'N/A', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
    'toothbrush'
]

# Load the model
model = load_model(config.num_classes)

# Main layout
col1, col2 = st.columns(2)

with col1:
    st.subheader("Upload an Image")

    uploaded_file = st.file_uploader(
        "Choose an image",
        type=['jpg', 'jpeg', 'png'],
    )

if uploaded_file is not None:
    # Read the image
    image = Image.open(uploaded_file).convert('RGB')
    image_np = np.array(image)

    # Preprocess: to tensor, add a batch dimension, move to device
    transform = T.Compose([T.ToTensor()])
    image_tensor = transform(image).unsqueeze(0).to(device)

    # Detect
    with st.spinner("Detecting..."):
        with torch.no_grad():  # disable gradient tracking to save memory
            predictions = model(image_tensor)[0]

    # Extract results
    boxes = predictions['boxes'].cpu().numpy()
    labels = predictions['labels'].cpu().numpy()
    scores = predictions['scores'].cpu().numpy()

    # Filter low-confidence detections
    mask = scores >= score_threshold
    boxes = boxes[mask]
    labels = labels[mask]
    scores = scores[mask]

    # Visualize
    result_image = visualize_detections(
        image_np,
        boxes,
        labels,
        scores,
        COCO_CLASSES,
        score_threshold,
    )

    # Show the results
    with col2:
        st.subheader("Detection Results")
        st.image(result_image, caption="Detections", use_container_width=True)

        # Summary
        st.write(f"Detected {len(boxes)} objects")

        # Per-detection details
        if len(boxes) > 0:
            st.write("Details:")
            detection_data = []
            for i, (box, label, score) in enumerate(zip(boxes, labels, scores)):
                class_name = COCO_CLASSES[label]
                detection_data.append({
                    "No.": i + 1,
                    "Class": class_name,
                    "Confidence": f"{score:.2%}",
                    "Box": f"[{box[0]:.0f}, {box[1]:.0f}, {box[2]:.0f}, {box[3]:.0f}]"
                })
            st.dataframe(
                detection_data,
                use_container_width=True,
                hide_index=True
            )
```

Note: the original 81-name class list would raise an IndexError for labels above 80; the 91-entry list above matches the non-contiguous COCO category ids the pretrained model actually outputs.
🧪 Testing
1. Unit Tests

```python
"""
Unit test examples.
"""
import torch

from models.faster_rcnn import get_model_instance_segmentation
from utils.metrics import calculate_iou


def test_model_forward():
    """Test the model's forward pass."""
    # pretrained=False avoids downloading weights in CI
    model = get_model_instance_segmentation(num_classes=91, pretrained=False)
    model.eval()

    # Build a test input
    images = [torch.randn(3, 800, 800)]

    # Forward pass
    with torch.no_grad():
        predictions = model(images)

    # Verify the output structure
    assert len(predictions) == 1
    assert 'boxes' in predictions[0]
    assert 'labels' in predictions[0]
    assert 'scores' in predictions[0]
    print("✓ model forward pass test passed")


def test_iou_calculation():
    """Test the IoU computation."""
    box1 = [0, 0, 100, 100]
    box2 = [50, 50, 150, 150]
    iou = calculate_iou(box1, box2)
    assert 0 < iou < 1
    print(f"✓ IoU test passed (IoU: {iou:.4f})")
```
2. Integration Tests

```python
"""
Integration test example.
"""
import torch

from config import config
from models.faster_rcnn import get_model_instance_segmentation
from utils.data_loader import COCODataset, get_transform


def test_detection_pipeline():
    """Test the detection pipeline end to end."""
    # Build the dataset
    dataset = COCODataset(
        config.data_dir,
        "./data/annotations/instances_val2017.json",
        get_transform(train=False)
    )

    # Build the model
    model = get_model_instance_segmentation(config.num_classes)
    model = model.to(config.device)
    model.eval()

    # Run one sample through the model
    image, target = dataset[0]
    image = image.to(config.device)
    with torch.no_grad():
        prediction = model([image])[0]

    # Verify the result structure
    assert 'boxes' in prediction
    assert 'labels' in prediction
    assert 'scores' in prediction
    print("✓ detection pipeline test passed")
```
📊 Extension Ideas
1. Features
- Instance segmentation: Mask R-CNN
- Keypoint detection: Keypoint R-CNN
- Panoptic segmentation: Panoptic FPN
- Video object detection: tracking algorithms
2. Performance
- Model compression: quantization, pruning
- TensorRT: inference acceleration
- ONNX export: cross-platform deployment
- Multi-scale training: better small-object detection
3. Deployment
- Real-time video streams: live camera detection
- Mobile: ONNX Runtime
- Edge devices: Jetson Nano deployment
- Cloud: AWS/GCP deployment
📚 What You Will Learn
After completing this project, you will have mastered:
- ✅ Object detection principles and algorithms
- ✅ Faster R-CNN/YOLO implementation
- ✅ The anchor mechanism and NMS
- ✅ The mAP evaluation metric
- ✅ Data augmentation techniques
- ✅ Model training and optimization
- ✅ Building a complete object detection system
🔗 References
Project time: 15-20 hours · Difficulty: ⭐⭐⭐⭐ Hard · Recommendation: ⭐⭐⭐⭐⭐