第10章三维视觉¶

📚 章节概述¶

本章介绍三维视觉的核心技术，包括相机标定、立体视觉、深度估计、SLAM等。三维视觉是计算机视觉的重要方向，广泛应用于自动驾驶、AR/VR、机器人等领域。

学习时间：5-7天 难度等级：⭐⭐⭐⭐⭐ 前置知识：第1-9章、线性代数

🎯 学习目标¶

完成本章后，你将能够： - 理解相机成像模型 - 掌握相机标定技术 - 了解立体视觉和深度估计 - 理解SLAM的基本原理 - 能够实现3D视觉应用

10.1 相机标定¶

10.1.1 相机模型¶

针孔相机模型：

Text Only

s * [u, v, 1]^T = K * [R | t] * [X, Y, Z, 1]^T

其中： - K：内参矩阵 - [R | t]：外参矩阵 - [X, Y, Z]：世界坐标 - [u, v]：图像坐标

Python

import cv2
import numpy as np
import glob

def camera_calibration(images_path, chessboard_size=(9, 6)):
    """相机标定"""
    # 准备标定板角点
    objp = np.zeros((chessboard_size[0] * chessboard_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:chessboard_size[0], 0:chessboard_size[1]].T.reshape(-1, 2)  # 重塑张量形状

    objpoints = []  # 3D点
    imgpoints = []  # 2D点

    # 读取图像
    images = glob.glob(images_path)

    for fname in images:
        img = cv2.imread(fname)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        # 寻找角点
        ret, corners = cv2.findChessboardCorners(gray, chessboard_size, None)

        if ret:
            objpoints.append(objp)
            imgpoints.append(corners)

    # 标定
    ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

    return ret, mtx, dist, rvecs, tvecs

# 使用
ret, mtx, dist, rvecs, tvecs = camera_calibration('calibration_images/*.jpg')
print(f"内参矩阵:\n{mtx}")
print(f"畸变系数:\n{dist}")

10.1.2 畸变校正¶

Python

def undistort_image(image, mtx, dist):
    """畸变校正"""
    h, w = image.shape[:2]  # 切片操作，取前n个元素

    # 计算新的相机矩阵
    newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w, h), 1, (w, h))

    # 畸变校正
    dst = cv2.undistort(image, mtx, dist, None, newcameramtx)

    # 裁剪图像
    x, y, w, h = roi
    dst = dst[y:y+h, x:x+w]

    return dst

10.2 立体视觉¶

10.2.1 立体匹配¶

Python

def stereo_matching(left_img, right_img):
    """立体匹配"""
    # 转换为灰度图
    left_gray = cv2.cvtColor(left_img, cv2.COLOR_BGR2GRAY)
    right_gray = cv2.cvtColor(right_img, cv2.COLOR_BGR2GRAY)

    # 创建立体匹配器
    stereo = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=16*5,  # 必须是16的倍数
        blockSize=5,
        P1=8 * 3 * 5**2,
        P2=32 * 3 * 5**2,
        disp12MaxDiff=1,
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=32,
        preFilterCap=63,
        mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
    )

    # 计算视差图
    disparity = stereo.compute(left_gray, right_gray)
    disparity = disparity.astype(np.float32) / 16.0

    return disparity

# 计算深度图
def disparity_to_depth(disparity, baseline, focal_length):
    """视差转深度"""
    depth = (baseline * focal_length) / disparity
    return depth

10.2.2 深度估计¶

Python

import torch
import torch.nn as nn

class DepthEstimationNet(nn.Module):  # 继承nn.Module定义网络层
    def __init__(self):
        super(DepthEstimationNet, self).__init__()

        # 编码器
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # ... 更多层
        )

        # 解码器
        self.decoder = nn.Sequential(
            nn.Conv2d(512, 256, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(inplace=True),
            # ... 更多层
            nn.Conv2d(64, 1, 3, padding=1)
        )

    def forward(self, x):
        features = self.encoder(x)
        depth = self.decoder(features)
        return depth

10.3 SLAM基础¶

10.3.1 ORB-SLAM¶

核心组件： 1. 特征提取（ORB） 2. 特征匹配 3. 位姿估计 4. 优化（BA） 5. 回环检测

10.3.2 视觉里程计¶

Python

class VisualOdometry:
    def __init__(self, camera_matrix):
        self.camera_matrix = camera_matrix
        self.orb = cv2.ORB_create()
        self.prev_frame = None
        self.prev_kp = None
        self.prev_des = None
        self.trajectory = []
        self.pose = np.eye(4)

    def process_frame(self, frame):
        """处理帧"""
        # 特征提取
        kp, des = self.orb.detectAndCompute(frame, None)

        if self.prev_frame is not None:
            # 特征匹配
            bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
            matches = bf.match(self.prev_des, des)
            matches = sorted(matches, key=lambda x: x.distance)  # lambda匿名函数

            # 提取匹配点
            prev_pts = np.float32([self.prev_kp[m.queryIdx].pt for m in matches])
            curr_pts = np.float32([kp[m.trainIdx].pt for m in matches])

            # 计算本质矩阵
            E, mask = cv2.findEssentialMat(prev_pts, curr_pts, self.camera_matrix,
                                         method=cv2.RANSAC, prob=0.999, threshold=1.0)

            # 恢复位姿
            _, R, t, mask = cv2.recoverPose(E, prev_pts, curr_pts, self.camera_matrix)

            # 更新位姿
            T = np.eye(4)
            T[:3, :3] = R
            T[:3, 3] = t.ravel()
            self.pose = self.pose @ T

            # 记录轨迹
            self.trajectory.append(self.pose[:3, 3])

        # 更新前一帧
        self.prev_frame = frame
        self.prev_kp = kp
        self.prev_des = des

        return self.trajectory

10.4 点云处理¶

Python

import open3d as o3d

def visualize_point_cloud(points, colors=None):
    """可视化点云"""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)

    if colors is not None:
        pcd.colors = o3d.utility.Vector3dVector(colors)

    o3d.visualization.draw_geometries([pcd])

def point_cloud_filtering(pcd, voxel_size=0.02):
    """点云滤波"""
    # 体素下采样
    downpcd = pcd.voxel_down_sample(voxel_size)

    # 统计滤波
    cl, ind = downpcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
    filtered = downpcd.select_by_index(ind)

    return filtered

def point_cloud_registration(source, target):
    """点云配准"""
    # ICP配准
    threshold = 0.02
    trans_init = np.identity(4)

    reg_p2p = o3d.pipelines.registration.registration_icp(
        source, target, threshold, trans_init,
        o3d.pipelines.registration.TransformationEstimationPointToPoint(),
        o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration=2000)
    )

    return reg_p2p.transformation

10.5 练习题¶

基础题¶

简答题：
相机标定的目的是什么？

相机标定旨在确定相机的内参（焦距、主点、畸变系数）和外参（旋转矩阵、平移向量），用于校正镜头畸变、将图像坐标映射到真实世界坐标，从而实现精确的三维测量与重建。常用张正友标定法，通过拍摄多角度棋盘格图案求解参数。
立体视觉的原理是什么？

立体视觉利用两个不同视角的相机同时拍摄场景，通过立体匹配找到左右图像中对应点的视差（disparity），再根据三角测量原理计算深度：\(Z = f \cdot B / d\)（\(f\)为焦距，\(B\)为基线距离，\(d\)为视差）。核心挑战包括遮挡区域、重复纹理和弱纹理区域的匹配。

进阶题¶

编程题：
实现相机标定程序。
计算深度图并可视化。

10.6 面试准备¶

大厂面试题¶

Q1: 什么是相机标定？为什么需要标定？

参考答案： - 定义：确定相机内参和外参 - 目的： - 校正畸变 - 测量真实尺寸 - 3D重建 - 方法：张正友标定法

Q2: 立体匹配的原理是什么？

参考答案： - 利用双目视差 - 三角测量计算深度 - 深度 = (基线 * 焦距) / 视差 - 挑战：遮挡、重复纹理、弱纹理

10.7 本章小结¶

核心知识点¶

相机标定：内参、外参
立体视觉：视差、深度
SLAM：定位与建图
点云：滤波、配准

下一步¶

下一章：11-生成模型与GAN.md - 学习生成模型

恭喜完成第10章！ 🎉

第10章 三维视觉¶

📚 章节概述¶

🎯 学习目标¶

10.1 相机标定¶

10.1.1 相机模型¶

10.1.2 畸变校正¶

10.2 立体视觉¶

10.2.1 立体匹配¶

10.2.2 深度估计¶

10.3 SLAM基础¶

10.3.1 ORB-SLAM¶

10.3.2 视觉里程计¶

10.4 点云处理¶

10.5 练习题¶

基础题¶

进阶题¶

10.6 面试准备¶

大厂面试题¶

10.7 本章小结¶

核心知识点¶

下一步¶

第10章三维视觉¶