
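02 - Platform Engineering

Goal: Understand the principles of platform engineering and be able to build an internal developer platform

Duration: 3-4 weeks

Core principle: Improve developer experience and reduce cognitive load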


🎯 What Is Platform Engineering?

Background

Text Only
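Problems with DevOps:
- "You build it, you run it" → developers carry too much operational burden
- Too many tools to master (K8s, Terraform, CI/CD, etc.)
- Excessive cognitive load that distracts from business development

How platform engineering addresses them:
- Build an Internal Developer Platform (IDP)
- Offer self-service capabilities
- Abstract away complexity to improve developer experience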

Core Philosophy

Text Only
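Platform Engineering = DevOps + Product Thinking + Developer Experience

Goals:
1. Reduce developers' cognitive load
2. Provide standardized, self-service capabilities
3. Accelerate software delivery
4. Preserve flexibility while maintaining governance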

📚 Internal Developer Platform (IDP) Architecture

Platform Architecture

Text Only
┌─────────────────────────────────────────┐
│            Developer Portal             │
│       (Backstage / Port / Cortex)       │
└────────────────────┬────────────────────┘
          ┌──────────┴──────────┐
          ▼                     ▼
┌───────────────────┐ ┌───────────────────┐
│  Service Catalog  │ │   Self-Service    │
│ - Microservices   │ │ - Create service  │
│ - API docs        │ │ - Configure envs  │
│ - Dependencies    │ │ - Deploy apps     │
└─────────┬─────────┘ └─────────┬─────────┘
          └──────────┬──────────┘
                     ▼
┌─────────────────────────────────────────┐
│      Platform Orchestration Layer       │
│ - Workflow engine (Temporal / Cadence)  │
│ - GitOps (ArgoCD / Flux)                │
│ - IaC (Terraform / Pulumi)              │
└────────────────────┬────────────────────┘
                     ▼
┌─────────────────────────────────────────┐
│          Infrastructure Layer           │
│ - Kubernetes                            │
│ - Cloud providers (AWS/Azure/GCP)       │
│ - Databases / caches / message queues   │
└─────────────────────────────────────────┘
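As a rough illustration of how the layers divide responsibilities, the following Python sketch sends one "create service" request from the portal down to the infrastructure. The classes `Portal`, `OrchestrationLayer` and `InfrastructureLayer`, their methods and the returned fields are hypothetical stand-ins for Backstage, Temporal/ArgoCD/Terraform and Kubernetes; none of them is a real API.

```python
from dataclasses import dataclass, field


@dataclass
class InfrastructureLayer:
    """Hypothetical stand-in for Kubernetes / cloud resources."""
    namespaces: set = field(default_factory=set)

    def ensure_namespace(self, name: str) -> str:
        # Idempotent: adding an existing namespace changes nothing
        self.namespaces.add(name)
        return name


@dataclass
class OrchestrationLayer:
    """Hypothetical stand-in for the workflow engine, GitOps and IaC tools."""
    infra: InfrastructureLayer

    def provision(self, service: str) -> dict:
        namespace = self.infra.ensure_namespace(service)
        return {"repo": f"https://github.com/my-org/{service}", "namespace": namespace}


@dataclass
class Portal:
    """Hypothetical stand-in for the developer portal (catalog + self-service)."""
    orchestrator: OrchestrationLayer
    catalog: dict = field(default_factory=dict)

    def create_service(self, name: str, owner: str) -> dict:
        if name in self.catalog:
            raise ValueError(f"service {name!r} already exists")
        result = self.orchestrator.provision(name)
        # Register the new service so it appears in the service catalog
        self.catalog[name] = {"owner": owner, **result}
        return self.catalog[name]


portal = Portal(OrchestrationLayer(InfrastructureLayer()))
print(portal.create_service("payment-service", "payments-team"))
# {'owner': 'payments-team', 'repo': 'https://github.com/my-org/payment-service', 'namespace': 'payment-service'}
```

Each layer only talks to the one directly below it, which is what lets the portal stay simple while the orchestration layer absorbs the tooling complexity.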

🛠️ Core Components

1. Developer Portal - Backstage

YAML
# app-config.yaml
app:
  title: My Developer Portal
  baseUrl: http://localhost:3000

backend:
  baseUrl: http://localhost:7007
  listen:
    port: 7007

integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}

catalog:
  locations:
    - type: url
      target: https://github.com/my-org/services/blob/main/catalog-info.yaml

proxy:
  '/grafana':
    target: https://grafana.mycompany.com
    headers:
      Authorization: Bearer ${GRAFANA_TOKEN}

techdocs:
  builder: 'local'
  generator:
    runIn: 'docker'
  publisher:
    type: 'local'
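The `${GITHUB_TOKEN}` and `${GRAFANA_TOKEN}` values above are resolved from environment variables when Backstage loads its configuration, so secrets stay out of the file. The sketch below re-implements that idea on an already-parsed config dict to show the mechanics; it is a simplified assumption, not Backstage's config loader, and failing on a missing variable is a choice made for this sketch.

```python
import re
from typing import Any, Mapping

# ${NAME}, where NAME looks like an environment-variable identifier
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(value: Any, env: Mapping[str, str]) -> Any:
    """Recursively replace ${NAME} inside every string of a parsed config tree."""
    if isinstance(value, dict):
        return {key: substitute(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute(item, env) for item in value]
    if isinstance(value, str):
        def lookup(match: re.Match) -> str:
            name = match.group(1)
            if name not in env:
                # Sketch's choice: fail fast instead of starting with a missing secret
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _VAR.sub(lookup, value)
    return value  # numbers, booleans and None pass through unchanged


config = {
    "integrations": {"github": [{"host": "github.com", "token": "${GITHUB_TOKEN}"}]},
    "proxy": {"/grafana": {"target": "https://grafana.mycompany.com",
                           "headers": {"Authorization": "Bearer ${GRAFANA_TOKEN}"}}},
    "backend": {"listen": {"port": 7007}},
}
# A real process would pass os.environ; fixed values keep the demo deterministic
resolved = substitute(config, {"GITHUB_TOKEN": "ghp_example", "GRAFANA_TOKEN": "gf_example"})
print(resolved["proxy"]["/grafana"]["headers"]["Authorization"])  # Bearer gf_example
```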
TypeScript
// Create a custom plugin
// plugins/my-plugin/src/plugin.ts
import {
  createPlugin,
  createRouteRef,
  createRoutableExtension,
} from '@backstage/core-plugin-api';

export const myPlugin = createPlugin({  // declare the plugin and its root route
  id: 'my-plugin',
  routes: {
    root: createRouteRef({ id: 'my-plugin' }),
  },
});

export const MyPluginPage = myPlugin.provide(
  createRoutableExtension({
    name: 'MyPluginPage',
    component: () => import('./components/MyPluginPage').then(m => m.MyPluginPage),  // arrow function that lazy-loads the page component
    mountPoint: myPlugin.routes.root,
  }),
);

YAML
# Service catalog entity definition
# catalog-info.yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: user-service
  description: User management service
  tags:
    - go
    - microservice
  annotations:
    github.com/project-slug: my-org/user-service
    grafana/dashboard-selector: "tags @> ['user-service']"
spec:
  type: service
  lifecycle: production
  owner: user-team
  system: user-system
  dependsOn:
    - resource:users-db
    - resource:redis-cache
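References such as `resource:users-db` in `dependsOn` follow Backstage's `[kind:][namespace/]name` entity-reference format, with the namespace defaulting to `default`. This Python sketch parses them to show how the catalog can expand short references. The `EntityRef` type and `parse_entity_ref` function are hypothetical, and treating a bare `owner` value as a `group` is an assumption supplied by the caller.

```python
from typing import NamedTuple, Optional


class EntityRef(NamedTuple):
    kind: str
    namespace: str
    name: str

    def __str__(self) -> str:
        return f"{self.kind}:{self.namespace}/{self.name}"


def parse_entity_ref(ref: str, default_kind: Optional[str] = None,
                     default_namespace: str = "default") -> EntityRef:
    """Split '[kind:][namespace/]name', filling in defaults for missing parts."""
    kind, sep, rest = ref.partition(":")
    if not sep:  # no 'kind:' prefix
        kind, rest = default_kind, ref
    if not kind:
        raise ValueError(f"{ref!r} has no kind and no default kind was given")
    namespace, sep, name = rest.partition("/")
    if not sep:  # no 'namespace/' prefix
        namespace, name = default_namespace, rest
    # This sketch normalizes kinds to lower case (Component -> component)
    return EntityRef(kind.lower(), namespace, name)


spec = {"owner": "user-team", "dependsOn": ["resource:users-db", "resource:redis-cache"]}
owner = parse_entity_ref(spec["owner"], default_kind="group")  # assumption: bare owners are groups
deps = [parse_entity_ref(ref) for ref in spec["dependsOn"]]
print(owner)                   # group:default/user-team
print([str(d) for d in deps])  # ['resource:default/users-db', 'resource:default/redis-cache']
```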

2. Self-Service - Service Templates

YAML
# Service template definition
# templates/microservice-template/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: microservice-template
  title: Microservice Template
  description: Create a new microservice with best practices
spec:
  owner: platform-team
  type: service

  parameters:
    - title: Service Information
      required:
        - name
        - owner
      properties:
        name:
          title: Service Name
          type: string
          description: Unique name of the service
        owner:
          title: Owner
          type: string
          description: Team owning this service
          ui:field: OwnerPicker

    - title: Technology Stack
      properties:
        language:
          title: Programming Language
          type: string
          enum:
            - go
            - python
            - nodejs
          default: go
        database:
          title: Database
          type: string
          enum:
            - postgresql
            - mongodb
            - none
          default: postgresql

  steps:
    - id: fetch-base
      name: Fetch Base Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
          language: ${{ parameters.language }}
          database: ${{ parameters.database }}

    - id: publish
      name: Publish to GitHub
      action: publish:github
      input:
        allowedHosts: ['github.com']
        description: ${{ parameters.name }}
        repoUrl: github.com?owner=my-org&repo=${{ parameters.name }}  # my-org: your GitHub organization (OwnerPicker returns a catalog entity ref, not a GitHub owner)

    - id: register
      name: Register in Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: '/catalog-info.yaml'

  output:
    links:
      - title: Repository
        url: ${{ steps.publish.output.remoteUrl }}
      - title: Open in Catalog
        icon: catalog
        entityRef: ${{ steps.register.output.entityRef }}
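When the template runs, the form input is validated against the `parameters` schema and every `${{ ... }}` expression in `steps` and `output` is filled in from `parameters` and earlier `steps` outputs. Backstage uses a real JSON Schema validator and the Nunjucks templating engine for this; the stand-in below handles only `required`, `enum` and `default`, plus plain dotted lookups without filters, purely to show the data flow. `PARAMETER_PAGES`, `validate` and `render` are hypothetical names.

```python
import re
from typing import Any

# Mirrors the two parameter pages of the template above (subset of JSON Schema)
PARAMETER_PAGES = [
    {"required": ["name", "owner"],
     "properties": {"name": {"type": "string"}, "owner": {"type": "string"}}},
    {"properties": {
        "language": {"type": "string", "enum": ["go", "python", "nodejs"], "default": "go"},
        "database": {"type": "string", "enum": ["postgresql", "mongodb", "none"],
                     "default": "postgresql"}}},
]

# ${{ some.dotted.path }} with optional whitespace inside the braces
_EXPR = re.compile(r"\$\{\{\s*([\w.]+)\s*\}\}")


def validate(form: dict) -> dict:
    """Fill defaults, then enforce 'enum' and 'required' page by page."""
    values = dict(form)
    for page in PARAMETER_PAGES:
        for key, schema in page["properties"].items():
            if key not in values and "default" in schema:
                values[key] = schema["default"]
            if key in values and "enum" in schema and values[key] not in schema["enum"]:
                raise ValueError(f"{key}={values[key]!r} is not one of {schema['enum']}")
        missing = [k for k in page.get("required", []) if not values.get(k)]
        if missing:
            raise ValueError(f"missing required parameters: {missing}")
    return values


def render(text: str, context: dict) -> str:
    """Replace each ${{ a.b.c }} by walking the context dict along the dotted path."""
    def resolve(match: re.Match) -> str:
        node: Any = context
        for part in match.group(1).split("."):
            node = node[part]  # a KeyError points at a typo in the template
        return str(node)
    return _EXPR.sub(resolve, text)


params = validate({"name": "payment-service", "owner": "payments-team"})
context = {"parameters": params}
print(render("github.com?owner=${{ parameters.owner }}&repo=${{ parameters.name }}", context))
# github.com?owner=payments-team&repo=payment-service
print(params["language"], params["database"])  # go postgresql
```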

3. GitOps - ArgoCD

YAML
# Application definition
# argocd/application.yaml
apiVersion: argoproj.io/v1alpha1  # apiVersion: the API group and version of the resource
kind: Application  # kind: the resource type
metadata:
  name: user-service
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:  # spec: the desired state of the resource
  project: default
  source:
    repoURL: https://github.com/my-org/gitops-repo.git
    targetRevision: HEAD
    path: services/user-service/overlays/production
    helm:
      valueFiles:
        - values-production.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
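Two settings in this Application are easy to misread: the `retry.backoff` block produces growing delays between failed sync attempts, and `prune`/`selfHeal` decide what a sync changes. The sketch below assumes that retry n (counting from 0) waits `duration × factor^n`, capped at `maxDuration`, and models a sync as a diff between Git (desired) and the cluster (live) keyed by `Kind/name`. It is an illustration, not Argo CD's diffing logic.

```python
from datetime import timedelta


def backoff_schedule(limit: int, duration: timedelta, factor: int,
                     max_duration: timedelta) -> list:
    """Delay before retry n (0-based): duration * factor**n, never above max_duration."""
    return [min(duration * factor ** n, max_duration) for n in range(limit)]


def plan_sync(desired: dict, live: dict, prune: bool = True) -> dict:
    """Diff Git (desired) against the cluster (live); keys are 'Kind/name' strings."""
    return {
        "create": sorted(k for k in desired if k not in live),
        # Any drift from Git (e.g. a manual kubectl scale) shows up as an update
        "update": sorted(k for k in desired if k in live and desired[k] != live[k]),
        # Resources deleted from Git are removed only when pruning is enabled
        "delete": sorted(k for k in live if k not in desired) if prune else [],
    }


# retry: limit 5, backoff duration 5s, factor 2, maxDuration 3m
delays = backoff_schedule(5, timedelta(seconds=5), 2, timedelta(minutes=3))
print([int(d.total_seconds()) for d in delays])  # [5, 10, 20, 40, 80]

desired = {"Deployment/user-service": {"replicas": 3}, "Service/user-service": {"port": 80}}
live = {"Deployment/user-service": {"replicas": 1}, "ConfigMap/old-config": {}}
print(plan_sync(desired, live))
# {'create': ['Service/user-service'], 'update': ['Deployment/user-service'], 'delete': ['ConfigMap/old-config']}
```

With `selfHeal: true`, the manual scale-down of `Deployment/user-service` counts as drift and is reverted to the value in Git; with `prune: true`, `ConfigMap/old-config`, which no longer exists in Git, is deleted.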

4. Workflow Orchestration - Temporal

Python
# Define the workflow
import asyncio
from datetime import timedelta

from temporalio import activity, workflow
from temporalio.client import Client
from temporalio.worker import Worker

# Activity definitions
@activity.defn
async def create_github_repo(name: str, owner: str) -> str:  # async def defines a coroutine; call it with await
    """Create a GitHub repository."""
    # Call the GitHub API here (stubbed: only builds the URL)
    return f"https://github.com/{owner}/{name}"

@activity.defn
async def setup_ci_cd(repo_url: str) -> None:
    """Set up the CI/CD pipeline."""
    # Configure GitHub Actions
    pass

@activity.defn
async def provision_infrastructure(service_name: str) -> dict:
    """Provision infrastructure."""
    # Invoke Terraform
    return {"cluster": "prod-cluster", "namespace": service_name}

# Workflow definition
@workflow.defn
class ServiceProvisioningWorkflow:
    @workflow.run
    async def run(self, service_config: dict) -> dict:
        # 1. Create the GitHub repository
        repo_url = await workflow.execute_activity(  # await suspends until the activity completes
            create_github_repo,
            args=(service_config["name"], service_config["owner"]),
            start_to_close_timeout=timedelta(minutes=5),
        )

        # 2. Set up CI/CD
        await workflow.execute_activity(
            setup_ci_cd,
            args=(repo_url,),
            start_to_close_timeout=timedelta(minutes=10),
        )

        # 3. Provision infrastructure
        infra_info = await workflow.execute_activity(
            provision_infrastructure,
            args=(service_config["name"],),
            start_to_close_timeout=timedelta(minutes=15),
        )

        return {
            "repo_url": repo_url,
            "infrastructure": infra_info,
            "status": "provisioned",
        }

# Start a worker and run the workflow
async def main():
    client = await Client.connect("localhost:7233")

    # A worker must poll the task queue; without one the workflow never makes progress
    async with Worker(
        client,
        task_queue="provisioning-queue",
        workflows=[ServiceProvisioningWorkflow],
        activities=[create_github_repo, setup_ci_cd, provision_infrastructure],
    ):
        result = await client.execute_workflow(
            ServiceProvisioningWorkflow.run,
            {"name": "payment-service", "owner": "payments-team"},
            id="provision-payment-service",
            task_queue="provisioning-queue",
        )

    print(f"Service provisioned: {result}")

if __name__ == "__main__":
    asyncio.run(main())
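The control flow of `ServiceProvisioningWorkflow` (run steps in order, bound each step with a timeout, retry transient failures) can be imitated with plain `asyncio`, which helps when reasoning about the workflow without a Temporal server. This is a sketch only: Temporal additionally persists workflow state so a run survives worker crashes, and its default retry policy differs. `run_activity`, the attempt count, the short sleep and which exceptions count as retryable are assumptions; `create_repo` fails once on purpose to exercise the retry.

```python
import asyncio
from datetime import timedelta
from typing import Any, Awaitable, Callable


async def run_activity(fn: Callable[..., Awaitable[Any]], *args: Any,
                       timeout: timedelta, attempts: int = 3) -> Any:
    """Await fn(*args) with a per-attempt timeout, retrying timeouts and connection errors."""
    last_error: BaseException = RuntimeError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(fn(*args), timeout.total_seconds())
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            await asyncio.sleep(0.01 * attempt)  # short linear backoff, demo only
    raise last_error


calls = {"create_repo": 0}


async def create_repo(name: str, owner: str) -> str:
    calls["create_repo"] += 1
    if calls["create_repo"] == 1:
        raise ConnectionError("transient GitHub API error")  # first attempt fails on purpose
    return f"https://github.com/{owner}/{name}"


async def provision(name: str) -> dict:
    return {"cluster": "prod-cluster", "namespace": name}


async def provision_service(config: dict) -> dict:
    # Steps run strictly in order; each can use the result of the previous one
    repo_url = await run_activity(create_repo, config["name"], config["owner"],
                                  timeout=timedelta(minutes=5))
    infra = await run_activity(provision, config["name"], timeout=timedelta(minutes=15))
    return {"repo_url": repo_url, "infrastructure": infra, "status": "provisioned"}


result = asyncio.run(provision_service({"name": "payment-service", "owner": "payments-team"}))
print(result["repo_url"], calls["create_repo"])  # https://github.com/payments-team/payment-service 2
```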

5. Infrastructure as Code - Pulumi

Python
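# Define infrastructure in Python
# Requires the pulumi, pulumi-aws and pulumi-eks packages
import pulumi
import pulumi_eks as eks  # high-level EKS component (accepts VPC, subnet and node-group arguments)
from pulumi_aws import ec2, s3

# Create an S3 bucket
bucket = s3.Bucket("my-bucket",
    acl="private",
    versioning=s3.BucketVersioningArgs(
        enabled=True,
    ),
    tags={
        "Environment": "production",
        "ManagedBy": "pulumi",
    },
)

# Create a VPC
vpc = ec2.Vpc("my-vpc",
    cidr_block="10.0.0.0/16",
    enable_dns_hostnames=True,
    enable_dns_support=True,
    tags={
        "Name": "production-vpc",
    },
)

# Public subnets in two availability zones (EKS needs at least two)
# Assumption: us-east-1 zone names; adjust them to your region
igw = ec2.InternetGateway("my-igw", vpc_id=vpc.id)
public_rt = ec2.RouteTable("public-rt",
    vpc_id=vpc.id,
    routes=[ec2.RouteTableRouteArgs(cidr_block="0.0.0.0/0", gateway_id=igw.id)],
)
subnets = []
for i, az in enumerate(["us-east-1a", "us-east-1b"]):
    subnet = ec2.Subnet(f"public-subnet-{i}",
        vpc_id=vpc.id,
        cidr_block=f"10.0.{i}.0/24",
        availability_zone=az,
        map_public_ip_on_launch=True,
    )
    ec2.RouteTableAssociation(f"public-rta-{i}",
        subnet_id=subnet.id,
        route_table_id=public_rt.id,
    )
    subnets.append(subnet)

# Create the EKS cluster (pulumi_eks.Cluster, not the low-level pulumi_aws.eks.Cluster)
cluster = eks.Cluster("my-cluster",
    vpc_id=vpc.id,
    subnet_ids=[s.id for s in subnets],
    instance_type="t3.medium",
    desired_capacity=3,
    min_size=1,
    max_size=5,
)

# Export outputs
pulumi.export("bucket_name", bucket.id)
pulumi.export("cluster_endpoint", cluster.eks_cluster.endpoint)
pulumi.export("kubeconfig", cluster.kubeconfig)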
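The cluster needs subnets carved out of the VPC's `10.0.0.0/16` range, one per availability zone and without overlaps. Python's standard `ipaddress` module can plan these blocks before they are written into Pulumi code. The zone names are assumptions, and the reserved-address count reflects AWS reserving five addresses in every subnet.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS keeps the first four and the last address of each subnet


def plan_subnets(vpc_cidr: str, zones: list, new_prefix: int = 24) -> dict:
    """Assign each availability zone its own non-overlapping block from the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    capacity = 2 ** (new_prefix - vpc.prefixlen)  # how many blocks of that size fit
    if len(zones) > capacity:
        raise ValueError(f"{vpc_cidr} only fits {capacity} /{new_prefix} subnets")
    blocks = vpc.subnets(new_prefix=new_prefix)  # lazily yields blocks in address order
    return {zone: str(next(blocks)) for zone in zones}


def usable_ips(cidr: str) -> int:
    """Addresses left for workloads after AWS's per-subnet reservations."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"])
print(plan)                            # {'us-east-1a': '10.0.0.0/24', 'us-east-1b': '10.0.1.0/24'}
print(usable_ips(plan["us-east-1a"]))  # 251
```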

🎯 Golden Paths

Definition

Text Only
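Golden Path = a standardized, best-practice tech stack and workflow vetted by the platform team

Purpose:
- Reduce decision fatigue
- Ensure quality and security
- Accelerate delivery
- Lower maintenance costs

Example: Web Service Golden Path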

Markdown
# golden-paths/web-service/README.md
# Web Service Golden Path

## Tech Stack
- Language: Go 1.21+
- Framework: Gin / Echo
- Database: PostgreSQL 15
- Cache: Redis 7
- Message queue: Kafka
- Deployment: Kubernetes
- Monitoring: Prometheus + Grafana
- Logging: ELK Stack

## Project Structure
```text
my-service/
├── cmd/
│   └── server/
│       └── main.go
├── internal/
│   ├── handlers/
│   ├── models/
│   ├── repository/
│   └── service/
├── pkg/
│   └── utils/
├── api/
│   └── openapi.yaml
├── deployments/
│   ├── docker/
│   └── k8s/
├── scripts/
├── tests/
├── Makefile
├── go.mod
└── README.md
```

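## CI/CD Pipeline
1. A code commit triggers the build
2. Run unit tests (coverage > 80%)
3. Run integration tests
4. Run security scans (SAST/DAST)
5. Build the Docker image
6. Deploy to the development environment
7. Manually trigger the production deployment

## Monitoring Requirements
- Must expose a /metrics endpoint
- Must configure alerting rules
- Must configure log collection

## Security Requirements
- Must use dependency scanning
- Must update dependencies regularly
- Must configure RBAC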
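A golden path is easier to enforce when its rules are checked automatically, for example in CI. The sketch below encodes some requirements from the README above (coverage above 80%, a `/metrics` endpoint, alerting, log collection, dependency scanning, RBAC, and key files from the project layout) as a Python check. The `ServiceReport` structure and how its facts would be collected are hypothetical.

```python
from dataclasses import dataclass, field

# Key files from the golden-path project layout
REQUIRED_PATHS = ["cmd/server/main.go", "api/openapi.yaml", "Makefile", "go.mod", "README.md"]
MIN_COVERAGE = 80.0


@dataclass
class ServiceReport:
    """Hypothetical facts about a service, e.g. gathered by a CI job."""
    files: set
    coverage: float
    endpoints: set = field(default_factory=set)
    has_alert_rules: bool = False
    has_log_collection: bool = False
    has_dependency_scan: bool = False
    has_rbac: bool = False


def check_golden_path(report: ServiceReport) -> list:
    """Return a list of violations; an empty list means the service complies."""
    problems = [f"missing {p}" for p in REQUIRED_PATHS if p not in report.files]
    if report.coverage <= MIN_COVERAGE:  # the README asks for coverage strictly above 80%
        problems.append(f"unit test coverage {report.coverage:.1f}% is not above {MIN_COVERAGE:.0f}%")
    if "/metrics" not in report.endpoints:
        problems.append("no /metrics endpoint")
    checks = {
        "alert rules": report.has_alert_rules,
        "log collection": report.has_log_collection,
        "dependency scanning": report.has_dependency_scan,
        "RBAC": report.has_rbac,
    }
    problems += [f"{name} not configured" for name, ok in checks.items() if not ok]
    return problems


report = ServiceReport(
    files={"cmd/server/main.go", "api/openapi.yaml", "Makefile", "go.mod", "README.md"},
    coverage=76.5,
    endpoints={"/healthz", "/metrics"},
    has_alert_rules=True, has_log_collection=True, has_dependency_scan=True, has_rbac=False,
)
for problem in check_golden_path(report):
    print(problem)
# unit test coverage 76.5% is not above 80%
# RBAC not configured
```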

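🎯 Hands-On: Building an Internal Developer Platform

Architecture Design

Text Only
Components:
1. Backstage as the developer portal
2. ArgoCD as the GitOps engine
3. Temporal as the workflow engine
4. Pulumi as the IaC tool
5. Kubernetes as the runtime platform

Implementation Steps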

Bash
# 1. Create a Backstage app and run it locally
npx @backstage/create-app@latest my-portal
cd my-portal
yarn dev

# 2. Set up the service template
mkdir -p templates/microservice-template
# Create template.yaml and the skeleton directory

# 3. Deploy ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# 4. Deploy Temporal
helm repo add temporal https://go.temporal.io/helm-charts
helm install temporal temporal/temporal

# 5. Set up Pulumi
pulumi new aws-python
pulumi up

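✅ Learning Checkpoints

  • Understand the principles and value of platform engineering
  • Can set up a Backstage developer portal
  • Can create service templates
  • Can configure a GitOps pipeline
  • Can design golden paths
  • Can use workflow orchestration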

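📚 Recommended Resources

Open-Source Projects

  • Backstage: developer portal framework
  • ArgoCD: GitOps tool
  • Temporal: workflow engine
  • Pulumi: infrastructure as code
  • Crossplane: cloud-native control plane

Books and Articles

  • "Platform Engineering" by Luca Galante
  • Gartner: Platform Engineering trend report
  • ThoughtWorks Tech Radar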

Remember: platform engineering is about developer experience, not piling up technology! 🚀