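02 - Platform Engineering
Goal: Master the ideas behind platform engineering and be able to build an internal developer platform
Duration: 3-4 weeks
Core principle: Improve developer experience and reduce cognitive load
🎯 What Is Platform Engineering?
Background
```text
Problems with DevOps:
- "You build it, you run it" → developers carry too much of the operational burden
- Too many tools to master (K8s, Terraform, CI/CD, etc.)
- Heavy cognitive load that takes time away from business development
The platform engineering solution:
- Build an internal developer platform (IDP)
- Provide self-service capabilities
- Abstract away complexity and improve the developer experience
```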
Core Philosophy
📚 Internal Developer Platform (IDP) Architecture
Platform Architecture
```text
┌──────────────────────────────────────────┐
│ Developer Portal                         │
│ (Backstage / Port / Cortex)              │
└─────────────────────┬────────────────────┘
                      │
          ┌───────────┴──────────┐
┌─────────▼─────────┐  ┌─────────▼─────────┐
│ Service Catalog   │  │ Self-Service      │
│ - Microservices   │  │ - Create services │
│ - API docs        │  │ - Configure envs  │
│ - Dependencies    │  │ - Deploy apps     │
└─────────┬─────────┘  └─────────┬─────────┘
          └───────────┬──────────┘
                      │
┌─────────────────────▼────────────────────┐
│ Platform Orchestration Layer             │
│ - Workflow engine (Temporal / Cadence)   │
│ - GitOps (ArgoCD / Flux)                 │
│ - IaC (Terraform / Pulumi)               │
└─────────────────────┬────────────────────┘
                      │
┌─────────────────────▼────────────────────┐
│ Infrastructure Layer                     │
│ - Kubernetes                             │
│ - Cloud providers (AWS / Azure / GCP)    │
│ - Databases / caches / message queues    │
└──────────────────────────────────────────┘
```
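To make the layering concrete, here is a toy Python model of how one self-service request travels down the stack. The class and method names (`Portal`, `OrchestrationLayer`, `InfrastructureLayer`, `create_service`) are hypothetical stand-ins for the real tools in each box. The point is only that developers call the top layer and each layer delegates to the one below.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InfrastructureLayer:
    """Bottom layer: records what would be provisioned (Kubernetes, cloud, databases)."""
    provisioned: list[str] = field(default_factory=list)

    def provision(self, service: str) -> str:
        namespace = f"ns-{service}"  # hypothetical naming convention
        self.provisioned.append(namespace)
        return namespace


@dataclass
class OrchestrationLayer:
    """Middle layer: turns one request into infrastructure steps (workflows, GitOps, IaC)."""
    infra: InfrastructureLayer

    def run(self, service: str) -> dict:
        namespace = self.infra.provision(service)
        return {"service": service, "namespace": namespace, "status": "ready"}


@dataclass
class Portal:
    """Top layer: the only surface developers touch (service catalog + self-service)."""
    orchestration: OrchestrationLayer
    catalog: dict[str, dict] = field(default_factory=dict)

    def create_service(self, name: str, owner: str) -> dict:
        if name in self.catalog:
            raise ValueError(f"service {name!r} already exists")
        result = self.orchestration.run(name)
        self.catalog[name] = {"owner": owner, **result}  # register in the service catalog
        return self.catalog[name]


portal = Portal(OrchestrationLayer(InfrastructureLayer()))
print(portal.create_service("payment-service", "payments-team"))
```

Developers never see the namespace naming rule or the provisioning order. That is the "abstract away complexity" idea expressed as a call chain.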
🛠️ Core Components
1. Developer Portal - Backstage
```yaml
# app-config.yaml
app:
  title: My Developer Portal
  baseUrl: http://localhost:3000

backend:
  baseUrl: http://localhost:7007
  listen:
    port: 7007

# Secrets are read from environment variables when the config is loaded
integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}

catalog:
  locations:
    - type: url
      target: https://github.com/my-org/services/blob/main/catalog-info.yaml

proxy:
  '/grafana':
    target: https://grafana.mycompany.com
    headers:
      Authorization: Bearer ${GRAFANA_TOKEN}

techdocs:
  builder: 'local'
  generator:
    runIn: 'docker'
  publisher:
    type: 'local'
```
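Backstage substitutes values such as `${GITHUB_TOKEN}` from environment variables when it loads the config, so secrets stay out of the file. The Python sketch below imitates that substitution on an already-parsed config tree. `substitute_env` is a hypothetical helper, not part of Backstage, and raising an error on an unset variable is this sketch's own design choice.

```python
import os
import re
from typing import Any, Mapping, Optional

# Matches ${NAME} placeholders such as ${GITHUB_TOKEN}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively replace ${VAR} placeholders in a parsed config tree."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: substitute_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute_env(item, env) for item in value]
    if isinstance(value, str):
        def lookup(match: re.Match) -> str:
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return PLACEHOLDER.sub(lookup, value)
    return value  # numbers, booleans and None pass through unchanged


config = {
    "backend": {"listen": {"port": 7007}},
    "integrations": {"github": [{"host": "github.com", "token": "${GITHUB_TOKEN}"}]},
    "proxy": {"/grafana": {"headers": {"Authorization": "Bearer ${GRAFANA_TOKEN}"}}},
}
resolved = substitute_env(config, {"GITHUB_TOKEN": "ghp_example", "GRAFANA_TOKEN": "glsa_example"})
print(resolved["proxy"]["/grafana"]["headers"]["Authorization"])  # Bearer glsa_example
```

The function returns a new tree and leaves the input untouched. This means the unresolved config can be logged or diffed safely, because it never holds a real secret.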
```typescript
// Create a custom plugin
// plugins/my-plugin/src/plugin.ts
import {
  createPlugin,
  createRouteRef,
  createRoutableExtension,
} from '@backstage/core-plugin-api';

// Plugin definition: its id and the route reference where its page is mounted
export const myPlugin = createPlugin({
  id: 'my-plugin',
  routes: {
    root: createRouteRef({ id: 'my-plugin' }),
  },
});

// Page extension: the component is loaded lazily and mounted at the plugin's root route
export const MyPluginPage = myPlugin.provide(
  createRoutableExtension({
    name: 'MyPluginPage',
    component: () =>
      import('./components/MyPluginPage').then(m => m.MyPluginPage),
    mountPoint: myPlugin.routes.root,
  }),
);
```
```yaml
# Service catalog entity definition
# catalog-info.yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: user-service
  description: User management service
  tags:
    - go
    - microservice
  annotations:
    github.com/project-slug: my-org/user-service
    grafana/dashboard-selector: "tags @> 'user-service'"
spec:
  type: service
  lifecycle: production
  owner: user-team
  system: user-system
  dependsOn:
    - resource:users-db
    - resource:redis-cache
```
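Values like `resource:users-db` in `dependsOn` (and in `owner` and `system`) are entity references. Backstage writes them as `[<kind>:][<namespace>/]<name>`. The namespace defaults to `default`, and the kind may be omitted where the field implies one (for `spec.owner`, Backstage assumes a `Group`). Below is a minimal Python parser for that shape. `parse_entity_ref` and `EntityRef` are hypothetical names, not the `@backstage/catalog-model` API.

```python
from typing import NamedTuple, Optional


class EntityRef(NamedTuple):
    kind: str
    namespace: str
    name: str

    def __str__(self) -> str:
        return f"{self.kind}:{self.namespace}/{self.name}"


def parse_entity_ref(ref: str, default_kind: Optional[str] = None,
                     default_namespace: str = "default") -> EntityRef:
    """Parse a reference of the form [kind:][namespace/]name."""
    kind, sep, rest = ref.partition(":")
    if not sep:  # no "kind:" prefix
        kind, rest = default_kind, ref
    if not kind:
        raise ValueError(f"{ref!r} has no kind and no default kind was given")
    namespace, sep, name = rest.partition("/")
    if not sep:  # no "namespace/" prefix
        namespace, name = default_namespace, rest
    return EntityRef(kind, namespace, name)


for ref in ["resource:users-db", "resource:redis-cache"]:
    print(parse_entity_ref(ref))  # resource:default/users-db, then resource:default/redis-cache

# spec.owner names a group unless a kind is given, so supply it as the default
owner = parse_entity_ref("user-team", default_kind="group")
print(owner)  # group:default/user-team
```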
2. Self-Service - Service Templates
```yaml
# Service template definition
# templates/microservice-template/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: microservice-template
  title: Microservice Template
  description: Create a new microservice with best practices
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service Information
      required:
        - name
        - owner
      properties:
        name:
          title: Service Name
          type: string
          description: Unique name of the service
        owner:
          title: Owner
          type: string
          description: Team owning this service
          ui:field: OwnerPicker
    - title: Technology Stack
      properties:
        language:
          title: Programming Language
          type: string
          enum:
            - go
            - python
            - nodejs
          default: go
        database:
          title: Database
          type: string
          enum:
            - postgresql
            - mongodb
            - none
          default: postgresql
  steps:
    - id: fetch-base
      name: Fetch Base Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
          language: ${{ parameters.language }}
          database: ${{ parameters.database }}
    - id: publish
      name: Publish to GitHub
      action: publish:github
      input:
        description: ${{ parameters.name }}
        # OwnerPicker returns a catalog entity ref (e.g. group:default/user-team),
        # not a GitHub account, so the GitHub organization is set explicitly
        repoUrl: github.com?owner=my-org&repo=${{ parameters.name }}
    - id: register
      name: Register in Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: '/catalog-info.yaml'
  output:
    links:
      - title: Repository
        url: ${{ steps.publish.output.remoteUrl }}
      - title: Open in Catalog
        icon: catalog
        entityRef: ${{ steps.register.output.entityRef }}
```
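Before any step runs, the scaffolder validates the form input against the JSON Schema under `parameters` and fills in defaults. It then evaluates the `${{ ... }}` expressions. The Python sketch below mimics a small part of that flow on a trimmed copy of the schema above, covering only `required`, `enum`, and `default`. `resolve_parameters` and `render` are hypothetical helpers; the real scaffolder uses a full JSON Schema validator and the Nunjucks template engine.

```python
import re

# Trimmed copy of the template's parameter schema (both form pages merged)
SCHEMA = {
    "required": ["name", "owner"],
    "properties": {
        "name": {"type": "string"},
        "owner": {"type": "string"},
        "language": {"type": "string", "enum": ["go", "python", "nodejs"], "default": "go"},
        "database": {"type": "string", "enum": ["postgresql", "mongodb", "none"], "default": "postgresql"},
    },
}

# Matches ${{ parameters.<key> }} with optional inner whitespace
EXPRESSION = re.compile(r"\$\{\{\s*parameters\.(\w+)\s*\}\}")


def resolve_parameters(schema: dict, submitted: dict) -> dict:
    """Apply defaults, then check required fields and enum values."""
    values = {}
    for key, prop in schema["properties"].items():
        if key in submitted:
            values[key] = submitted[key]
        elif "default" in prop:
            values[key] = prop["default"]
    missing = [key for key in schema["required"] if key not in values]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    for key, value in values.items():
        allowed = schema["properties"][key].get("enum")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{key}={value!r} is not one of {allowed}")
    return values


def render(text: str, parameters: dict) -> str:
    """Replace ${{ parameters.x }} expressions with parameter values."""
    return EXPRESSION.sub(lambda m: str(parameters[m.group(1)]), text)


params = resolve_parameters(SCHEMA, {"name": "payment-service", "owner": "group:default/payments-team"})
print(params["language"], params["database"])  # go postgresql
print(render("${{ parameters.name }} (${{ parameters.language }})", params))  # payment-service (go)
```

Validating before rendering means a bad enum value is rejected in the form, before any repository gets created.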
3. GitOps - ArgoCD
```yaml
# Application definition
# argocd/application.yaml
apiVersion: argoproj.io/v1alpha1  # API group/version of the Argo CD CRD
kind: Application                 # resource type
metadata:
  name: user-service
  namespace: argocd
  finalizers:
    # Cascade-delete the app's resources when the Application is deleted
    - resources-finalizer.argocd.argoproj.io
spec:                             # desired state of the Application
  project: default
  source:
    repoURL: https://github.com/my-org/gitops-repo.git
    targetRevision: HEAD
    # Kustomize overlay directory; Argo CD detects kustomization.yaml automatically.
    # For a Helm chart instead, point path at the chart directory and add
    # helm.valueFiles (e.g. values-production.yaml).
    path: services/user-service/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true       # delete resources that were removed from Git
      selfHeal: true    # revert manual changes made directly in the cluster
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
```
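With the `retry` block, a failed sync is retried up to `limit` times. The Argo CD docs describe the backoff as follows: the wait starts at `duration`, is multiplied by `factor` after each failed retry, and is capped at `maxDuration`. Assuming those semantics, the Python sketch below computes the resulting schedule. The helper names are hypothetical, and `parse_duration` only handles single-unit values such as `5s` or `3m`.

```python
# Units accepted by this sketch; real durations may combine units (e.g. "1m30s")
UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert '5s', '3m', '1h', or a bare number of seconds into seconds."""
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def backoff_schedule(duration: int, factor: int, max_duration: int, limit: int) -> list:
    """Wait before each retry: start at duration, multiply by factor, cap at max_duration."""
    delays = []
    delay = duration
    for _ in range(limit):
        delays.append(min(delay, max_duration))
        delay *= factor
    return delays


schedule = backoff_schedule(parse_duration("5s"), 2, parse_duration("3m"), limit=5)
print(schedule)       # [5, 10, 20, 40, 80]
print(sum(schedule))  # 155 seconds of backoff across all five retries
```

With these settings the 3-minute cap is never reached. It would only start to matter from the seventh retry, where the uncapped delay would be 320s.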
4. Workflow Orchestration - Temporal
```python
# Define the workflow, its activities, and a worker that executes them
import asyncio
from datetime import timedelta

from temporalio import activity, workflow
from temporalio.client import Client
from temporalio.worker import Worker

TASK_QUEUE = "provisioning-queue"


# Activity definitions: side-effecting steps that Temporal can retry on failure
@activity.defn
async def create_github_repo(name: str, owner: str) -> str:
    """Create a GitHub repository."""
    # Call the GitHub API here (placeholder)
    return f"https://github.com/{owner}/{name}"


@activity.defn
async def setup_ci_cd(repo_url: str) -> None:
    """Set up the CI/CD pipeline."""
    # Configure GitHub Actions here (placeholder)


@activity.defn
async def provision_infrastructure(service_name: str) -> dict:
    """Provision infrastructure."""
    # Call Terraform here (placeholder)
    return {"cluster": "prod-cluster", "namespace": service_name}


# Workflow definition: deterministic orchestration of the activities
@workflow.defn
class ServiceProvisioningWorkflow:
    @workflow.run
    async def run(self, service_config: dict) -> dict:
        # 1. Create the GitHub repository
        repo_url = await workflow.execute_activity(
            create_github_repo,
            args=(service_config["name"], service_config["owner"]),
            start_to_close_timeout=timedelta(minutes=5),
        )
        # 2. Set up CI/CD
        await workflow.execute_activity(
            setup_ci_cd,
            args=(repo_url,),
            start_to_close_timeout=timedelta(minutes=10),
        )
        # 3. Provision infrastructure
        infra_info = await workflow.execute_activity(
            provision_infrastructure,
            args=(service_config["name"],),
            start_to_close_timeout=timedelta(minutes=15),
        )
        return {
            "repo_url": repo_url,
            "infrastructure": infra_info,
            "status": "provisioned",
        }


# Run a worker on the task queue, then start the workflow and wait for its result.
# Without a worker polling the queue, execute_workflow would wait forever.
async def main():
    client = await Client.connect("localhost:7233")
    async with Worker(
        client,
        task_queue=TASK_QUEUE,
        workflows=[ServiceProvisioningWorkflow],
        activities=[create_github_repo, setup_ci_cd, provision_infrastructure],
    ):
        result = await client.execute_workflow(
            ServiceProvisioningWorkflow.run,
            {"name": "payment-service", "owner": "payments-team"},
            id="provision-payment-service",
            task_queue=TASK_QUEUE,
        )
        print(f"Service provisioned: {result}")


if __name__ == "__main__":
    asyncio.run(main())
```
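5. Infrastructure as Code - Pulumi
```python
# Define infrastructure in Python.
# Requires the pulumi, pulumi-aws, pulumi-awsx and pulumi-eks packages.
import pulumi
import pulumi_awsx as awsx
import pulumi_eks as eks
from pulumi_aws import s3

# Create an S3 bucket
bucket = s3.Bucket("my-bucket",
    acl="private",
    versioning=s3.BucketVersioningArgs(
        enabled=True,
    ),
    tags={
        "Environment": "production",
        "ManagedBy": "pulumi",
    },
)

# Create a VPC. awsx also creates public/private subnets across availability
# zones, plus the gateways and route tables the EKS worker nodes need.
vpc = awsx.ec2.Vpc("my-vpc",
    cidr_block="10.0.0.0/16",
    enable_dns_hostnames=True,
    enable_dns_support=True,
    tags={
        "Name": "production-vpc",
    },
)

# Create an EKS cluster with a default node group of t3.medium instances.
# eks.Cluster (from pulumi-eks) accepts these node-sizing arguments;
# the low-level pulumi_aws.eks.Cluster resource does not.
cluster = eks.Cluster("my-cluster",
    vpc_id=vpc.vpc_id,
    public_subnet_ids=vpc.public_subnet_ids,
    private_subnet_ids=vpc.private_subnet_ids,
    instance_type="t3.medium",
    desired_capacity=3,
    min_size=1,
    max_size=5,
)

# Export outputs
pulumi.export("bucket_name", bucket.id)
pulumi.export("cluster_endpoint", cluster.eks_cluster.endpoint)
pulumi.export("kubeconfig", cluster.kubeconfig)
```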
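🎯 Golden Path
Definition
Example: Web Service Golden Path
````markdown
<!-- golden-paths/web-service/README.md -->
# Web Service Golden Path

## Tech Stack
- Language: Go 1.21+
- Framework: Gin / Echo
- Database: PostgreSQL 15
- Cache: Redis 7
- Message queue: Kafka
- Deployment: Kubernetes
- Monitoring: Prometheus + Grafana
- Logging: ELK Stack

## Project Structure
```text
my-service/
├── cmd/
│   └── server/
│       └── main.go
├── internal/
│   ├── handlers/
│   ├── models/
│   ├── repository/
│   └── service/
├── pkg/
│   └── utils/
├── api/
│   └── openapi.yaml
├── deployments/
│   ├── docker/
│   └── k8s/
├── scripts/
├── tests/
├── Makefile
├── go.mod
└── README.md
```

## CI/CD Pipeline
1. A code commit triggers the build
2. Run unit tests (coverage > 80%)
3. Run integration tests
4. Run security scans (SAST/DAST)
5. Build the Docker image
6. Deploy to the development environment
7. Trigger the production deployment manually

## Monitoring Requirements
- Must expose a /metrics endpoint
- Must configure alerting rules
- Must configure log collection

## Security Requirements
- Must use dependency scanning
- Must update dependencies regularly
- Must configure RBAC
````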
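A golden path is easier to keep when conformance is checked automatically, for example as a CI step. Below is a minimal Python sketch that assumes the repository layout and the >80% coverage rule from the README above. `REQUIRED_PATHS`, `check_golden_path`, and `scaffold` are hypothetical names, and the coverage figure is passed in rather than measured.

```python
import tempfile
from pathlib import Path
from typing import List

# Entries from the golden-path project structure; a trailing "/" marks a directory
REQUIRED_PATHS = [
    "cmd/server/main.go",
    "internal/",
    "api/openapi.yaml",
    "deployments/k8s/",
    "Makefile",
    "go.mod",
    "README.md",
]
MIN_COVERAGE = 80.0  # unit-test coverage must be strictly above this percentage


def check_golden_path(repo: Path, coverage_percent: float) -> List[str]:
    """Return a list of violations; an empty list means the repository conforms."""
    problems = [f"missing {p}" for p in REQUIRED_PATHS if not (repo / p).exists()]
    if coverage_percent <= MIN_COVERAGE:
        problems.append(f"coverage {coverage_percent:.1f}% is not above {MIN_COVERAGE:.0f}%")
    return problems


def scaffold(repo: Path) -> None:
    """Create empty files and directories that satisfy REQUIRED_PATHS (demo only)."""
    for p in REQUIRED_PATHS:
        target = repo / p
        if p.endswith("/"):
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch()


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    before = check_golden_path(repo, 85.0)  # empty directory: every path is missing
    scaffold(repo)
    after = check_golden_path(repo, 85.0)   # conforming layout, enough coverage
    low_coverage = check_golden_path(repo, 72.5)
print(before[0], after, low_coverage)
```

Returning every violation at once, instead of failing on the first, lets a CI job report the full list to the service team in one run.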
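🎯 Hands-on: Building an Internal Developer Platform
Architecture Design
```text
Components:
1. Backstage as the developer portal
2. ArgoCD as the GitOps engine
3. Temporal as the workflow engine
4. Pulumi as the IaC tool
5. Kubernetes as the runtime platform
```
Implementation Steps
```bash
# 1. Deploy Backstage
npx @backstage/create-app@latest my-portal
cd my-portal
yarn start   # older app templates use `yarn dev`; it keeps running, so use another terminal for the next steps

# 2. Configure the service template
mkdir -p templates/microservice-template
# Create template.yaml and the skeleton directory

# 3. Deploy ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# 4. Deploy Temporal
helm repo add temporal https://go.temporal.io/helm-charts
helm repo update
helm install temporal temporal/temporal

# 5. Configure Pulumi (in its own directory; `pulumi new` expects an empty one)
mkdir -p ../infra && cd ../infra
pulumi new aws-python
pulumi up
```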
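The steps above assume `npx`, `yarn`, `kubectl`, `helm`, and `pulumi` are already installed. A small Python preflight check can report what is missing before you start. `missing_tools` is a hypothetical helper built on the standard library's `shutil.which`.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Command-line tools used by the implementation steps
REQUIRED_TOOLS = ["npx", "yarn", "kubectl", "helm", "pulumi"]


def missing_tools(tools: Iterable[str],
                  which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Install these tools first:", ", ".join(missing))
else:
    print("All required tools are on PATH.")
```

Passing a custom `which` makes the function testable without touching the real PATH.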
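✅ Learning Checkpoints
- Understand the ideas and value of platform engineering
- Can set up a Backstage developer portal
- Can create service templates
- Can configure a GitOps pipeline
- Can design golden paths
- Can use workflow orchestration
📚 Recommended Resources
Open-Source Projects
- Backstage: developer portal framework
- ArgoCD: GitOps tool
- Temporal: workflow engine
- Pulumi: infrastructure as code
- Crossplane: cloud-native control plane
Books and Articles
- "Platform Engineering" by Luca Galante
- Gartner: Platform Engineering trend reports
- ThoughtWorks Tech Radar
Remember: the core of platform engineering is developer experience, not piling up technology! 🚀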