# FastAPI Quickstart (Essential for AI Engineers)

Why should AI engineers learn FastAPI? It is arguably the best Python framework for serving AI model APIs: async-first, auto-generated docs, type-safe, with performance approaching Go.
## 1. FastAPI Core Features
| Feature | Description |
|---|---|
| Async | Native async/await support |
| Type safety | Pydantic validates requests/responses automatically |
| Auto docs | Swagger UI + ReDoc |
| Performance | Built on Starlette + Uvicorn, close to Node.js/Go |
## 2. Getting Started
### 2.1 Installation
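Install FastAPI and the Uvicorn ASGI server (versions unpinned here; pin them in `requirements.txt` for production):

```shell
pip install fastapi "uvicorn[standard]"
```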
### 2.2 Hello World
```python
from fastapi import FastAPI

app = FastAPI(title="AI Service API")

@app.get("/health")
async def health():
    return {"status": "ok"}

# Run with: uvicorn main:app --reload
```
### 2.3 Request Body Validation (Pydantic)
Python
from pydantic import BaseModel, Field # Pydantic数据验证模型
# 模拟AI模型类(实际项目中替换为真实模型加载代码)
class MockAIModel:
"""模拟AI模型 - 实际使用时替换为 transformers、vLLM 等真实模型"""
async def generate(self, text: str, temperature: float = 0.7):
"""模拟生成文本"""
# 实际项目中这里会调用真实模型,例如:
# from transformers import pipeline
# generator = pipeline("text-generation", model="Qwen/Qwen-7B")
# result = generator(text, temperature=temperature)
return type('Result', (), {
'text': f"AI回复: {text[:50]}...",
'usage': {'prompt_tokens': 10, 'completion_tokens': 20}
})()
# 初始化模型(实际项目中替换为真实模型加载)
model = MockAIModel()
class PredictRequest(BaseModel):
text: str = Field(..., min_length=1, max_length=10000)
model: str = Field(default="qwen-7b")
temperature: float = Field(default=0.7, ge=0, le=2)
max_tokens: int = Field(default=512, ge=1, le=4096)
class PredictResponse(BaseModel):
text: str
usage: dict
model: str
@app.post("/predict", response_model=PredictResponse)
async def predict(req: PredictRequest):
result = await model.generate(req.text, temperature=req.temperature)
return PredictResponse(text=result.text, usage=result.usage, model=req.model)
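The `Field` constraints are enforced by Pydantic itself; a standalone sketch of what FastAPI does with an out-of-range value (the same `ValidationError` is what FastAPI turns into an HTTP 422 response):

```python
from pydantic import BaseModel, Field, ValidationError

class PredictRequest(BaseModel):
    text: str = Field(..., min_length=1, max_length=10000)
    temperature: float = Field(default=0.7, ge=0, le=2)

# Valid input: defaults are filled in automatically
req = PredictRequest(text="hello")
print(req.temperature)  # 0.7

# Constraint violation: temperature > 2 fails validation
try:
    PredictRequest(text="hello", temperature=5.0)
    failed = False
except ValidationError as err:
    failed = True
    print(err.errors()[0]["loc"])  # ('temperature',)
```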
## 3. Common Patterns for AI Model Services
### 3.1 Streaming Output (SSE): Essential for LLMs
```python
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
import json

class ChatRequest(BaseModel):
    messages: str  # prompt text; a real chat API would take a list of messages

# Async generator: yield one SSE-formatted chunk per token streamed from the
# model ("data: ..." prefix, double newline terminator), pushing tokens to the
# frontend as they arrive. Assumes the model exposes an async stream() method.
async def generate_stream(prompt: str):
    async for token in model.stream(prompt):
        yield f"data: {json.dumps({'token': token})}\n\n"  # json.dumps serializes to a JSON string
    yield "data: [DONE]\n\n"

@app.post("/chat/stream")
async def chat_stream(req: ChatRequest):
    return StreamingResponse(
        generate_stream(req.messages),
        media_type="text/event-stream"
    )
```
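On the client side the SSE frames have to be decoded again; a stdlib-only sketch of parsing the `data:` lines produced by an endpoint like the one above:

```python
import json

def parse_sse(lines):
    """Collect token strings from 'data: ...' SSE lines, stopping at [DONE]."""
    tokens = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip the blank separator lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        tokens.append(json.loads(payload)["token"])
    return tokens

# Frames as the server would send them (a blank line terminates each event)
frames = ['data: {"token": "Hello"}', '', 'data: {"token": " world"}', '', 'data: [DONE]', '']
print("".join(parse_sse(frames)))  # Hello world
```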
### 3.2 File Uploads (Images/PDF)
```python
from fastapi import UploadFile, File
from PIL import Image
import io

@app.post("/ocr")
async def ocr(file: UploadFile = File(...)):
    contents = await file.read()
    image = Image.open(io.BytesIO(contents))
    result = ocr_model.predict(image)  # ocr_model: whatever OCR model you loaded at startup
    return {"text": result}
```
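Assuming the service is running on localhost:8000, the endpoint can be tested with a multipart upload (the file name is illustrative):

```shell
curl -F "file=@sample.png" http://127.0.0.1:8000/ocr
```

The `-F` flag sends `multipart/form-data`, and the field name `file` must match the parameter name in the endpoint signature.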
### 3.3 Model Preloading (lifespan)
```python
from contextlib import asynccontextmanager

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the model at startup (load_model is a placeholder for your loader)
    app.state.model = load_model("model.onnx")
    yield  # code before yield runs at startup, code after runs at shutdown
    # Clean up at shutdown
    del app.state.model

app = FastAPI(lifespan=lifespan)
```
### 3.4 Concurrency Control and Rate Limiting
```python
from asyncio import Semaphore

gpu_semaphore = Semaphore(4)  # at most 4 concurrent GPU inferences

@app.post("/predict")
async def predict(req: PredictRequest):
    async with gpu_semaphore:
        result = await model.predict(req.text)
    return result
```
### 3.5 Middleware (Logging/CORS/Auth)
```python
import logging
import time

from fastapi.middleware.cors import CORSMiddleware

# Note: allow_origins=["*"] cannot be combined with allow_credentials=True;
# browsers reject that combination. List explicit origins in production.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-frontend.com"],  # use concrete domains in production
    allow_methods=["*"],
    allow_headers=["*"],
)

logger = logging.getLogger(__name__)

@app.middleware("http")
async def log_requests(request, call_next):
    start = time.time()
    response = await call_next(request)  # await the downstream handler
    duration = time.time() - start
    logger.info(f"{request.method} {request.url} {response.status_code} {duration:.3f}s")
    return response
```
## 4. Deployment
### 4.1 Docker Deployment
```dockerfile
# FROM picks the base image
FROM python:3.12-slim

# WORKDIR sets the working directory
WORKDIR /app

# COPY the dependency list first so the pip layer is cached
COPY requirements.txt .

# RUN executes at build time
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# CMD is the default command when the container starts
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
```

Note that Dockerfile comments must start at the beginning of a line; trailing `# ...` after an instruction is not stripped and can break the build.
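Build and run (the image name is illustrative):

```shell
docker build -t ai-service .
docker run --rm -p 8000:8000 ai-service
```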
### 4.2 Gunicorn + Uvicorn Production Setup
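A common production setup runs Gunicorn as the process manager with Uvicorn worker processes. A typical invocation (worker count is a rule of thumb, often tuned to CPU cores; newer uvicorn releases move the worker class into the separate `uvicorn-worker` package):

```shell
pip install gunicorn
gunicorn main:app \
  --worker-class uvicorn.workers.UvicornWorker \
  --workers 4 \
  --bind 0.0.0.0:8000 \
  --timeout 120
```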
📎 Related: Flask Web Development | Backend Architecture | MLOps
Last updated: February 2026
