Deploying AI Models with Docker: From Model to API Service

Tags: ai, docker, tutorial, model deployment


Author: CaoZH
Date: 2026-05-15
This is an original tutorial.

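In 2026, deploying AI models has become a core skill for backend developers. Whether you are serving an open-source model such as LLaMA or Stable Diffusion or a fine-tuned custom model, Docker is a standard and reliable way to package it: the image pins the runtime, the dependencies, and the model files, so the service behaves the same on every host.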

This article uses a text classification model as a worked example and walks through the full pipeline: model → API → Docker → deployment.

1. Preparation

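# Install the NVIDIA Container Toolkit (only needed for GPU inference).
# Assumes Docker is already installed and NVIDIA's apt repository is configured.
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify that containers can see the GPU
# (nvidia/cuda tags include the full version and the OS, e.g. 12.0.0-base-ubuntu22.04)
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi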

2. Project Structure

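ai-model-server/
├── Dockerfile
├── requirements.txt
├── app.py              # FastAPI service
├── model/              # Model directory as written by save_pretrained()
│   ├── config.json
│   ├── tokenizer.json  # tokenizer file names vary by model
│   └── model.safetensors  # or pytorch_model.bin
├── docker-compose.yml
├── nginx.conf
└── .env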
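
Because app.py loads the model with from_pretrained, the model/ directory needs a config file and a weights file, not just a single checkpoint. The following is a minimal pre-flight sketch, assuming the usual Hugging Face file names (config.json plus either model.safetensors or pytorch_model.bin). The helper name missing_model_files is hypothetical, and tokenizer files are not checked because their names differ between tokenizers.

```python
from pathlib import Path

# Assumption: standard file names produced by Hugging Face save_pretrained().
REQUIRED = ["config.json"]
WEIGHT_CANDIDATES = ["model.safetensors", "pytorch_model.bin"]


def missing_model_files(model_dir: str) -> list:
    """Return a list of problems found in model_dir (an empty list means OK)."""
    root = Path(model_dir)
    problems = [name for name in REQUIRED if not (root / name).is_file()]
    # At least one of the known weight files has to be present.
    if not any((root / name).is_file() for name in WEIGHT_CANDIDATES):
        problems.append("weights (" + " or ".join(WEIGHT_CANDIDATES) + ")")
    return problems
```

Running missing_model_files("model") before docker compose build catches an incomplete model directory earlier than a failed container start would.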

3. Writing the API Service

# app.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import time
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(title="AI Model API", version="1.0.0")

# Load the model once at import time (avoids reloading it on every request)
MODEL_PATH = "/app/model"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

try:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH)
    model.to(device)
    model.eval()
    logger.info(f"Model loaded, using device: {device}")
except Exception as e:
    logger.error(f"Model failed to load: {e}")
    model = None
    tokenizer = None

class PredictRequest(BaseModel):
    text: str
    max_length: int = 128

class PredictResponse(BaseModel):
    label: str
    confidence: float
    processing_time_ms: float

@app.get("/health")
async def health():
    return {
        "status": "ok",
        "model_loaded": model is not None,
        "device": str(device)
    }

@app.post("/predict", response_model=PredictResponse)
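def predict(request: PredictRequest):
    # Plain def (not async): FastAPI runs it in a thread pool, so blocking inference does not stall the event loop
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded")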

    start = time.time()

    inputs = tokenizer(
        request.text,
        return_tensors="pt",
        truncation=True,
        max_length=request.max_length,
        padding=True
    ).to(device)

    with torch.no_grad():
        outputs = model(**inputs)
        probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
        confidence, predicted = torch.max(probabilities, dim=-1)

    # Label mapping (adjust for your model; order and count must match the model's outputs)
    labels = ["negative", "neutral", "positive"]

    processing_time = (time.time() - start) * 1000

    return PredictResponse(
        label=labels[predicted.item()],
        confidence=confidence.item(),
        processing_time_ms=round(processing_time, 2)
    )
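
To make the post-processing step in predict concrete, here is a small standard-library sketch of what the softmax and torch.max calls compute: logits become probabilities, and the largest probability is reported as the confidence. The logit values are made up for illustration, and the helper names softmax and classify are not part of app.py.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels):
    """Return (label, confidence) for the highest-probability class."""
    if len(logits) != len(labels):
        raise ValueError("labels must match the number of model outputs")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits for one input sentence
label, confidence = classify([-1.2, 0.3, 2.5], ["negative", "neutral", "positive"])
print(label, round(confidence, 3))
```

The length check mirrors a real failure mode of the hard-coded labels list: a model with more outputs than labels would raise an IndexError at request time.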

# requirements.txt
fastapi==0.110.0
uvicorn[standard]==0.27.0
torch>=2.0.0
transformers>=4.35.0
pydantic>=2.0.0

4. Writing the Dockerfile

# Multi-stage build

# Stage 1: install dependencies
FROM python:3.11-slim AS builder

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: runtime
FROM python:3.11-slim

WORKDIR /app

# Install runtime dependencies
RUN apt-get update && apt-get install -y \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# Copy the installed Python packages from the build stage
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin

# Copy the application code
COPY app.py .
COPY model/ ./model/

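# Create a non-root user that owns the app and its cache directory
RUN useradd -m -u 1000 appuser && mkdir -p /home/appuser/.cache \
    && chown -R appuser:appuser /app /home/appuser/.cache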
USER appuser

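# Health check: urlopen raises on connection errors and on 4xx/5xx responses,
# so the command exits non-zero when the service is unhealthy
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=3)"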

EXPOSE 8000

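# GPU version: a single worker, because every worker loads its own copy of the model into GPU memory
# CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

# CPU version (recommended for production): two workers, each holding its own copy of the model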
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
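
One gap in a plain HTTP probe is that /health answers 200 with "model_loaded": false when the model failed to load, so the container still looks healthy. A stricter probe could inspect the response body. The sketch below assumes a hypothetical healthcheck.py copied into the image; health_ok and main are illustrative names, and only the standard library is used.

```python
import json
import urllib.request


def health_ok(body: bytes) -> bool:
    """True only if /health reports status ok AND a loaded model."""
    try:
        payload = json.loads(body)
    except ValueError:  # invalid JSON or undecodable bytes
        return False
    if not isinstance(payload, dict):
        return False
    return payload.get("status") == "ok" and payload.get("model_loaded") is True


def main(url: str = "http://localhost:8000/health") -> int:
    """Return a process exit code: 0 for healthy, 1 for unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return 0 if health_ok(resp.read()) else 1
    except OSError:  # connection refused, timeout, or HTTP error status
        return 1
```

To use it as the container probe, the script would end with sys.exit(main()) under an if __name__ == "__main__": guard, and the HEALTHCHECK command would become python healthcheck.py.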

5. Docker Compose Configuration

# docker-compose.yml
version: '3.8'

services:
  # AI model service
  ai-model:
    build:
      context: .
      dockerfile: Dockerfile
    image: ai-model-server:latest
    container_name: ai-model
    ports:
      - "8000:8000"
    volumes:
      - ./model:/app/model:ro
      - model-cache:/home/appuser/.cache    # cache directory of the non-root user the container runs as
    environment:
      - PYTHONUNBUFFERED=1
      - CUDA_VISIBLE_DEVICES=0    # GPU index
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
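    healthcheck:
      # python:3.11-slim does not ship curl, so probe with the Python standard library
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s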

  # Nginx reverse proxy
  nginx:
    image: nginx:alpine
    container_name: ai-nginx
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
    depends_on:
      ai-model:
        condition: service_healthy
    restart: unless-stopped

volumes:
  model-cache:

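# nginx.conf
# Shared-memory zone used by limit_req below. It must be declared at http level,
# which is where conf.d/*.conf files are included. The 10 r/s rate is an example value.
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {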
    listen 80;
    server_name _;

    client_max_body_size 10m;

    location / {
        proxy_pass http://ai-model:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 60s;
    }

    # Rate-limit the inference endpoint
    location /predict {
        proxy_pass http://ai-model:8000;
        limit_req zone=api burst=10 nodelay;
    }
}
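
The effect of burst=10 nodelay is easier to see in a small simulation. The sketch below is a simplified model of the leaky-bucket accounting that nginx documents for limit_req, not nginx's actual implementation. The rate of 10 requests per second is an assumed value, and the class name LeakyBucket is purely illustrative.

```python
class LeakyBucket:
    """Simplified per-client model of limit_req with burst and nodelay."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.burst = burst
        self.excess = 0.0   # requests currently above the sustained rate
        self.last = None    # timestamp of the last accepted request

    def allow(self, now: float) -> bool:
        if self.last is None:          # first request from this client
            self.last = now
            return True
        # Excess drains at `rate` per second, and each request adds one unit.
        excess = max(0.0, self.excess - self.rate * (now - self.last) + 1.0)
        if excess > self.burst:
            return False               # rejected (nginx answers 503 by default)
        self.excess = excess
        self.last = now
        return True


bucket = LeakyBucket(rate_per_sec=10, burst=10)
instant = [bucket.allow(0.0) for _ in range(12)]
print(instant.count(True), "accepted,", instant.count(False), "rejected")
```

With nodelay, accepted requests inside the burst are forwarded immediately instead of being queued, which suits an inference endpoint where extra queueing delay only adds latency.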

6. Build and Deploy

# Build the image
docker compose build

# Start the services
docker compose up -d

# Follow the logs
docker compose logs -f

# Test
curl http://localhost:8000/health
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "This product works really well, I am very satisfied!"}'
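
The same test call can be made from Python, which is handy for smoke tests in CI. This is a minimal client sketch using only the standard library. The function names are illustrative, and the base URL assumes the port mapping from the compose file.

```python
import json
import urllib.request


def build_predict_request(base_url: str, text: str, max_length: int = 128) -> urllib.request.Request:
    """Build the POST request for /predict without sending it."""
    body = json.dumps({"text": text, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, text: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())
```

With the stack running, predict("http://localhost:8000", "some text") should return a dictionary with the label, confidence, and processing_time_ms fields defined by PredictResponse.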

7. Performance Optimization

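# Batched inference (improves throughput). Extends app.py and reuses its
# tokenizer, model, and device globals.
from fastapi.concurrency import run_in_threadpool

class PredictionService:
    def __init__(self, batch_size: int = 8):
        self.batch_size = batch_size

    def _predict_chunk(self, texts: list[str]) -> list[int]:
        # One forward pass per chunk keeps GPU utilization higher
        inputs = tokenizer(texts, return_tensors="pt",
                           padding=True, truncation=True).to(device)
        with torch.no_grad():
            logits = model(**inputs).logits
        return logits.argmax(dim=-1).tolist()

    async def predict_batch(self, texts: list[str]) -> list[int]:
        # Run the blocking inference in a worker thread so the event loop stays free
        predicted: list[int] = []
        for i in range(0, len(texts), self.batch_size):
            chunk = texts[i:i + self.batch_size]
            predicted += await run_in_threadpool(self._predict_chunk, chunk)
        return predicted  # predicted class index for each input text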
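
Batching only helps if several texts arrive together, whereas /predict receives one text per request. One common way to bridge that gap is micro-batching: hold each request for a few milliseconds and run whatever has accumulated in a single forward pass. The sketch below illustrates the idea with asyncio and a fake model function. MicroBatcher, max_wait_s, and fake_model are hypothetical names, and none of this comes from the original service.

```python
import asyncio


class MicroBatcher:
    """Collects single requests and runs them through batch_fn together."""

    def __init__(self, batch_fn, max_batch_size=8, max_wait_s=0.01):
        self.batch_fn = batch_fn            # callable: list of texts -> list of results
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = asyncio.Queue()

    async def submit(self, text):
        """Called once per request; resolves when the batch containing it finishes."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((text, future))
        return await future

    async def run(self):
        """Background worker: gather up to max_batch_size items or wait max_wait_s."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            texts = [text for text, _ in batch]
            try:
                # Run the blocking model call in a thread, off the event loop
                results = await loop.run_in_executor(None, self.batch_fn, texts)
            except Exception as exc:
                for _, future in batch:
                    future.set_exception(exc)
                continue
            for (_, future), result in zip(batch, results):
                future.set_result(result)


async def demo():
    calls = []  # batch sizes seen by the fake model

    def fake_model(texts):  # stand-in for tokenizer + model inference
        calls.append(len(texts))
        return [t.upper() for t in texts]

    batcher = MicroBatcher(fake_model, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(f"t{i}") for i in range(6)))
    worker.cancel()
    return results, calls
```

The trade-off is latency for throughput: max_wait_s bounds how long a lone request waits, so it should stay well below the latency budget of the endpoint.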

8. Summary

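## Key points for deploying AI models

✅ Multi-stage Docker build (smaller image)
✅ Run as a non-root user
✅ Health checks
✅ GPU support (nvidia-container-toolkit)
✅ Batching for higher throughput
✅ Reverse proxy + rate limiting
✅ Model cache volume

## Recommended image optimizations
- Base image: python:3.11-slim (~150MB)
- Use pip --no-cache-dir
- Use a multi-stage build to separate dependencies from code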
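- Final image: the size is dominated by PyTorch. The default CUDA-enabled wheels typically add several GB, while CPU-only wheels are considerably smaller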

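First published on CaoZH's Notes

Permalink: https://www.geniux.top/article/0b2aa907f45d/

Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0 CN. Please credit the source when reposting!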
