GitHub Actions Self-Hosted Runners and Advanced CI/CD Workflows in Practice

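Introduction

GitHub Actions is GitHub's built-in CI/CD platform; it lets developers define automated workflows directly in a repository. This article focuses on deploying self-hosted runners and on advanced workflow patterns, covering production-grade scenarios such as matrix builds, reusable workflows, environment deployment gates, cache optimization, and manual approval.

If you already know the basic GitHub Actions concepts (on, jobs, steps, actions/checkout), this article takes you to the next stage: building CI/CD pipelines that are ready for production.

Prerequisites

  • A GitHub repository (public or private)
  • A Linux server to host the self-hosted runner (2 vCPUs and 4 GB RAM or more recommended)
  • Basic knowledge of Git and YAML
  • Familiarity with basic GitHub Actions syntax

Table of Contents

  1. Self-Hosted Runner Deployment
  2. Matrix Build Strategies
  3. Reusable Workflows
  4. Environment Deployment Gates and Manual Approval
  5. Advanced Caching Strategies
  6. Workflow Dispatch and Manual Triggers
  7. Complete Example: Multi-Environment Deployment Pipeline
  8. FAQ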

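1. Self-Hosted Runner Deployment

1.1 Why Use a Self-Hosted Runner

Aspect | GitHub-hosted runner | Self-hosted runner
Build speed | Shared resources, jobs may queue | Dedicated resources, no shared queue
Hardware | Standard spec (2C/7GB) | Custom (high-end GPU, large memory)
Network access | Cannot reach internal networks | Can reach internal services
Cost | Limited free quota | Runs on your own servers
Persistent cache | Reset on every run | Can persist between runs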

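1.2 Installing the Runner on a Linux Server

Step 1: Add a Runner in the GitHub repository

  1. Open the repository → Settings → Actions → Runners → New self-hosted runner
  2. Select the operating system (Linux)
  3. Copy the commands shown on the page

Step 2: Download and configure the Runner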

# Create a dedicated user (recommended)
sudo useradd -m -s /bin/bash github-runner
sudo su - github-runner

# Create the runner directory
mkdir actions-runner && cd actions-runner

# Download the Runner (x64 Linux shown; use the version listed on the setup page)
curl -o actions-runner-linux-x64-2.322.0.tar.gz \
  -L https://github.com/actions/runner/releases/download/v2.322.0/actions-runner-linux-x64-2.322.0.tar.gz

# Verify the checksum (replace SHA256_FROM_SETUP_PAGE with the value shown on the setup page)
echo "SHA256_FROM_SETUP_PAGE  actions-runner-linux-x64-2.322.0.tar.gz" | shasum -a 256 -c

# Extract
tar xzf actions-runner-linux-x64-2.322.0.tar.gz

# Configure (replace the URL and use the registration token from the setup page; the token is short-lived)
./config.sh --url https://github.com/your-username/your-repo \
  --token AAAAAAAAAAAAAAAAAAAAAAAAAA \
  --name "prod-runner-01" \
  --labels "prod,linux,x64" \
  --work "_work"
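The checksum step above can also be wrapped in a small helper for use in provisioning scripts. This is a minimal sketch, assuming GNU `sha256sum` is available on the server; the function name `verify_sha256` is hypothetical and not part of the runner tooling.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX (hypothetical helper)
# Returns 0 when the file's SHA-256 digest equals EXPECTED_HEX, non-zero otherwise.
verify_sha256() {
  file="$1"
  expected="$2"
  # sha256sum prints "<digest>  <file>"; keep only the digest column.
  actual=$(sha256sum "$file" | awk '{print $1}')
  [ "$actual" = "$expected" ]
}
```

A provisioning script could call `verify_sha256 actions-runner-linux-x64-2.322.0.tar.gz "$EXPECTED" || exit 1` before extracting, with `EXPECTED` taken from the setup page.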

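Step 3: Register as a system service

# Run these from the actions-runner directory as a user with sudo rights
# (exit the github-runner shell first if that user has no sudo access).
# Passing the user name makes the service run as github-runner.
sudo ./svc.sh install github-runner

# Start the service
sudo ./svc.sh start

# Check the status
sudo ./svc.sh status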

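Step 4: Verify the Runner is online

# Check the service status (this also prints the exact systemd unit name)
sudo ./svc.sh status
# Or follow the journal; the unit is typically named actions.runner.<owner>-<repo>.<runner-name>.service
sudo journalctl -u actions.runner.your-username-your-repo.prod-runner-01.service -f

On the repository's Settings → Actions → Runners page, the runner should now show the status Idle.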

1.3 Workflows That Use the Self-Hosted Runner

name: Deploy to Production
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: [self-hosted, prod, linux]  # must match the runner's labels
    steps:
      - uses: actions/checkout@v4
      - name: Deploy application
        run: |
          ./deploy.sh

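1.4 Hardening the Runner

# Keep the runner user's privileges minimal.
# Add it to the docker group only if jobs need Docker; membership in that
# group is effectively root-equivalent on the host.
sudo usermod -aG docker github-runner

# Workspace isolation
# A persistent self-hosted runner reuses its _work directory between jobs and
# does not wipe it automatically, so files left by one job can be seen by the next.

# Use an ephemeral runner (it takes one job and is then unregistered automatically)
./config.sh --url https://github.com/your-username/your-repo \
  --token AAAAAAAAAAAAAAAAAAAAAAAAAA \
  --ephemeral

Security warning: a self-hosted runner can execute arbitrary code, so allow only trusted repositories and workflows to use it. GitHub's documentation recommends using self-hosted runners only with private repositories, because a pull request from a fork of a public repository could run untrusted code on your machine. Use separate runners for each environment (dev/staging/prod).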


2. Matrix Build Strategies

A matrix strategy lets a single workflow definition run parallel builds across multiple configurations.

2.1 Basic Matrix: Testing Multiple Versions

name: Matrix Test
on: [push, pull_request]

jobs:
  test:
    runs-on: ${{ matrix.os }}  # use the OS from the matrix, otherwise every job runs on the same image
    strategy:
      matrix:
        node-version: [18, 20, 22]
        os: [ubuntu-latest, windows-latest]
      fail-fast: false  # one failing job does not cancel the others

    steps:
      - uses: actions/checkout@v4
      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - run: npm test

2.2 Including and Excluding Specific Combinations

strategy:
  matrix:
    node-version: [18, 20, 22]
    os: [ubuntu-latest, windows-latest, macos-latest]
    include:
      # Run lint only on the latest Node (adds lint: true to this existing combination)
      - node-version: 22
        os: ubuntu-latest
        lint: true
    exclude:
      # Node 18 does not need to be tested on macOS
      - node-version: 18
        os: macos-latest
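To see which jobs the matrix above expands to, the following sketch enumerates the combinations by hand. It only mirrors the include/exclude rules of this particular example (the function name `expand_matrix` is hypothetical); it is not how GitHub evaluates matrices internally.

```shell
#!/bin/sh
# expand_matrix: print one line per job produced by the example matrix.
expand_matrix() {
  for node in 18 20 22; do
    for os in ubuntu-latest windows-latest macos-latest; do
      # exclude: drop Node 18 on macOS
      if [ "$node" = 18 ] && [ "$os" = macos-latest ]; then
        continue
      fi
      extra=""
      # include: the entry matches an existing combination, so it only
      # adds the lint flag to that job instead of creating a new one
      if [ "$node" = 22 ] && [ "$os" = ubuntu-latest ]; then
        extra=" lint=true"
      fi
      echo "node=$node os=$os$extra"
    done
  done
}

expand_matrix
```

The full product is 3 × 3 = 9 combinations; the exclude rule removes one, leaving 8 jobs, and only the Node 22 on ubuntu-latest job carries `lint=true`.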

2.3 Dynamic Matrix

jobs:
  get-matrix:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - id: set-matrix
        run: |
          # Generate the matrix dynamically. The JSON must be on a single line:
          # a plain name=value entry in $GITHUB_OUTPUT cannot span multiple lines.
          echo 'matrix={"include":[{"project":"frontend","dir":"./frontend","port":3000},{"project":"backend","dir":"./backend","port":4000},{"project":"admin","dir":"./admin","port":5000}]}' >> "$GITHUB_OUTPUT"

  build:
    needs: get-matrix
    runs-on: ubuntu-latest
    strategy:
      matrix: ${{ fromJson(needs.get-matrix.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - run: |
          cd ${{ matrix.dir }}
          npm ci
          npm run build
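In practice the matrix JSON is usually computed rather than hard-coded. The sketch below builds the same single-line JSON from a list of project specs; the helper name `build_matrix` and the `project:dir:port` argument format are assumptions made for this illustration.

```shell
#!/bin/sh
# build_matrix SPEC... where each SPEC is "project:dir:port" (hypothetical format).
# Prints compact, single-line JSON suitable for a job output.
build_matrix() {
  entries=""
  for spec in "$@"; do
    project="${spec%%:*}"   # text before the first colon
    rest="${spec#*:}"       # everything after the first colon
    dir="${rest%%:*}"
    port="${rest#*:}"
    item=$(printf '{"project":"%s","dir":"%s","port":%s}' "$project" "$dir" "$port")
    # Join items with commas, without a leading comma on the first one.
    entries="${entries:+$entries,}$item"
  done
  printf '{"include":[%s]}\n' "$entries"
}
```

Inside the `set-matrix` step this could be used as `echo "matrix=$(build_matrix frontend:./frontend:3000 backend:./backend:4000)" >> "$GITHUB_OUTPUT"`. The sketch does no JSON escaping, so it is only suitable for simple names and paths.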

3. Reusable Workflows

Reusable workflows let you define a workflow once and call it from other workflows, either in the same repository or in other repositories.

3.1 Defining a Reusable Workflow

Create the file under the .github/workflows/ directory and use the workflow_call trigger:

# .github/workflows/reusable-build.yml
name: Reusable Build

on:
  workflow_call:
    inputs:
      node-version:
        required: false  # optional, so the default below can take effect
        type: string
        default: '20'
      build-command:
        required: false
        type: string
        default: 'npm run build'
    secrets:
      NPM_TOKEN:
        required: true
    outputs:
      build-artifact:
        description: "Build output artifact name"
        value: ${{ jobs.build.outputs.artifact-name }}

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      artifact-name: build-${{ inputs.node-version }}
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
          registry-url: 'https://npm.pkg.github.com'
      - name: Authenticate
        run: |
          echo "//npm.pkg.github.com/:_authToken=${{ secrets.NPM_TOKEN }}" > .npmrc
      - name: Install & Build
        run: |
          npm ci
          ${{ inputs.build-command }}
      - name: Upload Artifact
        uses: actions/upload-artifact@v4
        with:
          name: build-${{ inputs.node-version }}
          path: dist/

3.2 Calling a Reusable Workflow

# .github/workflows/ci.yml
name: CI Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - run: npm run lint

  build:
    needs: lint
    uses: ./.github/workflows/reusable-build.yml
    with:
      node-version: '20'
      build-command: 'npm run build:production'
    secrets:
      NPM_TOKEN: ${{ secrets.NPM_TOKEN }}

  test:
    needs: build
    uses: ./.github/workflows/reusable-test.yml  # another reusable workflow
    with:
      node-version: '20'

  deploy:
    needs: [build, test]
    if: github.ref == 'refs/heads/main'
    uses: ./.github/workflows/reusable-deploy.yml
    with:
      environment: production
    secrets:
      DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}

3.3 Calling Across Repositories

jobs:
  call-workflow:
    uses: octo-org/another-repo/.github/workflows/build.yml@v1
    with:
      node-version: '20'
    secrets:
      NPM_TOKEN: ${{ secrets.NPM_TOKEN }}

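4. Environment Deployment Gates and Manual Approval

GitHub Environments provide deployment gates, including required reviewers, wait timers, and deployment branch restrictions.

4.1 Creating Environments

  1. Repository → Settings → Environments → New environment
  2. Create three environments: development, staging, production

Production environment configuration:

  • Required reviewers: add 1-2 reviewers
  • Wait timer: set to 5 minutes (optional)
  • Deployment branches: restrict to the main branch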

4.2 Workflows That Use Environments

name: Deploy with Gates

on:
  push:
    branches: [main]

jobs:
  deploy-dev:
    runs-on: ubuntu-latest
    environment: development
    steps:
      - uses: actions/checkout@v4
      - run: echo "Deploying to dev..."

  deploy-staging:
    needs: deploy-dev
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - run: echo "Deploying to staging..."

  deploy-production:
    needs: deploy-staging
    runs-on: [self-hosted, prod]
    environment:
      name: production
      url: https://your-app.com
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to Production
        run: |
          ./scripts/deploy-prod.sh
      - name: Health Check
        run: |
          curl -f https://your-app.com/health

When the workflow reaches the deploy-production job, GitHub pauses it and waits for one of the designated reviewers to approve before the job runs.

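4.3 Environment Variables and Secrets

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    env:
      # Environment-level variables
      API_URL: ${{ vars.API_URL }}
      LOG_LEVEL: ${{ vars.LOG_LEVEL }}
    steps:
      - name: Use environment secrets
        env:
          # Pass secrets through env instead of interpolating them into the script
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
          API_KEY: ${{ secrets.API_KEY }}
        run: |
          # Do not print secret values; fail the step if one is missing
          : "${DATABASE_URL:?DATABASE_URL is not set}"
          : "${API_KEY:?API_KEY is not set}"
          echo "Secrets are available; API_URL=$API_URL"

Each environment can have its own values for variables and secrets; configure them on the Environments settings page.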


5. Advanced Caching Strategies

5.1 Dependency Caching

name: CI with Cache

on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Cache Node.js modules
        uses: actions/cache@v4
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-node-

      - name: Cache Next.js build
        uses: actions/cache@v4
        with:
          path: |
            .next/cache
          key: ${{ runner.os }}-nextjs-${{ hashFiles('**/package-lock.json') }}-${{ hashFiles('**/*.js', '**/*.jsx', '**/*.ts', '**/*.tsx') }}
          restore-keys: |
            ${{ runner.os }}-nextjs-${{ hashFiles('**/package-lock.json') }}-

      - run: npm ci
      - run: npm run build
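The interplay of key and restore-keys in the workflow above can be pictured as a two-pass lookup: an exact match on the key first, then the newest cache whose key starts with a restore prefix. The sketch below is a simplified model of that documented behaviour; the function `pick_cache` is hypothetical, and it ignores details such as branch scoping and multiple restore prefixes.

```shell
#!/bin/sh
# pick_cache KEY RESTORE_PREFIX EXISTING_KEY... (existing keys listed newest first)
# Prints the cache key that would be restored; returns 1 on a cache miss.
pick_cache() {
  key="$1"
  prefix="$2"
  shift 2
  # Pass 1: exact match on the primary key.
  for k in "$@"; do
    if [ "$k" = "$key" ]; then
      echo "$k"
      return 0
    fi
  done
  # Pass 2: first (newest) key that starts with the restore prefix.
  for k in "$@"; do
    case "$k" in
      "$prefix"*) echo "$k"; return 0 ;;
    esac
  done
  return 1
}
```

For example, when package-lock.json changes, the exact key no longer exists, so the lookup falls back to the newest `Linux-node-` cache and `npm ci` only downloads what changed.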

5.2 Persistent Caching on Self-Hosted Runners

A self-hosted runner's disk is persistent, so Docker images can be cached locally:

jobs:
  build:
    runs-on: [self-hosted, linux]
    steps:
      - uses: actions/checkout@v4

      - name: Cache Docker images
        run: |
          # Check whether the base image already exists locally
          if docker image inspect my-base:latest > /dev/null 2>&1; then
            echo "Using cached base image"
          else
            docker pull node:20-alpine
            docker build -t my-base:latest -f Dockerfile.base .
          fi

      - name: Build application
        run: |
          docker build --cache-from my-base:latest -t my-app:${{ github.sha }} .

5.3 Using the Repository Cache Action

- name: Cache Gradle packages
  uses: actions/cache@v4
  with:
    path: |
      ~/.gradle/caches
      ~/.gradle/wrapper
    key: ${{ runner.os }}-gradle-${{ hashFiles('**/*.gradle*', '**/gradle-wrapper.properties') }}
    restore-keys: |
      ${{ runner.os }}-gradle-

6. Workflow Dispatch and Manual Triggers

6.1 Manual Trigger with Parameters

name: Manual Deploy

on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Deployment environment'
        required: true
        default: 'staging'
        type: choice
        options:
          - development
          - staging
          - production
      version:
        description: 'Version to deploy (Git tag or commit SHA)'
        required: true
        type: string
      dry-run:
        description: 'Validate only, do not deploy'
        required: false
        default: false
        type: boolean

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ github.event.inputs.environment }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.inputs.version }}

      - name: Validate deployment
        run: |
          echo "Deploying to: ${{ github.event.inputs.environment }}"
          echo "Version: ${{ github.event.inputs.version }}"
          echo "Dry run: ${{ github.event.inputs.dry-run }}"

      - name: Actual deploy
        # github.event.inputs values are strings, hence the comparison with 'false'
        if: ${{ github.event.inputs.dry-run == 'false' }}
        run: |
          ./scripts/deploy.sh ${{ github.event.inputs.environment }}

6.2 Scheduled Triggers and Conditional Execution

on:
  schedule:
    # Runs every day at 02:00 UTC (10:00 Beijing time)
    - cron: '0 2 * * *'
  workflow_dispatch:  # also allow manual triggering

jobs:
  nightly-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build

7. Complete Example: Multi-Environment Deployment Pipeline

This is a complete production-style CI/CD pipeline that combines the advanced features covered in this article:

# .github/workflows/main.yml
name: Production CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  workflow_dispatch:
    inputs:
      environment:
        description: 'Target Environment'
        type: choice
        options: [development, staging, production]
        default: staging

env:
  NODE_VERSION: '20'
  REGISTRY: ghcr.io

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  # --- Stage 1: Code quality ---
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
      - run: npm ci
      - run: npm run lint
      - run: npm run type-check

  # --- Stage 2: Tests (matrix) ---
  test:
    needs: lint
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [18, 20, 22]
      fail-fast: false
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - name: Cache dependencies
        uses: actions/cache@v4
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ matrix.node-version }}-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-node-${{ matrix.node-version }}-
      - run: npm ci
      - run: npm test -- --coverage
      - name: Upload coverage
        uses: actions/upload-artifact@v4
        with:
          name: coverage-${{ matrix.node-version }}
          path: coverage/

  # --- Stage 3: Build and push the image ---
  build-and-push:
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write  # lets GITHUB_TOKEN push to ghcr.io
    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx  # the gha cache backend needs a Buildx builder
        uses: docker/setup-buildx-action@v3
      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ github.repository }}
          tags: |
            type=semver,pattern={{version}}
            type=sha,format=short
            type=raw,value=latest,enable={{is_default_branch}}
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # --- Stage 4: Deploy to Staging ---
  deploy-staging:
    needs: build-and-push
    if: github.ref == 'refs/heads/main'
    runs-on: [self-hosted, staging]
    environment:
      name: staging
      url: https://staging.your-app.com
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to Staging
        run: |
          docker compose -f docker-compose.staging.yml pull
          docker compose -f docker-compose.staging.yml up -d
      - name: Smoke Test
        run: |
          sleep 10
          curl -f http://localhost:3000/health

  # --- Stage 5: Deploy to Production (approval required) ---
  deploy-production:
    needs: deploy-staging
    if: github.ref == 'refs/heads/main'
    runs-on: [self-hosted, prod]
    environment:
      name: production
      url: https://your-app.com
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to Production
        run: |
          docker compose pull
          docker compose up -d
      - name: Health Check
        run: |
          for i in $(seq 1 30); do
            if curl -sf https://your-app.com/health > /dev/null; then
              echo "Health check passed"
              exit 0
            fi
            sleep 5
          done
          echo "Health check failed"
          exit 1
      - name: Rollback on Failure
        if: failure()
        run: |
          docker compose down
          docker compose -f docker-compose.rollback.yml up -d
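The polling loop in the production health check is a general retry pattern, and it can be factored into a small function so that staging and production share it. This is a minimal sketch; the function name `retry` and its argument order are assumptions for illustration.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only when another attempt will follow.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

With this in place, the health check step could be written as `retry 30 5 curl -sf https://your-app.com/health`, which makes the same 30 attempts, 5 seconds apart, as the loop in the pipeline.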

8. FAQ

Q1: What if the self-hosted Runner shows as offline?

# Check the service status
sudo ./svc.sh status

# View the logs (quote the pattern so the shell does not expand it)
sudo journalctl -u 'actions.runner.*' -n 50 --no-pager

# Restart the service
sudo ./svc.sh stop
sudo ./svc.sh start

# If it stays offline, reconfigure it with a new token
sudo ./svc.sh stop
./config.sh remove --token NEW_TOKEN
./config.sh --url https://github.com/your-username/your-repo --token NEW_TOKEN
sudo ./svc.sh start

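Q2: The workflow stays queued?

  • Check that the Runner labels match: runs-on: [self-hosted, prod] must match the labels set during configuration
  • Check that the Runner is idle: Settings → Actions → Runners
  • Check concurrency limits: the free plan allows 20 concurrent jobs on GitHub-hosted runners
  • Check organization-level Runner quotas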

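Q3: How do I restrict a self-hosted Runner to specific workflows?

# Configure runner groups under the organization's Settings → Actions → Runner groups
# (runner groups are managed at the organization or enterprise level) and limit which repositories may use them,
# or use repository-level Actions permissions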

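Q4: What are the best practices for secret security?

  • Use ${{ secrets.xxx }} instead of plaintext values
  • Prefer environment-level secrets over repository-level secrets, since they can be protected by approval rules
  • Use OpenID Connect (OIDC) instead of long-lived credentials
  • Rotate secrets regularly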

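Q5: A workflow run fails and the cause is unclear?

# Enable debug logging by creating these as repository secrets or variables
# (Settings → Secrets and variables → Actions); setting them in a workflow env: block has no effect
ACTIONS_STEP_DEBUG = true
ACTIONS_RUNNER_DEBUG = true
# Alternatively, re-run the job with "Enable debug logging" checked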

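Q6: How can I speed up builds?

  1. Use self-hosted Runners (avoids the shared queue)
  2. Enable dependency caching
  3. Use Docker layer caching
  4. Parallelize jobs (matrix builds)
  5. Run only when relevant code changes (path filters)

Path filter example for item 5:

on:
  push:
    paths:
      - 'src/**'
      - 'package.json'
      - 'Dockerfile'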

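Q7: How do I roll back automatically after a failed deployment?

- name: Deploy
  id: deploy
  run: ./deploy.sh

- name: Health Check
  id: health
  run: curl -f https://your-app.com/health

- name: Rollback
  # failure() is required: without a status function, this step is skipped once an earlier step has failed
  if: failure() && steps.health.outcome == 'failure'
  run: ./rollback.sh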
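The same deploy, health check, rollback sequence can also be kept in a single script, which makes it easy to test outside of GitHub Actions. This is a minimal sketch; `deploy_with_rollback` is a hypothetical helper that takes the three commands as arguments and, unlike the steps above, also rolls back when the deploy command itself fails.

```shell
#!/bin/sh
# deploy_with_rollback DEPLOY_CMD HEALTH_CMD ROLLBACK_CMD
# Runs the deploy, then the health check. If either fails, runs the
# rollback command and returns 1 so the calling step is marked as failed.
deploy_with_rollback() {
  deploy_cmd="$1"
  health_cmd="$2"
  rollback_cmd="$3"
  if "$deploy_cmd" && "$health_cmd"; then
    return 0
  fi
  "$rollback_cmd"
  return 1
}
```

A workflow step could call it as `deploy_with_rollback ./deploy.sh ./health.sh ./rollback.sh`, where `./health.sh` is an assumed wrapper around the curl check.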

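Summary

This article covered the core GitHub Actions practices, from the basics to advanced usage:

Feature | Use case | Key command / configuration
Self-hosted Runner | Internal network deployment, high-performance builds | ./config.sh, sudo ./svc.sh install
Matrix builds | Multi-version / multi-platform testing | strategy.matrix
Reusable workflows | Sharing pipelines across repositories | workflow_call + uses:
Environment gates | Production deployment approval | environment: + Required reviewers
Advanced caching | Faster builds | actions/cache@v4
Workflow Dispatch | Manually triggered deployments | workflow_dispatch + inputs

As next steps, you can explore OIDC integration in GitHub Actions (accessing cloud services without stored credentials), Composite Actions (custom reusable steps), and deploying to Kubernetes from GitHub Actions.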