Kruise Rollout发布

Posted by 胡伟煌 on 2024-11-23

1. kruise-rollout简介

金丝雀发布是逐步将流量导向新版本的应用,以最小化风险。具体过程包括:

  • 部署新版本的 Pod。
  • 将一部分流量分配到新版本(通常比例很小)。
  • 逐步增加新版本的流量比例,直到完成全量发布。

Kruise Rollouts 是一个 Bypass(旁路) 组件,提供 高级渐进式交付功能 。可以通过Kruise Rollouts插件来实现金丝雀发布的能力。

组件 Kruise Rollouts
核心概念 增强现有的工作负载
架构 Bypass
插拔和热切换
发布类型 多批次、金丝雀、A/B测试、全链路灰度
工作负载类型 Deployment、StatefulSet、CloneSet、Advanced StatefulSet、Advanced DaemonSet
流量类型 Ingress、GatewayAPI、CRD(需要 Lua 脚本)
迁移成本 无需迁移工作负载和Pods
HPA 兼容性

2. 安装kruise-rollout

2.1. helm安装kruise-rollout

1
2
3
4
5
6
7
8
9
helm repo add openkruise https://openkruise.github.io/charts/

helm repo update

kubectl create ns openkruise
helm install kruise-rollout openkruise/kruise-rollout -n openkruise

# 升级到指定版本
helm upgrade kruise-rollout openkruise/kruise-rollout --version 0.5.0

查看部署结果

通过helmkubectl的命令可以看到创建了几个crdkruise-rollout-controller-manager的pod来提供rollout的功能。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
# helm list -A
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
kruise-rollout openkruise 1 2024-11-24 16:52:36.067327844 +0800 +08 deployed kruise-rollout-0.5.0 0.5.0

# kubectl get po -n kruise-rollout
NAME READY STATUS RESTARTS AGE
kruise-rollout-controller-manager-875654888-rpxds 1/1 Running 0 87s
kruise-rollout-controller-manager-875654888-w75xj 1/1 Running 0 87s

# kubectl get crd |grep kruise
batchreleases.rollouts.kruise.io 2024-11-24T08:52:36Z
rollouthistories.rollouts.kruise.io 2024-11-24T08:52:36Z
rollouts.rollouts.kruise.io 2024-11-24T08:52:36Z
trafficroutings.rollouts.kruise.io 2024-11-24T08:52:36Z

2.2. 安装kubectl-kruise

kubectl-kruise用于执行rollout发版和回退等操作的二进制命令。

  • kubectl-kruise rollout approve:执行下一批版本发布
  • kubectl-kruise rollout undo:回退全部发版批次
1
2
3
wget https://github.com/openkruise/kruise-tools/releases/download/v1.1.7/kubectl-kruise-linux-amd64-v1.1.7.tar.gz
tar -zvxf kubectl-kruise-linux-amd64-v1.1.7.tar.gz
mv linux-amd64/kubectl-kruise /usr/local/bin/

3. 使用指南

先部署一个k8s deployment对象,例如部署10个nginx:1.26的容器。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: nginx
name: nginx
namespace: default
spec:
replicas: 10
selector:
matchLabels:
app: nginx
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
labels:
app: nginx
spec:
containers:
- image: nginx:1.26
imagePullPolicy: IfNotPresent
name: nginx
ports:
- containerPort: 80
protocol: TCP

3.1. 创建发布策略(rollout对象)

我们以使用多批次灰度发布为例。

  1. workloadRef定义作用于哪个deployment对象。
  2. strategy定义发布策略。

以下是发布策略示例描述:

  • 在第一批次:只升级 1个Pod
  • 在第二批次:升级 50% 的 Pods,即 5个已更新的Pod
  • 在第三批次:升级 100% 的 Pods,即 10个已更新的Pod
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
$ kubectl apply -f - <<EOF
apiVersion: rollouts.kruise.io/v1beta1
kind: Rollout
metadata:
name: rollouts-nginx
namespace: default
spec:
workloadRef:
apiVersion: apps/v1
kind: Deployment
name: nginx
strategy:
canary:
enableExtraWorkloadForCanary: false
steps:
- replicas: 1
- replicas: 50%
- replicas: 100%
EOF

3.2. 升级发布版本(发布第一批次)

升级nginx:1.26到nginx:1.27

1
kubectl set image deployment nginx nginx=nginx:1.27

查看灰度结果:

1
2
3
4
# kubectl get rs -L pod-template-hash -w
NAME DESIRED CURRENT READY AGE POD-TEMPLATE-HASH
nginx-68556bc579 1 1 1 8s 68556bc579 # nginx:1.27
nginx-d7f5d89c9 9 9 9 24m d7f5d89c9 # nginx:1.26

查看rollout状态:

rollout对象的状态主要描述了当前处于哪个rollout阶段及总状态和子阶段状态的信息。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
# kubectl get rollout rollouts-nginx -oyaml
apiVersion: rollouts.kruise.io/v1beta1
kind: Rollout
metadata:
name: rollouts-nginx
namespace: default
spec:
disabled: false
strategy:
canary:
steps:
- pause: {}
replicas: 1
- pause: {}
replicas: 50%
- pause: {}
replicas: 100%
workloadRef:
apiVersion: apps/v1
kind: Deployment
name: nginx
status:
canaryStatus:
canaryReadyReplicas: 1
canaryReplicas: 1
canaryRevision: 5c9556986d
currentStepIndex: 1
currentStepState: StepPaused # 当前步骤状态
lastUpdateTime: "2024-11-24T11:52:56Z"
message: BatchRelease is at state Ready, rollout-id , step 1
observedWorkloadGeneration: 36
podTemplateHash: d7f5d89c9
rolloutHash: 77cxd69w47b7bwddwv2w7vxvb4xxdbwcx9x289vw69w788w4w6z4x8dd4vbz2zbw
stableRevision: 68556bc579
conditions:
- lastTransitionTime: "2024-11-24T11:52:47Z"
lastUpdateTime: "2024-11-24T11:52:47Z"
message: Rollout is in Progressing
reason: InRolling
status: "True"
type: Progressing
message: Rollout is in step(1/3), and you need manually confirm to enter the next
step # rollout信息
observedGeneration: 2
phase: Progressing # 整体状态

3.3. 发布第二批次

1
kubectl-kruise rollout approve rollout/rollouts-nginx -n default

查看灰度结果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# kubectl get rs -L pod-template-hash -w
NAME DESIRED CURRENT READY AGE POD-TEMPLATE-HASH
nginx-68556bc579 5 5 5 7m24s 68556bc579 # nginx:1.27
nginx-d7f5d89c9 5 5 5 32m d7f5d89c9 # nginx:1.26

# kubectl get po
NAME READY STATUS RESTARTS AGE
nginx-68556bc579-24lpl 1/1 Running 0 7m34s # nginx:1.27 第一批次
nginx-68556bc579-2dph8 1/1 Running 0 14s # nginx:1.27 第二批次
nginx-68556bc579-57pqt 1/1 Running 0 14s # nginx:1.27 第二批次
nginx-68556bc579-879s9 1/1 Running 0 14s # nginx:1.27 第二批次
nginx-68556bc579-fwt52 1/1 Running 0 14s # nginx:1.27 第二批次
nginx-d7f5d89c9-5fbfp 1/1 Running 0 30m # 其余为第三批次
nginx-d7f5d89c9-gkz9p 1/1 Running 0 30m
nginx-d7f5d89c9-jhxwl 1/1 Running 0 30m
nginx-d7f5d89c9-vrqfz 1/1 Running 0 30m
nginx-d7f5d89c9-zk2sj 1/1 Running 0 30m

3.4. 发布第三批次

1
kubectl-kruise rollout approve rollout/rollouts-nginx -n default

查看结果:

1
2
3
4
# kubectl get rs -L pod-template-hash -w
NAME DESIRED CURRENT READY AGE POD-TEMPLATE-HASH
nginx-68556bc579 10 10 10 12m 68556bc579 # nginx:1.27
nginx-d7f5d89c9 0 0 0 37m d7f5d89c9

3.5. 发布回滚

该回滚会把全部批次的pod都回滚到第一批发版前的版本。

1
kubectl-kruise rollout undo rollout/rollouts-nginx -n default

4. k8s对象变更

kruise-rollout的原理主要还是劫持DeploymentReplicaSet的操作,例如在升级deployment的时候会把deployment原先的strategy策略放在annotations中暂存,再增加paused=true的参数暂停deployment的发布。在rollout全部结束后再恢复原先的deployment参数,主要包括pausedstrategy

以下是rollout中间阶段的deployment信息:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
batchrelease.rollouts.kruise.io/control-info: '{"apiVersion":"rollouts.kruise.io/v1beta1","kind":"BatchRelease","name":"rollouts-nginx","uid":"8f29624b-ab77-49d1-abbe-aa92bb3998e2","controller":true,"blockOwnerDeletion":true}'
deployment.kubernetes.io/revision: "5"
rollouts.kruise.io/deployment-extra-status: '{"updatedReadyReplicas":1,"expectedUpdatedReplicas":1}'
rollouts.kruise.io/deployment-strategy: '{"rollingStyle":"Partition","rollingUpdate":{"maxUnavailable":"25%","maxSurge":"25%"},"partition":1}' # deployment原本的strategy策略
rollouts.kruise.io/in-progressing: '{"rolloutName":"rollouts-nginx"}'
generation: 36
labels:
app: nginx
rollouts.kruise.io/controlled-by-advanced-deployment-controller: "true"
rollouts.kruise.io/stable-revision: 68556bc579
rollouts.kruise.io/workload-type: deployment
name: nginx
namespace: default
spec:
paused: true # 增加了paused参数
progressDeadlineSeconds: 600
replicas: 10
revisionHistoryLimit: 10

5. Rollout流量灰度

5.1. 添加pod 版本标签

为了适配ApisixNginx等k8s流量网关的灰度逻辑,我们通过k8s service的方式来动态获取Pod的IP(即endpoint),其中包括全量pod,灰度pod和非灰度pod。因此我们在每次发版的deployment给pod加上所属的版本标签

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: nginx
name: nginx
namespace: default
spec:
replicas: 10
selector:
matchLabels:
app: nginx
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
labels:
app: nginx
version: "1.26" # 增加pod对应的版本标签,用于service联动
spec:
containers:
- image: nginx:1.26
imagePullPolicy: IfNotPresent
name: nginx
ports:
- containerPort: 80
protocol: TCP

5.2. 添加灰度pod的service

创建三个service来对应全量pod,灰度pod和非灰度pod。

全量pod的service

1
2
3
4
5
6
7
8
9
10
11
12
13
apiVersion: v1
kind: Service
metadata:
name: nginx
namespace: default
spec:
selector:
app: nginx # 全量pod的标签
ports:
- name: http
protocol: TCP
port: 80
targetPort: 80

灰度pod的service

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
apiVersion: v1
kind: Service
metadata:
name: nginx-canary
namespace: default
labels:
version: canary
spec:
selector:
app: nginx
version: "1.27" # 增加灰度版本的标签,对应灰度的pod
ports:
- name: http
protocol: TCP
port: 80
targetPort: 80

非灰度pod的service

1
2
3
4
5
6
7
8
9
10
11
12
13
14
apiVersion: v1
kind: Service
metadata:
name: nginx-stable
namespace: default
spec:
selector:
app: nginx
version: "1.26" # 增加非灰度版本的标签,对应非灰度的pod
ports:
- name: http
protocol: TCP
port: 80
targetPort: 80

5.3. 动态获取灰度的endpoint

通过以上service的配置,可以在灰度的过程中动态获取灰度pod的endpoint,然后动态集成到流量网关中。

  • 全量pod的service:在流量灰度前和流量灰度完成后可以把流量调度到这个service下的pod。
  • 灰度pod的service:可以设置把灰度流量调度到canary的service的pod。
  • 非灰度pod的service:可以把非灰度流量调度到stable的service的pod。

以下是灰度前后endpoint的变化:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# kubectl get endpoints
# 灰度前:canary的pod为0
nginx 10.0.1.139:80,10.0.1.202:80,10.0.1.203:80 + 7 more... 2d
nginx-canary <none> 2d
nginx-stable 10.0.1.139:80,10.0.1.202:80,10.0.1.203:80 + 7 more... 47h

# 灰度中:canary的pod增加
nginx 10.0.1.202:80,10.0.1.203:80,10.0.1.204:80 + 7 more... 2d <none>
nginx-canary 10.0.1.5:80 2d version=canary
nginx-stable 10.0.1.202:80,10.0.1.203:80,10.0.1.204:80 + 6 more... 47h version=stable

# 灰度完成:stable的pod为0
NAME ENDPOINTS AGE
nginx 10.0.1.139:80,10.0.1.202:80,10.0.1.203:80 + 7 more... 2d
nginx-canary 10.0.1.139:80,10.0.1.202:80,10.0.1.203:80 + 7 more... 2d
nginx-stable <none> 47h

6. 总结

k8s的deployment默认支持金丝雀发布,可以通过kubectl rollout的命令实现,通过配置deployment中的maxSurgemaxUnavailable来控制发版的节奏,可以通过以下命令来控制发版批次:

  • kubectl rollout pause:暂停发版
  • kubectl rollout resume:继续发版
  • kubectl rollout undo:回滚版本

但相对来讲,控制粒度没有那么精确,而OpenKruiseRollouts插件提供了一种更加轻便,可控的发版工具,个人认为最大的优势是配置简单,且无需迁移工作负载和Pods,可以更好的集成到现有的容器平台中。因此可以通过该工具来实现金丝雀发版(多批次发版),实现灰度的能力。

参考:



支付宝打赏 微信打赏

赞赏一下