[Kubernetes] 使用kubespray安装k8s集群

Posted by 胡伟煌 on 2018-06-23

1. 环境准备

1.1. 部署机器

以下机器为虚拟机

机器IP 主机名 角色 系统版本 备注
172.16.94.140 kube-master-0 k8s master Centos 4.17.14 内存:3G
172.16.94.141 kube-node-41 k8s node Centos 4.17.14 内存:3G
172.16.94.142 kube-node-42 k8s node Centos 4.17.14 内存:3G
172.16.94.135 部署管理机

1.2. 配置管理机

管理机主要用来部署k8s集群,需要安装以下版本的软件,具体可参考:

1
2
3
4
5
6
ansible>=2.4.0
jinja2>=2.9.6
netaddr
pbr>=1.6
ansible-modules-hashivault>=3.9.4
hvac

1、安装及配置ansible

2、安装python-netaddr

1
2
3
4
5
# 安装pip
yum -y install epel-release
yum -y install python-pip
# 安装python-netaddr
pip install netaddr

3、升级Jinja

1
2
# Jinja 2.9 (or newer)
pip install --upgrade jinja2

1.3. 配置部署机器

部署机器即用来运行k8s集群的机器,包括MasterNode

1、确认系统版本

本文采用centos7的系统,建议将系统内核升级到4.x.x以上。

2、关闭防火墙

1
2
3
systemctl stop firewalld
systemctl disable firewalld
iptables -F

3、关闭swap

Kubespary v2.5.0的版本需要关闭swap,具体参考

1
2
3
4
5
- name: Stop if swap enabled
assert:
that: ansible_swaptotal_mb == 0
when: kubelet_fail_swap_on|default(true)
ignore_errors: "{{ ignore_assert_errors }}"

V2.6.0 版本去除了swap的检查,具体参考:

执行关闭swap命令swapoff -a

1
2
3
4
5
6
7
8
[root@master ~]#swapoff -a
[root@master ~]#
[root@master ~]# free -m
total used free shared buff/cache available
Mem: 976 366 135 6 474 393
Swap: 0 0 0

# swap 一栏为0,表示已经关闭了swap

4、确认部署机器内存

由于本文采用虚拟机部署,内存可能存在不足的问题,因此将虚拟机内存调整为3G或以上;如果是物理机一般不会有内存不足的问题。具体参考:

1
2
3
4
5
6
7
8
9
10
11
- name: Stop if memory is too small for masters
assert:
that: ansible_memtotal_mb >= 1500
ignore_errors: "{{ ignore_assert_errors }}"
when: inventory_hostname in groups['kube-master']

- name: Stop if memory is too small for nodes
assert:
that: ansible_memtotal_mb >= 1024
ignore_errors: "{{ ignore_assert_errors }}"
when: inventory_hostname in groups['kube-node']

1.4. 涉及镜像

Docker版本为17.03.2-ce

1、Master节点

镜像 版本 大小 镜像ID 备注
gcr.io/google-containers/hyperkube v1.9.5 620 MB a7e7fdbc5fee k8s
quay.io/coreos/etcd v3.2.4 35.7 MB 498ffffcfd05
gcr.io/google_containers/pause-amd64 3.0 747 kB 99e59f495ffa
quay.io/calico/node v2.6.8 282 MB e96a297310fd calico
quay.io/calico/cni v1.11.4 70.8 MB 4c4cb67d7a88 calico
quay.io/calico/ctl v1.6.3 44.4 MB 46d3aace8bc6 calico

2、Node节点

镜像 版本 大小 镜像ID 备注
gcr.io/google-containers/hyperkube v1.9.5 620 MB a7e7fdbc5fee k8s
gcr.io/google_containers/pause-amd64 3.0 747 kB 99e59f495ffa
quay.io/calico/node v2.6.8 282 MB e96a297310fd calico
quay.io/calico/cni v1.11.4 70.8 MB 4c4cb67d7a88 calico
quay.io/calico/ctl v1.6.3 44.4 MB 46d3aace8bc6 calico
gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64 1.14.8 40.9 MB c2ce1ffb51ed dns
gcr.io/google_containers/k8s-dns-sidecar-amd64 1.14.8 42.2 MB 6f7f2dc7fab5 dns
gcr.io/google_containers/k8s-dns-kube-dns-amd64 1.14.8 50.5 MB 80cc5ea4b547 dns
gcr.io/google_containers/cluster-proportional-autoscaler-amd64 1.1.2 50.5 MB 78cf3f492e6b
gcr.io/google_containers/kubernetes-dashboard-amd64 v1.8.3 102 MB 0c60bcf89900 dashboard
nginx 1.13 109 MB ae513a47849c

3、说明

  • 镜像被墙并且全部镜像下载需要较多时间,建议提前下载到部署机器上。
  • hyperkube镜像主要用来运行k8s核心组件(例如kube-apiserver等)。
  • 此处使用的网络组件为calico。

2. 部署集群

2.1. 下载kubespary的源码

1
git clone https://github.com/kubernetes-incubator/kubespray.git

2.2. 编辑配置文件

2.2.1. hosts.ini

hosts.ini主要为部署节点机器信息的文件,路径为:kubespray/inventory/sample/hosts.ini

1
2
3
4
cd kubespray
# 复制一份配置进行修改
cp -rfp inventory/sample inventory/k8s
vi inventory/k8s/hosts.ini

例如:

hosts.ini文件可以填写部署机器的登录密码,也可以不填密码而设置ssh的免密登录。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
## Configure 'ip' variable to bind kubernetes services on a
## different ip than the default iface
## 主机名 ssh登陆IP ssh用户名 ssh登陆密码 机器IP 子网掩码
kube-master-0 ansible_ssh_host=172.16.94.140 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.140 mask=/24
kube-node-41 ansible_ssh_host=172.16.94.141 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.141 mask=/24
kube-node-42 ansible_ssh_host=172.16.94.142 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.142 mask=/24

## configure a bastion host if your nodes are not directly reachable
# bastion ansible_ssh_host=x.x.x.x

[kube-master]
kube-master-0

[etcd]
kube-master-0

[kube-node]
kube-node-41
kube-node-42

[k8s-cluster:children]
kube-node
kube-master

[calico-rr]

2.2.2. k8s-cluster.yml

k8s-cluster.yml主要为k8s集群的配置文件,路径为:kubespray/inventory/k8s/group_vars/k8s-cluster.yml。该文件可以修改安装的k8s集群的版本,参数为:kube_version: v1.9.5。具体可参考:

2.3. 执行部署操作

1
2
3
4
# 进入主目录
cd kubespray
# 执行部署命令
ansible-playbook -i inventory/k8s/hosts.ini cluster.yml -b -vvv

-vvv 参数表示输出运行日志

如果需要重置可以执行以下命令:

1
ansible-playbook -i inventory/k8s/hosts.ini reset.yml -b -vvv

3. 确认部署结果

3.1. ansible的部署结果

ansible命令执行完,出现以下日志,则说明部署成功,否则根据报错内容进行修改。

1
2
3
4
5
PLAY RECAP *****************************************************************************
kube-master-0 : ok=309 changed=30 unreachable=0 failed=0
kube-node-41 : ok=203 changed=8 unreachable=0 failed=0
kube-node-42 : ok=203 changed=8 unreachable=0 failed=0
localhost : ok=2 changed=0 unreachable=0 failed=0

以下为部分部署执行日志:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
kubernetes/preinstall : Update package management cache (YUM) --------------------23.96s
/root/gopath/src/kubespray/roles/kubernetes/preinstall/tasks/main.yml:121
kubernetes/master : Master | wait for the apiserver to be running ----------------23.44s
/root/gopath/src/kubespray/roles/kubernetes/master/handlers/main.yml:79
kubernetes/preinstall : Install packages requirements ----------------------------20.20s
/root/gopath/src/kubespray/roles/kubernetes/preinstall/tasks/main.yml:203
kubernetes/secrets : Check certs | check if a cert already exists on node --------13.94s
/root/gopath/src/kubespray/roles/kubernetes/secrets/tasks/check-certs.yml:17
gather facts from all instances --------------------------------------------------9.98s
/root/gopath/src/kubespray/cluster.yml:25
kubernetes/node : install | Compare host kubelet with hyperkube container --------9.66s
/root/gopath/src/kubespray/roles/kubernetes/node/tasks/install_host.yml:2
kubernetes-apps/ansible : Kubernetes Apps | Start Resources -----------------------9.27s
/root/gopath/src/kubespray/roles/kubernetes-apps/ansible/tasks/main.yml:37
kubernetes-apps/ansible : Kubernetes Apps | Lay Down KubeDNS Template ------------8.47s
/root/gopath/src/kubespray/roles/kubernetes-apps/ansible/tasks/kubedns.yml:3
download : Sync container ---------------------------------------------------------8.23s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:15
kubernetes-apps/network_plugin/calico : Start Calico resources --------------------7.82s
/root/gopath/src/kubespray/roles/kubernetes-apps/network_plugin/calico/tasks/main.yml:2
download : Download items ---------------------------------------------------------7.67s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:6
download : Download items ---------------------------------------------------------7.48s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:6
download : Sync container ---------------------------------------------------------7.35s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:15
download : Download items ---------------------------------------------------------7.16s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:6
network_plugin/calico : Calico | Copy cni plugins from calico/cni container -------7.10s
/root/gopath/src/kubespray/roles/network_plugin/calico/tasks/main.yml:62
download : Download items ---------------------------------------------------------7.04s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:6
download : Download items ---------------------------------------------------------7.01s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:6
download : Sync container ---------------------------------------------------------7.00s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:15
download : Download items ---------------------------------------------------------6.98s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:6
download : Download items ---------------------------------------------------------6.79s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:6

3.2. k8s集群运行结果

1、k8s组件信息

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
# kubectl get all --namespace=kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ds/calico-node 3 3 3 3 3 <none> 2h

NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/kube-dns 2 2 2 2 2h
deploy/kubedns-autoscaler 1 1 1 1 2h
deploy/kubernetes-dashboard 1 1 1 1 2h

NAME DESIRED CURRENT READY AGE
rs/kube-dns-79d99cdcd5 2 2 2 2h
rs/kubedns-autoscaler-5564b5585f 1 1 1 2h
rs/kubernetes-dashboard-69cb58d748 1 1 1 2h

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ds/calico-node 3 3 3 3 3 <none> 2h

NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/kube-dns 2 2 2 2 2h
deploy/kubedns-autoscaler 1 1 1 1 2h
deploy/kubernetes-dashboard 1 1 1 1 2h

NAME DESIRED CURRENT READY AGE
rs/kube-dns-79d99cdcd5 2 2 2 2h
rs/kubedns-autoscaler-5564b5585f 1 1 1 2h
rs/kubernetes-dashboard-69cb58d748 1 1 1 2h

NAME READY STATUS RESTARTS AGE
po/calico-node-22vsg 1/1 Running 0 2h
po/calico-node-t7zgw 1/1 Running 0 2h
po/calico-node-zqnx8 1/1 Running 0 2h
po/kube-apiserver-kube-master-0 1/1 Running 0 22h
po/kube-controller-manager-kube-master-0 1/1 Running 0 2h
po/kube-dns-79d99cdcd5-f2t6t 3/3 Running 0 2h
po/kube-dns-79d99cdcd5-gw944 3/3 Running 0 2h
po/kube-proxy-kube-master-0 1/1 Running 2 22h
po/kube-proxy-kube-node-41 1/1 Running 3 22h
po/kube-proxy-kube-node-42 1/1 Running 3 22h
po/kube-scheduler-kube-master-0 1/1 Running 0 2h
po/kubedns-autoscaler-5564b5585f-lt9bb 1/1 Running 0 2h
po/kubernetes-dashboard-69cb58d748-wmb9x 1/1 Running 0 2h
po/nginx-proxy-kube-node-41 1/1 Running 3 22h
po/nginx-proxy-kube-node-42 1/1 Running 3 22h

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/kube-dns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP 2h
svc/kubernetes-dashboard ClusterIP 10.233.27.24 <none> 443/TCP 2h

2、k8s节点信息

1
2
3
4
5
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube-master-0 Ready master 22h v1.9.5
kube-node-41 Ready node 22h v1.9.5
kube-node-42 Ready node 22h v1.9.5

3、组件健康信息

1
2
3
4
5
# kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health": "true"}

4. troubles shooting

在使用kubespary部署k8s集群时,主要遇到以下报错。

4.1. python-netaddr未安装

  • 报错内容:
1
fatal: [node1]: FAILED! => {"failed": true, "msg": "The ipaddr filter requires python-netaddr be installed on the ansible controller"}
  • 解决方法:

需要安装 python-netaddr,具体参考上述[环境准备]内容。

4.2. swap未关闭

  • 报错内容:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
fatal: [kube-master-0]: FAILED! => {
"assertion": "ansible_swaptotal_mb == 0",
"changed": false,
"evaluated_to": false
}
fatal: [kube-node-41]: FAILED! => {
"assertion": "ansible_swaptotal_mb == 0",
"changed": false,
"evaluated_to": false
}
fatal: [kube-node-42]: FAILED! => {
"assertion": "ansible_swaptotal_mb == 0",
"changed": false,
"evaluated_to": false
}
  • 解决方法:

所有部署机器执行swapoff -a关闭swap,具体参考上述[环境准备]内容。

4.3. 部署机器内存过小

  • 报错内容:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
TASK [kubernetes/preinstall : Stop if memory is too small for masters] *********************************************************************************************************************************************************************************************************
task path: /root/gopath/src/kubespray/roles/kubernetes/preinstall/tasks/verify-settings.yml:52
Friday 10 August 2018 21:50:26 +0800 (0:00:00.940) 0:01:14.088 *********
fatal: [kube-master-0]: FAILED! => {
"assertion": "ansible_memtotal_mb >= 1500",
"changed": false,
"evaluated_to": false
}

TASK [kubernetes/preinstall : Stop if memory is too small for nodes] ***********************************************************************************************************************************************************************************************************
task path: /root/gopath/src/kubespray/roles/kubernetes/preinstall/tasks/verify-settings.yml:58
Friday 10 August 2018 21:50:27 +0800 (0:00:00.570) 0:01:14.659 *********
fatal: [kube-node-41]: FAILED! => {
"assertion": "ansible_memtotal_mb >= 1024",
"changed": false,
"evaluated_to": false
}
fatal: [kube-node-42]: FAILED! => {
"assertion": "ansible_memtotal_mb >= 1024",
"changed": false,
"evaluated_to": false
}
to retry, use: --limit @/root/gopath/src/kubespray/cluster.retry
  • 解决方法:

调大所有部署机器的内存,本示例中调整为3G或以上。

4.4. kube-scheduler组件运行失败

kube-scheduler组件运行失败,导致http://localhost:10251/healthz调用失败。

  • 报错内容:
1
2
3
FAILED - RETRYING: Master | wait for kube-scheduler (1 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (1 retries left).
fatal: [node1]: FAILED! => {"attempts": 60, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://localhost:10251/healthz"}
  • 解决方法:

可能是内存不足导致,本示例中调大了部署机器的内存。

4.5. docker安装包冲突

  • 报错内容:
1
2
3
4
5
6
7
8
9
10
11
12
13
failed: [k8s-node-1] (item={u'name': u'docker-engine-1.13.1-1.el7.centos'}) => {
"attempts": 4,
"changed": false,
...
"item": {
"name": "docker-engine-1.13.1-1.el7.centos"
},
"msg": "Error: docker-ce-selinux conflicts with 2:container-selinux-2.66-1.el7.noarch\n",
"rc": 1,
"results": [
"Loaded plugins: fastestmirror\nLoading mirror speeds from cached hostfile\n * elrepo: mirrors.tuna.tsinghua.edu.cn\n * epel: mirrors.tongji.edu.cn\nPackage docker-engine is obsoleted by docker-ce, trying to install docker-ce-17.03.2.ce-1.el7.centos.x86_64 instead\nResolving Dependencies\n--> Running transaction check\n---> Package docker-ce.x86_64 0:17.03.2.ce-1.el7.centos will be installed\n--> Processing Dependency: docker-ce-selinux >= 17.03.2.ce-1.el7.centos for package: docker-ce-17.03.2.ce-1.el7.centos.x86_64\n--> Processing Dependency: libltdl.so.7()(64bit) for package: docker-ce-17.03.2.ce-1.el7.centos.x86_64\n--> Running transaction check\n---> Package docker-ce-selinux.noarch 0:17.03.2.ce-1.el7.centos will be installed\n---> Package libtool-ltdl.x86_64 0:2.4.2-22.el7_3 will be installed\n--> Processing Conflict: docker-ce-selinux-17.03.2.ce-1.el7.centos.noarch conflicts docker-selinux\n--> Restarting Dependency Resolution with new changes.\n--> Running transaction check\n---> Package container-selinux.noarch 2:2.55-1.el7 will be updated\n---> Package container-selinux.noarch 2:2.66-1.el7 will be an update\n--> Processing Conflict: docker-ce-selinux-17.03.2.ce-1.el7.centos.noarch conflicts docker-selinux\n--> Finished Dependency Resolution\n You could try using --skip-broken to work around the problem\n You could try running: rpm -Va --nofiles --nodigest\n"
]
}
  • 解决方法:

卸载旧的docker版本,由kubespary自动安装。

1
2
3
4
5
6
7
8
9
10
sudo yum remove -y docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-selinux \
docker-engine-selinux \
docker-engine

参考文章:

推荐文章