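Template name
ACS-CS-DedicatedMigration: Hibernate ACK dedicated cluster masters and upload an etcd backup
Template description
Hibernates the control plane (masters) of an ACK dedicated cluster, takes an etcd snapshot, and uploads it to an OSS bucket
Template type
Automation
Owner
Alibaba Cloud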
Input parameters
Parameter | Description | Type | Required | Default | Constraints |
targets | Target instances | Json | Yes | | |
BucketName | OSS bucket (path) to upload the snapshot to | String | Yes | | |
OSSEndpoint | Endpoint of the OSS bucket that receives the snapshot | String | Yes | | |
ClusterID | ID of the cluster | String | Yes | | |
regionId | Region ID | String | No | {{ ACS::RegionId }} | |
workingDir | Directory on the ECS instance in which the commands run | String | No | /root | |
rateControl | Concurrency rate of task execution | Json | No | {'Mode': 'Concurrency', 'MaxErrors': 0, 'Concurrency': 5} | |
action | Configuration mode | String | No | rollback | migrate, rollback |
OOSAssumeRole | RAM role assumed by OOS | String | No | "" | |
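The parameters in the table above can be assembled into a single JSON document before an execution is started. The following is a minimal sketch, not an official procedure: the helper name `build_parameters`, every value passed to it, and the shape of the `targets` object (`Type`, `ResourceIds`, `RegionId`) are assumptions made for illustration. Note that `action` defaults to `rollback`, so `migrate` has to be set explicitly to hibernate the control plane.

```shell
#!/bin/bash
# Sketch: assemble the Parameters JSON for ACS-CS-DedicatedMigration.
# build_parameters is a hypothetical helper; the "targets" shape is an assumption.
build_parameters() {
  # $1 cluster ID, $2 bucket, $3 OSS endpoint, $4 action, $5 region ID, $6 instance ID
  printf '{"ClusterID":"%s","BucketName":"%s","OSSEndpoint":"%s","action":"%s","regionId":"%s","targets":{"Type":"ResourceIds","ResourceIds":["%s"],"RegionId":"%s"}}\n' \
    "$1" "$2" "$3" "$4" "$5" "$6" "$5"
}

# Placeholder values only; replace them with real IDs before use.
params=$(build_parameters c0123456789abcdef my-bucket \
  oss-cn-hangzhou-internal.aliyuncs.com migrate cn-hangzhou i-bp1example)
echo "$params"
```

The resulting string could then be passed to whatever client starts the execution, for example as the `Parameters` argument of the OOS `StartExecution` API; the exact invocation depends on the tool in use and is not shown here.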
Output parameters
Parameter | Description | Type |
sleepOrWakeupControlPlaneOutputs | | List |
etcdCheckoutOutputs | | List |
findLeaderOutputs | | List |
readSignOutputs | | List |
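Judging from the template content, each element of `readSignOutputs` is the JSON line that the `etcdCheckout` step writes to `/tmp/etcdsnap/sign`: four base64-encoded key files (`sakey`, `sapub`, `frontcrt`, `frontkey`) and a signed `oss_url`. The sketch below shows one way such a line might be unpacked with standard tools. The helper `get_field` and the sample value are hypothetical, and the approach assumes that no field value contains a double quote.

```shell
#!/bin/bash
# Sketch: extract fields from one readSign output line without jq.
# get_field is a hypothetical helper; it assumes values contain no double quotes.
get_field() {
  # $1 JSON line, $2 field name
  printf '%s' "$1" | sed -n 's/.*"'"$2"'":"\([^"]*\)".*/\1/p'
}

# Made-up sample in the same shape as /tmp/etcdsnap/sign (values are not real keys).
sign='{"sakey":"c2VjcmV0","sapub":"aGVsbG8=","frontcrt":"Y3J0","frontkey":"a2V5","oss_url":"https://example.invalid/snap?sig=abc"}'

# The signed URL is stored as plain text.
get_field "$sign" oss_url

# The key material is base64-encoded (written with "base64 -w0"), so decode it.
get_field "$sign" sapub | base64 -d
```

Because the output carries private keys, it is worth treating the execution result as sensitive data.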
Permission policy required to run this template
{
    "Version": "1",
    "Statement": [
        {
            "Action": [
                "ecs:DescribeInstances",
                "ecs:DescribeInvocationResults",
                "ecs:DescribeInvocations",
                "ecs:RunCommand"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}
Details
Template content
FormatVersion: OOS-2019-06-01
Description:
en: Sleep control plane, make etcd snapshot and upload it to oss bucket
zh-cn: ACK專屬版master休眠&etcd備份上傳
name-en: ACS-CS-DedicatedMigration
name-zh-cn: ACK專屬版master休眠&etcd備份上傳
categories:
- others
Parameters:
regionId:
Type: String
Label:
en: RegionId
zh-cn: 地區ID
AssociationProperty: RegionId
Default: '{{ ACS::RegionId }}'
workingDir:
Label:
en: WorkingDir
zh-cn: ECS執行個體中運行命令的目錄
Type: String
Default: /root
rateControl:
Label:
en: RateControl
zh-cn: 任務執行的並發比率
Type: Json
AssociationProperty: RateControl
Default:
Mode: Concurrency
MaxErrors: 0
Concurrency: 5
targets:
Label:
en: TargetInstance
zh-cn: 目標執行個體
Type: Json
AssociationProperty: Targets
AssociationPropertyMetadata:
ResourceType: 'ALIYUN::ECS::Instance'
RegionId: regionId
action:
Type: String
Label:
en: Action
zh-cn: 配置方式
Default: rollback
AllowedValues:
- migrate
- rollback
OOSAssumeRole:
Label:
en: OOSAssumeRole
zh-cn: OOS扮演的RAM角色
Type: String
Default: ''
BucketName:
Label:
en: BucketName
zh-cn: 需要上傳snapshot的oss路徑
Type: String
OSSEndpoint:
Label:
en: OSSEndpoint
zh-cn: 需要上傳snapshot的oss對應的endpoint
Type: String
ClusterID:
Label:
en: ClusterID
zh-cn: 叢集的ID
Type: String
RamRole: '{{ OOSAssumeRole }}'
Tasks:
- Name: getInstance
Description:
en: Views the ECS instances
zh-cn: 擷取ECS執行個體
Action: ACS::SelectTargets
Properties:
ResourceType: ALIYUN::ECS::Instance
RegionId: '{{ regionId }}'
Filters:
- '{{ targets }}'
Outputs:
instanceIds:
Type: List
ValueSelector: Instances.Instance[].InstanceId
- Action: ACS::ECS::RunCommand
OnError: rollback
Description:
en: Execute cloud assistant command
zh-cn: 執行雲助手命令
Properties:
regionId: '{{ regionId }}'
commandContent: |-
#!/bin/bash
set -e
if [ "{{action}}" = "migrate" ]; then
mkdir -p /etc/kubernetes/manifests.backup
if_move=$(ls /etc/kubernetes/manifests/ | wc -l)
if [ "$if_move" != "0" ]; then
mv -f /etc/kubernetes/manifests/* /etc/kubernetes/manifests.backup/
fi
is_ok=0
set +e
ps -o cmd -p `pidof kubelet` | grep 'container-runtime-endpoint=/var/run/containerd/containerd.sock'
if [ $? -ne 0 ]; then
echo "Container runtime is not containerd"
for ((integer = 0; integer < 150; integer++)); do
count=$(docker ps | grep kube-apiserver | wc -l)
if [ "$count" = "0" ]; then
is_ok=1
break
else
sleep 2
fi
done
else
echo "Container runtime is containerd"
for ((integer = 0; integer < 150; integer++)); do
count=$(crictl --runtime-endpoint /var/run/containerd/containerd.sock ps |grep kube-apiserver | wc -l)
if [ "$count" = "0" ]; then
is_ok=1
break
else
sleep 2
fi
done
fi
set -e
if [ "$is_ok" == "0" ]; then
mv -f /etc/kubernetes/manifests.backup/* /etc/kubernetes/manifests/
echo "Rollback finished"
exit 1
else
echo "The control plane is sleeping now."
fi
elif [ "{{action}}" = "rollback" ]; then
mkdir -p /etc/kubernetes/manifests.backup
if_move=$(ls /etc/kubernetes/manifests.backup/ | wc -l)
if [ "$if_move" != "0" ]; then
mv -f /etc/kubernetes/manifests.backup/* /etc/kubernetes/manifests/
fi
echo "The control plane is awake now."
else
echo "action must be migrate or rollback"
exit 1
fi
instanceId: '{{ ACS::TaskLoopItem }}'
commandType: RunShellScript
workingDir: '{{ workingDir }}'
timeout: 240
Loop:
Items: '{{ getInstance.instanceIds }}'
RateControl: '{{ rateControl }}'
Outputs:
commandOutputs:
AggregateType: Fn::ListJoin
AggregateField: commandOutput
Outputs:
commandOutput:
ValueSelector: invocationOutput
Type: String
Name: sleepOrWakeupControlPlane
- Action: ACS::ECS::RunCommand
OnError: rollback
Description:
en: Execute cloud assistant command
zh-cn: 執行雲助手命令
Properties:
regionId: '{{ regionId }}'
commandContent: |-
#!/bin/bash
set -e
if [ "{{action}}" = "rollback" ]; then
exit 0
fi
# Get the IP address of eth0
IP=$(/sbin/ifconfig eth0 | grep inet | grep -v 127.0.0.1 | grep -v inet6 | awk '{print $2}' | tr -d "addr:")
ENDPOINT="https://$IP:2379"
echo "ENDPOINT: "$ENDPOINT
set +e
# Query the etcd endpoint status to determine whether this etcd member is the leader
ETCDCTL_API=3 /usr/bin/etcdctl --cacert=/var/lib/etcd/cert/ca.pem --cert=/var/lib/etcd/cert/etcd-server.pem --key=/var/lib/etcd/cert/etcd-server-key.pem --endpoints=$ENDPOINT endpoint status | grep true
if [ $? -ne 0 ]; then
echo "This machine does not host the etcd leader; exiting."
exit 0
fi
set -e
yum install curl wget jq -y
if [ ! -f "/tmp/ossutil64" ]; then
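# Download ossutil and save it under /tmp/ (tested inside "if" because "set -e" would otherwise abort before the error message)
if ! wget -c -t 10 -O /tmp/ossutil64 https://oos-public-{{regionId}}.oss-{{regionId}}-internal.aliyuncs.com/x64/ossutil64; then
echo "Failed to download ossutil; exiting."
exit 1
fi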
chmod +x /tmp/ossutil64
fi
if [ ! -f "/tmp/modify-prefix-v2" ]; then
echo "Downloading modify-prefix-v2"
if ! wget -c -t 10 -O /tmp/modify-prefix-v2 https://aliacs-k8s-{{regionId}}.oss-{{regionId}}-internal.aliyuncs.com/public/pkg/etcd/modify-prefix-v2; then
echo "Failed to download the prefix modification tool; exiting."
exit 1
fi
chmod +x /tmp/modify-prefix-v2
fi
if ! [[ {{ClusterID}} =~ ^c.* ]];then
echo "clusterID: {{ClusterID}} is not a valid cluster ID; exiting."
exit 1
fi
echo "clusterID: {{ClusterID}}"
# This node is the leader, so take a snapshot and store it under /tmp/etcdsnap/
TIMESTAMP=$(date "+%Y%m%d%H%M%S")
mkdir -p /tmp/etcdsnap
set -x
SNAP_NAME=etcd_{{ClusterID}}_$TIMESTAMP
echo "Starting backup; snapshot file is /tmp/etcdsnap/$SNAP_NAME"
DestPrefix="/"{{ClusterID}}
ETCDCTL_API=3 /usr/bin/etcdctl --cacert=/var/lib/etcd/cert/ca.pem --cert=/var/lib/etcd/cert/etcd-server.pem --key=/var/lib/etcd/cert/etcd-server-key.pem --endpoints=$ENDPOINT snapshot save /tmp/etcdsnap/$SNAP_NAME
set +e
/tmp/modify-prefix-v2 change-prefix --db=/tmp/etcdsnap/$SNAP_NAME --dest-prefix=$DestPrefix
if [ $? -ne 0 ]; then
echo "Failed to modify the prefix; exiting."
exit 1
fi
set -e
# Fetch temporary credentials for the instance RAM role from the metadata service, then upload
ROLE=$(curl -s 100.100.100.200/latest/meta-data/ram/security-credentials/)
ROLERES=$(curl -s 100.100.100.200/latest/meta-data/ram/security-credentials/$ROLE)
AccessKeyId=$(echo "$ROLERES" | jq -r .AccessKeyId)
AccessKeySecret=$(echo "$ROLERES" | jq -r .AccessKeySecret)
SecurityToken=$(echo "$ROLERES" | jq -r .SecurityToken)
# put object to oss
echo "begin put object to oss"
set +e
/tmp/ossutil64 -t $SecurityToken -i $AccessKeyId -k $AccessKeySecret -e {{OSSEndpoint}} cp /tmp/etcdsnap/$SNAP_NAME oss://{{BucketName}}/$SNAP_NAME
if [ $? -ne 0 ]; then
echo "Failed to upload data to bucket {{BucketName}}; exiting."
exit 1
fi
set -e
# sign
oss_url=$(/tmp/ossutil64 -t $SecurityToken -i $AccessKeyId -k $AccessKeySecret -e {{OSSEndpoint}} sign --timeout 2400 oss://{{BucketName}}/$SNAP_NAME | grep -v "elapsed" | tr -d '\n')
set +x
sakey=$(cat /etc/kubernetes/pki/sa.key | base64 -w0)
sapub=$(cat /etc/kubernetes/pki/sa.pub | base64 -w0)
frontcrt=$(cat /etc/kubernetes/pki/front-proxy-ca.crt | base64 -w0)
frontkey=$(cat /etc/kubernetes/pki/front-proxy-ca.key | base64 -w0)
echo "{\"sakey\":\"$sakey\",\"sapub\":\"$sapub\",\"frontcrt\":\"$frontcrt\",\"frontkey\":\"$frontkey\",\"oss_url\":\"$oss_url\"}" >/tmp/etcdsnap/sign
instanceId: '{{ ACS::TaskLoopItem }}'
commandType: RunShellScript
workingDir: '{{ workingDir }}'
timeout: 600
Loop:
Items: '{{ getInstance.instanceIds }}'
RateControl: '{{ rateControl }}'
Outputs:
commandOutputs:
AggregateType: Fn::ListJoin
AggregateField: commandOutput
Outputs:
commandOutput:
ValueSelector: invocationOutput
Type: String
Name: etcdCheckout
- Action: 'ACS::ECS::RunCommand'
OnError: rollback
Description:
en: Execute cloud assistant command
zh-cn: 執行雲助手命令
Properties:
regionId: '{{ regionId }}'
commandContent: |-
#!/bin/bash
if [ "{{action}}" = "rollback" ]; then
exit 0
fi
if [ -e /tmp/etcdsnap/sign ]; then
curl --retry 10 -sSL 100.100.100.200/latest/meta-data/instance-id
fi
instanceId: '{{ ACS::TaskLoopItem }}'
commandType: RunShellScript
workingDir: '{{ workingDir }}'
timeout: 60
Loop:
Items: '{{ getInstance.instanceIds }}'
RateControl: '{{ rateControl }}'
Outputs:
commandOutputs:
AggregateType: 'Fn::ListJoin'
AggregateField: commandOutput
Outputs:
commandOutput:
ValueSelector: invocationOutput
Type: String
Name: findLeader
- Action: 'ACS::ECS::RunCommand'
OnError: rollback
OnSuccess: ACS::END
Description:
en: Execute cloud assistant command
zh-cn: 執行雲助手命令
Properties:
regionId: '{{ regionId }}'
commandContent: |-
#!/bin/bash
if [ "{{action}}" = "rollback" ]; then
exit 0
fi
if [ -e /tmp/etcdsnap/sign ]; then
cat /tmp/etcdsnap/sign
rm -rf /tmp/etcdsnap/sign
fi
instanceId: '{{ ACS::TaskLoopItem }}'
commandType: RunShellScript
workingDir: '{{ workingDir }}'
timeout: 60
Loop:
Items:
'Fn::Intersection':
- '{{ getInstance.instanceIds }}'
- '{{ findLeader.commandOutputs }}'
RateControl: '{{ rateControl }}'
Outputs:
commandOutputs:
AggregateType: Fn::ListJoin
AggregateField: commandOutput
Outputs:
commandOutput:
ValueSelector: invocationOutput
Type: String
Name: readSign
- Action: ACS::ECS::RunCommand
Description:
en: Execute cloud assistant command
zh-cn: 執行雲助手命令
Properties:
regionId: '{{ regionId }}'
commandContent: |-
#!/bin/bash
set -e
mkdir -p /etc/kubernetes/manifests.backup
if_move=$(ls /etc/kubernetes/manifests.backup/ | wc -l)
if [ "$if_move" != "0" ]; then
mv -f /etc/kubernetes/manifests.backup/* /etc/kubernetes/manifests/
fi
echo "The control plane is awake now."
instanceId: '{{ ACS::TaskLoopItem }}'
commandType: RunShellScript
workingDir: '{{ workingDir }}'
timeout: 240
Loop:
Items: '{{ getInstance.instanceIds }}'
RateControl: '{{ rateControl }}'
Outputs:
commandOutputs:
AggregateType: Fn::ListJoin
AggregateField: commandOutput
Outputs:
commandOutput:
ValueSelector: invocationOutput
Type: String
Name: rollback
Outputs:
sleepOrWakeupControlPlaneOutputs:
Type: List
Value: '{{ sleepOrWakeupControlPlane.commandOutputs }}'
etcdCheckoutOutputs:
Type: List
Value: '{{ etcdCheckout.commandOutputs }}'
findLeaderOutputs:
Type: List
Value: '{{ findLeader.commandOutputs }}'
readSignOutputs:
Type: List
Value: '{{ readSign.commandOutputs }}'
Metadata:
ALIYUN::OOS::Interface:
ParameterGroups:
- Parameters:
- ClusterID
- action
- BucketName
- OSSEndpoint
- workingDir
Label:
default:
zh-cn: 配置參數
en: Configure Parameters
- Parameters:
- regionId
- targets
Label:
default:
zh-cn: 選擇執行個體
en: Select ECS Instance
- Parameters:
- rateControl
- OOSAssumeRole
Label:
default:
zh-cn: 進階選項
en: Control Options
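The `sleepOrWakeupControlPlane` step in the template above moves the static pod manifests aside and then polls, up to 150 times at 2-second intervals, until no `kube-apiserver` container is listed; if the count never reaches zero, it restores the manifests and exits with an error. That loop can run for roughly 300 seconds, which is longer than the step's `timeout: 240`, so it may be worth checking which limit applies first. The polling pattern itself can be sketched as a small reusable function. The names `wait_until_zero` and `fake_count` are hypothetical, and the stand-in check replaces the real `docker ps` or `crictl ps` pipeline so that the sketch runs anywhere.

```shell
#!/bin/bash
# Sketch: poll a check command until it prints 0, giving up after a fixed
# number of attempts. wait_until_zero is a hypothetical helper name.
wait_until_zero() {
  # $1 command that prints a count, $2 maximum attempts, $3 seconds between attempts
  check_cmd=$1
  attempts=$2
  interval=$3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    count=$($check_cmd)
    if [ "$count" = "0" ]; then
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}

# Hypothetical stand-in for "docker ps | grep kube-apiserver | wc -l":
# reports 2, then 1, then 0 on successive calls (state is kept in a temp file
# because the check runs in a subshell).
state_file=$(mktemp)
echo 2 > "$state_file"
fake_count() {
  n=$(cat "$state_file")
  echo "$n"
  if [ "$n" -gt 0 ]; then
    echo $((n - 1)) > "$state_file"
  fi
}

if wait_until_zero fake_count 5 0; then
  echo "stopped"
else
  echo "timed out"
fi
rm -f "$state_file"
```

In the template, the failure branch is where the manifests are moved back from `/etc/kubernetes/manifests.backup/`, so a caller of such a helper would put its restore logic in the `else` branch.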