Configure the fault handling policy of an ECI Pod - Elastic Container Instance

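By default, if an ECI Pod fails to be created, the system automatically retries the creation. If you want to get the creation result as soon as possible so that you can handle failures promptly, you can change the fault handling policy of the ECI Pod.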

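Configuration description

When you create an ECI Pod on a virtual node, the creation may fail for reasons such as insufficient inventory. By default, the system automatically reschedules the Pod and tries to create it again. You can add the k8s.aliyun.com/eci-fail-strategy annotation to change the fault handling policy of the ECI Pod, which determines whether the system retries after the Pod fails to be created.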

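Important

Add annotations under the metadata of the Pod. For example, when you create a Deployment, add the annotations under spec>template>metadata.
ECI-related annotations take effect only when they are added at the time the ECI Pod is created. Adding or modifying ECI-related annotations when you update an ECI Pod has no effect.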

The following table describes the valid values of k8s.aliyun.com/eci-fail-strategy:

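Value	Description	Scenario
fail-back	Automatic recovery on failure. After the Pod fails to be created, the system automatically tries to create it again. The Pod stays in the Pending state until it is created and enters the Running state.	Success rate is the priority, and delayed Pod delivery is acceptable.
fail-over	Failover. Has the same effect as fail-back.	Success rate is the priority, and delayed Pod delivery is acceptable.
fail-fast	Fast failure. After the Pod fails to be created, an error is reported immediately. The Pod is displayed in the ProviderFailed state, and the upper-layer orchestration decides whether to retry or to schedule the Pod to a regular node.	Efficiency is the priority: you want Pods delivered quickly and have complete failure handling logic in place.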
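
The placement rule and the valid values above can be illustrated with a minimal Python sketch. It assumes the Deployment manifest is already loaded as a plain dictionary; the helper name with_fail_strategy and the constant names are hypothetical and are not part of any ECI or Kubernetes SDK.

```python
import copy

# Values listed in the table above.
VALID_STRATEGIES = {"fail-back", "fail-over", "fail-fast"}
ANNOTATION_KEY = "k8s.aliyun.com/eci-fail-strategy"


def with_fail_strategy(workload: dict, strategy: str) -> dict:
    """Return a copy of a Deployment-like manifest with the annotation set
    on the Pod template, that is, spec.template.metadata.annotations.

    Hypothetical helper for illustration only.
    """
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unsupported fail strategy: {strategy!r}")
    result = copy.deepcopy(workload)  # leave the caller's manifest untouched
    template_meta = (
        result.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("metadata", {})
    )
    template_meta.setdefault("annotations", {})[ANNOTATION_KEY] = strategy
    return result


deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "test"},
    "spec": {"template": {"metadata": {"labels": {"app": "nginx"}}, "spec": {}}},
}
patched = with_fail_strategy(deployment, "fail-fast")
print(patched["spec"]["template"]["metadata"]["annotations"])
```

The sketch writes the annotation to the Pod template rather than to the Deployment's own metadata, because only annotations on the Pod itself are read when the ECI Pod is created.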

Note

We recommend that you do not use k8s.aliyun.com/eci-reschedule-enable to configure rescheduling.

Configuration example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx-test
      labels:
        app: nginx
        alibabacloud.com/eci: "true" 
      annotations:
        k8s.aliyun.com/eci-fail-strategy: "fail-fast"  # Report an error immediately if Pod creation fails; do not retry
        k8s.aliyun.com/eci-use-specs: "ecs.c6.large"
    spec:
      containers:
      - name: nginx
        image: registry.cn-shanghai.aliyuncs.com/eci_open/nginx:1.14.2
        ports:
        - containerPort: 80

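In the preceding YAML example, the fault handling policy of the ECI Pod is fail-fast. If the Pod stays in the Pending state for a long time, check the status.reason of the Pod.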

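If the Pod status.reason is ContainerInstanceScheduleFailed, ECI scheduling failed. In this case, check the Pod status conditions. The reason and message of the ContainerInstanceCreated condition identify the specific cause, so that you can take appropriate measures, such as changing the specified instance type or configuring multiple zones. For more information, see ContainerInstanceCreated.
If the Pod status.reason is empty (this generally does not happen with fail-fast), check the Pod status conditions and use the status of the ContainerInstanceCreated condition to confirm the scheduling state.
- If ContainerInstanceCreated is True, ECI scheduling succeeded and the problem is an exception during sandbox creation.
- If ContainerInstanceCreated is False and the reason is not Creating, ECI scheduling has not succeeded yet and you need to keep waiting.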
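
The checks above can be sketched as a small Python function. This is a minimal illustration that mirrors the text literally. It assumes the Pod status has already been fetched and parsed into a dictionary (for example, from the status field of the Pod object); the function names find_condition and diagnose and the returned strings are hypothetical.

```python
from typing import Optional


def find_condition(status: dict, cond_type: str) -> Optional[dict]:
    """Return the first condition of the given type, or None if absent."""
    for cond in status.get("conditions") or []:
        if cond.get("type") == cond_type:
            return cond
    return None


def diagnose(status: dict) -> str:
    """Apply the checks described above to a parsed Pod status (hypothetical helper)."""
    cond = find_condition(status, "ContainerInstanceCreated")
    reason = status.get("reason")

    if reason == "ContainerInstanceScheduleFailed":
        if cond is None:
            return "ECI scheduling failed; no ContainerInstanceCreated condition found"
        # reason and message on the condition carry the specific cause
        return f"ECI scheduling failed: {cond.get('reason')}: {cond.get('message')}"

    if not reason:
        if cond is None:
            return "no ContainerInstanceCreated condition found"
        if cond.get("status") == "True":
            return "ECI scheduled; sandbox creation is abnormal"
        if cond.get("status") == "False" and cond.get("reason") != "Creating":
            return "ECI scheduling has not succeeded yet; keep waiting"

    return "not covered by the checks above"


status = {
    "phase": "Pending",
    "conditions": [{"type": "ContainerInstanceCreated", "status": "True"}],
}
print(diagnose(status))
```

Cases that the text does not describe fall through to a generic result instead of guessing at a cause.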

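Take an ECI Pod that fails to be created because of insufficient inventory as an example. When the fault handling policy of the Pod is fail-fast, the ContainerInstanceCreated condition in the Pod status looks like the following sample:

Note

If the fault handling policy of the Pod is fail-back, the system automatically tries to reschedule the Pod after the creation fails. In this case, the Pod status.reason does not show ContainerInstanceScheduleFailed. You can still check the Pod status conditions and use the reason and message of the ContainerInstanceCreated condition to determine why scheduling failed in the current scheduling cycle.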

{
    "conditions": [
        {
            "lastProbeTime": "2023-03-30T18:11:31Z",
            "lastTransitionTime": "2023-03-30T18:11:31Z",
            "message": "Create ECI failed because the specified instance is out of stock. %s",
            "reason": "ContainerGroup.NoStock",
            "status": "False",
            "type": "ContainerInstanceCreated"
        }
    ],
    "Reason":"ContainerInstanceScheduleFailed",
    "phase": "Pending"
}
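
As a rough illustration of how an upper-layer orchestrator might read this status, the following Python sketch parses the sample and builds a one-line summary. The function name schedule_failure_summary is hypothetical. Top-level keys are lowercased before lookup, on the assumption that the reason field may appear as either Reason (as printed in the sample) or reason.

```python
import json
from typing import Optional

# The status sample shown above, reproduced as a JSON string.
SAMPLE = """
{
    "conditions": [
        {
            "lastProbeTime": "2023-03-30T18:11:31Z",
            "lastTransitionTime": "2023-03-30T18:11:31Z",
            "message": "Create ECI failed because the specified instance is out of stock. %s",
            "reason": "ContainerGroup.NoStock",
            "status": "False",
            "type": "ContainerInstanceCreated"
        }
    ],
    "Reason": "ContainerInstanceScheduleFailed",
    "phase": "Pending"
}
"""


def schedule_failure_summary(status: dict) -> Optional[str]:
    """Summarize a failed ContainerInstanceCreated condition, or return None.

    Hypothetical helper; tolerates "Reason" or "reason" at the top level.
    """
    lowered = {key.lower(): value for key, value in status.items()}
    cond = next(
        (c for c in lowered.get("conditions") or []
         if c.get("type") == "ContainerInstanceCreated"),
        None,
    )
    if cond is None or cond.get("status") != "False":
        return None
    return f"{lowered.get('reason')} / {cond.get('reason')} at {cond.get('lastTransitionTime')}"


summary = schedule_failure_summary(json.loads(SAMPLE))
print(summary)
```

With a summary like this, the orchestrator can decide whether to retry, change the instance type, or schedule the Pod to a regular node.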