E-MapReduce: Replace a damaged local disk in a cluster

Updated: Jul 01, 2024

When you use an E-MapReduce (EMR) cluster that is built on instance types with local disks (the i series and d series), you may receive notifications about damaged local disk events. This topic describes how to replace a damaged local disk in a cluster.

Precautions

  • We recommend that you resolve this type of issue by removing the abnormal node and adding a new node, to avoid a prolonged impact on your business.

  • After the disk is replaced, the data on the disk is lost. Make sure that the data on the disk has enough replicas and back up the data in a timely manner.

  • The entire replacement involves operations such as stopping services, unmounting the disk, mounting the new disk, and restarting services. Disk replacement is usually completed within five business days. Before you perform the operations in this topic, evaluate whether the disk usage of the services and the cluster load can sustain your current business after the services are stopped, as in the pre-check sketch below.
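
    For example, a minimal pre-check sketch, assuming HDFS is the primary storage service on the cluster and an hdfs OS user exists on the node; adjust the checks to the services that you actually run:

    # Check per-disk usage on this node.
    df -h
    # Check the overall DFS capacity and the number of live DataNodes.
    sudo -u hdfs hdfs dfsadmin -report
    # Check block replica health before taking a disk offline.
    sudo -u hdfs hdfs fsck / | grep -E 'Under-replicated|Missing'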

Procedure

You can log on to the ECS console to view the details of the event, including the instance ID, the status, the ID of the damaged disk, the event progress, and the related operations.

Step 1: Obtain information about the damaged disk

  1. Log on to the node where the damaged disk resides over SSH. For more information, see Log on to a cluster.

  2. Run the following command to view the block device information.

    lsblk

    Information similar to the following is returned.

    NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    vdd    254:48   0  5.4T  0 disk /mnt/disk3
    vdb    254:16   0  5.4T  0 disk /mnt/disk1
    vde    254:64   0  5.4T  0 disk /mnt/disk4
    vdc    254:32   0  5.4T  0 disk /mnt/disk2
    vda    254:0    0  120G  0 disk
    └─vda1 254:1    0  120G  0 part /
  3. Run the following command to view the disk information.

    sudo fdisk -l

    Information similar to the following is returned.

    Disk /dev/vdd: 5905.6 GB, 5905580032000 bytes, 11534336000 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
  4. Record the device name $device_name and the mount point $mount_path based on the output of the preceding two steps.

    For example, if the device in the damaged disk event is vdd, the device name obtained is /dev/vdd and the mount point is /mnt/disk3.
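
    To simplify the commands in the following steps, you can record the two values in shell variables on the node. This is a sketch that uses the example values from this topic; replace them with the values that you recorded:

    # Example values from this topic; replace them with your own values.
    device_name=/dev/vdd
    mount_path=/mnt/disk3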

Step 2: Isolate the damaged local disk

  1. Stop the applications that read from or write to the damaged disk.

    In the EMR console, click the cluster where the damaged disk resides. On the Cluster Services tab, find the EMR services that read from or write to the damaged disk, which typically include storage services such as HDFS, HBase, and Kudu. Choose more > Stop in the section of each target service to stop the service.

    You can also run the sudo fuser -mv $device_name command on the node to view the complete list of processes that occupy the disk, and then stop the corresponding services in the EMR console.

  2. Run the following command to set application-layer read/write isolation for the local disk.

    sudo chmod 000 $mount_path
  3. Run the following command to unmount the local disk.

    sudo umount $device_name;sudo chmod 000 $mount_path
    Important

    If you do not unmount the disk, the device name that corresponds to the local disk changes after the damaged disk is repaired and the isolation is lifted, which may cause applications to read from or write to the wrong disk.

  4. Update the fstab file.

    1. Back up the existing /etc/fstab file.

    2. Delete the record of the disk from the /etc/fstab file.

      For example, the damaged disk in this example is /dev/vdd, so you need to delete the record that corresponds to this disk.

  5. Start the stopped applications.

    On the Cluster Services tab of the cluster where the damaged disk resides, find the EMR services that were stopped earlier in this step, and choose more > Start in the section of each target service to start the service. You can then verify the isolation as shown in the sketch below.
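
    Before you proceed to the next step, you can confirm that the damaged disk is fully isolated. The following is a minimal verification sketch that reuses the $device_name and $mount_path values recorded in Step 1:

    # Confirm that the device is no longer mounted.
    mount | grep "$device_name" || echo "device is not mounted"
    # Confirm that no process still holds the device open.
    sudo fuser -mv "$device_name" || echo "no process is using the device"
    # Confirm that no fstab entry for the mount point remains.
    grep "$mount_path" /etc/fstab || echo "no fstab entry remains"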

Step 3: Replace the disk

Repair the disk in the ECS console. For more information, see Isolate damaged local disks.

Step 4: Mount the disk

After the disk is repaired, you must remount it so that the new disk can be used.

  1. Run the following command to normalize the device name.

    device_name=`echo "$device_name" | sed 's/x//1'`

    The preceding command normalizes device names such as /dev/xvdk by removing the x, which changes the name to /dev/vdk.
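
    A quick check of the substitution, where /dev/xvdk is only an illustrative name:

    # Prints /dev/vdk; device names that do not contain an x are left unchanged.
    echo "/dev/xvdk" | sed 's/x//1'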

  2. Run the following command to create the mount directory.

     mkdir -p "$mount_path"
  3. Run the following command to mount the disk.

    mount $device_name $mount_path;sudo chmod 755 $mount_path

    If the disk fails to be mounted, perform the following steps:

    1. Run the following command to format the disk.

      fdisk $device_name << EOF
      n
      p
      1
      
      wq
      EOF
    2. Run the following command to remount the disk.

      mount $device_name $mount_path;sudo chmod 755 $mount_path
  4. Run the following command to modify the fstab file.

    echo "$device_name $mount_path $fstype defaults,noatime,nofail 0 0" >> /etc/fstab
    Note

    You can run the which mkfs.ext4 command to check whether ext4 is available. If it is, set $fstype to ext4; otherwise, set $fstype to ext3, as in the sketch below.
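
    A minimal sketch that sets $fstype based on that check:

    # Use ext4 if mkfs.ext4 is available; otherwise fall back to ext3.
    if which mkfs.ext4 >/dev/null 2>&1; then fstype=ext4; else fstype=ext3; fi
    echo "$fstype"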

  5. Create a script file and select the script code that corresponds to your cluster type.

    DataLake, DataFlow, OLAP, DataServing, and Custom clusters

    while getopts p: opt
    do
    	case "${opt}" in
      	p) mount_path=${OPTARG};;
      esac
    done
    
    sudo mkdir -p $mount_path/flink
    sudo chown flink:hadoop $mount_path/flink
    sudo chmod 775 $mount_path/flink
    
    sudo mkdir -p $mount_path/hadoop
    sudo chown hadoop:hadoop $mount_path/hadoop
    sudo chmod 755 $mount_path/hadoop
    
    sudo mkdir -p $mount_path/hdfs
    sudo chown hdfs:hadoop $mount_path/hdfs
    sudo chmod 750 $mount_path/hdfs
    
    sudo mkdir -p $mount_path/yarn
    sudo chown root:root $mount_path/yarn
    sudo chmod 755 $mount_path/yarn
    
    sudo mkdir -p $mount_path/impala
    sudo chown impala:hadoop $mount_path/impala
    sudo chmod 755 $mount_path/impala
    
    sudo mkdir -p $mount_path/jindodata
    sudo chown root:root $mount_path/jindodata
    sudo chmod 755 $mount_path/jindodata
    
    sudo mkdir -p $mount_path/jindosdk
    sudo chown root:root $mount_path/jindosdk
    sudo chmod 755 $mount_path/jindosdk
    
    sudo mkdir -p $mount_path/kafka
    sudo chown root:root $mount_path/kafka
    sudo chmod 755 $mount_path/kafka
    
    sudo mkdir -p $mount_path/kudu
    sudo chown root:root $mount_path/kudu
    sudo chmod 755 $mount_path/kudu
    
    sudo mkdir -p $mount_path/mapred
    sudo chown root:root $mount_path/mapred
    sudo chmod 755 $mount_path/mapred
    
    sudo mkdir -p $mount_path/starrocks
    sudo chown root:root $mount_path/starrocks
    sudo chmod 755 $mount_path/starrocks
    
    sudo mkdir -p $mount_path/clickhouse
    sudo chown clickhouse:clickhouse $mount_path/clickhouse
    sudo chmod 755 $mount_path/clickhouse
    
    sudo mkdir -p $mount_path/doris
    sudo chown root:root $mount_path/doris
    sudo chmod 755 $mount_path/doris
    
    sudo mkdir -p $mount_path/log
    sudo chown root:root $mount_path/log
    sudo chmod 755 $mount_path/log
    
    sudo mkdir -p $mount_path/log/clickhouse
    sudo chown clickhouse:clickhouse $mount_path/log/clickhouse
    sudo chmod 755 $mount_path/log/clickhouse
    
    sudo mkdir -p $mount_path/log/kafka
    sudo chown kafka:hadoop $mount_path/log/kafka
    sudo chmod 755 $mount_path/log/kafka
    
    sudo mkdir -p $mount_path/log/kafka-rest-proxy
    sudo chown kafka:hadoop $mount_path/log/kafka-rest-proxy
    sudo chmod 755 $mount_path/log/kafka-rest-proxy
    
    sudo mkdir -p $mount_path/log/kafka-schema-registry
    sudo chown kafka:hadoop $mount_path/log/kafka-schema-registry
    sudo chmod 755 $mount_path/log/kafka-schema-registry
    
    sudo mkdir -p $mount_path/log/cruise-control
    sudo chown kafka:hadoop $mount_path/log/cruise-control
    sudo chmod 755 $mount_path/log/cruise-control
    
    sudo mkdir -p $mount_path/log/doris
    sudo chown doris:doris $mount_path/log/doris
    sudo chmod 755 $mount_path/log/doris
    
    sudo mkdir -p $mount_path/log/celeborn
    sudo chown hadoop:hadoop $mount_path/log/celeborn
    sudo chmod 755 $mount_path/log/celeborn
    
    sudo mkdir -p $mount_path/log/flink
    sudo chown flink:hadoop $mount_path/log/flink
    sudo chmod 775 $mount_path/log/flink
    
    sudo mkdir -p $mount_path/log/flume
    sudo chown root:root $mount_path/log/flume
    sudo chmod 755 $mount_path/log/flume
    
    sudo mkdir -p $mount_path/log/gmetric
    sudo chown root:root $mount_path/log/gmetric
    sudo chmod 777 $mount_path/log/gmetric
    
    sudo mkdir -p $mount_path/log/hadoop-hdfs
    sudo chown hdfs:hadoop $mount_path/log/hadoop-hdfs
    sudo chmod 755 $mount_path/log/hadoop-hdfs
    
    sudo mkdir -p $mount_path/log/hbase
    sudo chown hbase:hadoop $mount_path/log/hbase
    sudo chmod 755 $mount_path/log/hbase
    
    sudo mkdir -p $mount_path/log/hive
    sudo chown root:root $mount_path/log/hive
    sudo chmod 775 $mount_path/log/hive
    
    sudo mkdir -p $mount_path/log/impala
    sudo chown impala:hadoop $mount_path/log/impala
    sudo chmod 755 $mount_path/log/impala
    
    sudo mkdir -p $mount_path/log/jindodata
    sudo chown root:root $mount_path/log/jindodata
    sudo chmod 777 $mount_path/log/jindodata
    
    sudo mkdir -p $mount_path/log/jindosdk
    sudo chown root:root $mount_path/log/jindosdk
    sudo chmod 777 $mount_path/log/jindosdk
    
    sudo mkdir -p $mount_path/log/kyuubi
    sudo chown kyuubi:hadoop $mount_path/log/kyuubi
    sudo chmod 755 $mount_path/log/kyuubi
    
    sudo mkdir -p $mount_path/log/presto
    sudo chown presto:hadoop $mount_path/log/presto
    sudo chmod 755 $mount_path/log/presto
    
    sudo mkdir -p $mount_path/log/spark
    sudo chown spark:hadoop $mount_path/log/spark
    sudo chmod 755 $mount_path/log/spark
    
    sudo mkdir -p $mount_path/log/sssd
    sudo chown sssd:sssd $mount_path/log/sssd
    sudo chmod 750 $mount_path/log/sssd
    
    sudo mkdir -p $mount_path/log/starrocks
    sudo chown starrocks:starrocks $mount_path/log/starrocks
    sudo chmod 755 $mount_path/log/starrocks
    
    sudo mkdir -p $mount_path/log/taihao_exporter
    sudo chown taihao:taihao $mount_path/log/taihao_exporter
    sudo chmod 755 $mount_path/log/taihao_exporter
    
    sudo mkdir -p $mount_path/log/trino
    sudo chown trino:hadoop $mount_path/log/trino
    sudo chmod 755 $mount_path/log/trino
    
    sudo mkdir -p $mount_path/log/yarn
    sudo chown hadoop:hadoop $mount_path/log/yarn
    sudo chmod 755 $mount_path/log/yarn

    Data lake (Hadoop) clusters

    while getopts p: opt
    do
    	case "${opt}" in
      	p) mount_path=${OPTARG};;
      esac
    done
    
    mkdir -p $mount_path/data
    chown hdfs:hadoop $mount_path/data
    chmod 1777 $mount_path/data
    
    mkdir -p $mount_path/hadoop
    chown hadoop:hadoop $mount_path/hadoop
    chmod 775 $mount_path/hadoop
    
    mkdir -p $mount_path/hdfs
    chown hdfs:hadoop $mount_path/hdfs
    chmod 755 $mount_path/hdfs
    
    mkdir -p $mount_path/yarn
    chown hadoop:hadoop $mount_path/yarn
    chmod 755 $mount_path/yarn
    
    mkdir -p $mount_path/kudu/master
    chown kudu:hadoop $mount_path/kudu/master
    chmod 755 $mount_path/kudu/master
    
    mkdir -p $mount_path/kudu/tserver
    chown kudu:hadoop $mount_path/kudu/tserver
    chmod 755 $mount_path/kudu/tserver
    
    mkdir -p $mount_path/log
    chown hadoop:hadoop $mount_path/log
    chmod 775 $mount_path/log
    
    mkdir -p $mount_path/log/hadoop-hdfs
    chown hdfs:hadoop $mount_path/log/hadoop-hdfs
    chmod 775 $mount_path/log/hadoop-hdfs
    
    mkdir -p $mount_path/log/hadoop-yarn
    chown hadoop:hadoop $mount_path/log/hadoop-yarn
    chmod 755 $mount_path/log/hadoop-yarn
    
    mkdir -p $mount_path/log/hadoop-mapred
    chown hadoop:hadoop $mount_path/log/hadoop-mapred
    chmod 755 $mount_path/log/hadoop-mapred
    
    mkdir -p $mount_path/log/kudu
    chown kudu:hadoop $mount_path/log/kudu
    chmod 755 $mount_path/log/kudu
    
    mkdir -p $mount_path/run
    chown hadoop:hadoop $mount_path/run
    chmod 777 $mount_path/run
    
    mkdir -p $mount_path/tmp
    chown hadoop:hadoop $mount_path/tmp
    chmod 777 $mount_path/tmp
  6. Run the following commands to execute the script file to create the service directories, and then delete the script. $file_path is the path of the script file; see the example after these commands.

    chmod +x $file_path
    sudo $file_path -p $mount_path
    rm $file_path
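
    For example, if the script is saved as /root/init_emr_dirs.sh (a hypothetical path) and the mount point recorded earlier is /mnt/disk3, the commands are:

    # /root/init_emr_dirs.sh is a hypothetical script path; /mnt/disk3 is the example mount point from this topic.
    chmod +x /root/init_emr_dirs.sh
    sudo /root/init_emr_dirs.sh -p /mnt/disk3
    rm /root/init_emr_dirs.sh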
  7. Use the new disk.

    In the EMR console, restart the services that run on this node, and check whether the disk can be used as expected. A verification sketch follows.
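
    A minimal verification sketch on the node, assuming the $device_name and $mount_path values from Step 1 are still set:

    # Confirm that the new disk is mounted at the expected mount point.
    lsblk "$device_name"
    df -h "$mount_path"
    # Confirm that the mount point is writable.
    sudo touch "$mount_path/.disk_check" && sudo rm "$mount_path/.disk_check"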