Version History
Version | Revision date | Changes | Effective date |
1.0 | 2019/4/15 | | |
1.1 | 2019/7/30 | 1. Updated the failure count description. 2. Updated the notes on start/stop order. | 2019/7/30 |
Overview of SAP High Availability Environment Maintenance
This document applies to scenarios where O&M operations need to be performed on SAP application or SAP HANA ECS instances deployed in a SUSE HAE 12 cluster. It describes the preparation and follow-up steps for scenarios such as upgrading or downgrading ECS instance types, upgrading the SAP application or database, routine maintenance of the primary or secondary node, and unexpected node failovers.
For an SAP system managed by SUSE HAE, performing maintenance tasks on a cluster node may require stopping the resources running on that node, moving them, or shutting down or restarting the node. It may also require temporarily taking over control of the cluster's resources.
The scenarios below use SAP HANA high availability as an example; the maintenance operations for SAP application high availability are similar.
This document is no substitute for the standard SUSE and SAP installation and administration documentation. For more guidance on maintaining a high availability environment, see the official SUSE and SAP documentation.
For the SUSE HAE administration guide, see:
For the SAP HANA HSR configuration guide, see:
Common SAP HANA High Availability Maintenance Scenarios
SUSE Pacemaker provides several options for different maintenance needs (a command quick reference follows this list):
Setting the cluster to maintenance mode
Use the global cluster property maintenance-mode to put all resources into maintenance state at once. The cluster stops monitoring them.
Setting a node to maintenance mode
Puts all resources running on the specified node into maintenance state at once. The cluster stops monitoring them.
Setting a node to standby mode
A node in standby mode can no longer run resources. Any resources running on the node are moved away or stopped (if no other node is eligible to run them). In addition, all monitor operations are stopped on the node (except those with role="Stopped").
Use this option if you need to stop a cluster node while continuing to provide the services running on the other node.
Setting a resource to maintenance mode
When a resource is in this mode, no monitor operations are triggered for it. Use this option if you need to manually adjust the service managed by the resource and do not want the cluster to run any monitor operations on it during that time.
Setting a resource to unmanaged mode
Use the is-managed attribute to temporarily "release" a resource from management by the cluster stack. This means you can manually adjust the service managed by the resource; however, the cluster continues to monitor the resource and reports any failures. If you want the cluster to stop monitoring it as well, use per-resource maintenance mode instead.
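As a quick reference, the crmsh commands corresponding to the options above are sketched below; node and resource names are placeholders to be replaced with your own.
Cluster maintenance mode: # crm configure property maintenance-mode=true (set to false to revert)
Node maintenance mode: # crm node maintenance <node> (revert with # crm node ready <node>)
Node standby mode: # crm node standby <node> (revert with # crm node online <node>)
Resource maintenance mode: # crm resource maintenance <resource> true (or false)
Unmanaged mode: # crm resource unmanage <resource> (revert with # crm resource manage <resource>)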
1. Handling a Primary Node Failure
When the primary node fails, HAE triggers a takeover and the former secondary node, Node B, is promoted to primary, but the former primary node, Node A, still holds the primary role. Therefore, after the fault on Node A is fixed and before the Pacemaker service is started on it, you must manually reconfigure HANA HSR and register Node A as the secondary.
In this example, the initial primary node is saphana-01 and the secondary node is saphana-02.
1.1 Check the normal status of SUSE HAE
Log on to either node and run the crm status command to check that HAE is in a normal state.
# crm status
Stack: corosync
Current DC: saphana-01 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 14:33:22 2019
Last change: Mon Apr 15 14:33:19 2019 by root via crm_attribute on saphana-01
2 nodes configured
6 resources configured
Online: [ saphana-01 saphana-02 ]
Full list of resources:
rsc_sbd (stonith:external/sbd): Started saphana-01
rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-01
Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
Masters: [ saphana-01 ]
Slaves: [ saphana-02 ]
Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
Started: [ saphana-01 saphana-02 ]
After the primary node fails, HAE automatically promotes the secondary node to primary.
# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 14:40:43 2019
Last change: Mon Apr 15 14:40:41 2019 by root via crm_attribute on saphana-02
2 nodes configured
6 resources configured
Online: [ saphana-02 ]
OFFLINE: [ saphana-01 ]
Full list of resources:
rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
Masters: [ saphana-02 ]
Stopped: [ saphana-01 ]
Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
Started: [ saphana-02 ]
Stopped: [ saphana-01 ]
1.2 Re-register HSR and recover the former primary node
Before reconfiguring HSR, always confirm which node is primary and which is secondary; a wrong configuration can overwrite or even lose data.
Log on to the former primary node as the SAP HANA instance user and configure HSR.
h01adm@saphana-01:/usr/sap/H01/HDB00> hdbnsutil -sr_register --remoteHost=saphana-02 --remoteInstance=00 --replicationMode=syncmem --name=saphana-01 --operationMode=logreplay
adding site ...
checking for inactive nameserver ...
nameserver saphana-01:30001 not responding.
collecting information ...
updating local ini files ...
done.
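Before proceeding, you can optionally confirm the replication state on the re-registered node; a minimal check, using the same instance user:
h01adm@saphana-01:/usr/sap/H01/HDB00> hdbnsutil -sr_state
The output should report this node in syncmem mode as the secondary site, rather than as primary.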
1.3 Check the SBD status
If the status of a node slot is not "clear", set it to "clear".
# sbd -d /dev/vdc list
0 saphana-01 reset saphana-02
1 saphana-02 reset saphana-01
# sbd -d /dev/vdc message saphana-01 clear
# sbd -d /dev/vdc message saphana-02 clear
# sbd -d /dev/vdc list
0 saphana-01 clear saphana-01
1 saphana-02 clear saphana-01
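If you are unsure which block device SBD uses (here /dev/vdc), it can be read from the SBD configuration; a minimal check, assuming the default /etc/sysconfig/sbd location:
# grep SBD_DEVICE /etc/sysconfig/sbd
SBD_DEVICE="/dev/vdc"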
1.4 Start the pacemaker service
Run the following command to start the pacemaker service. Once pacemaker is running, HAE automatically brings up the SAP HANA service.
# systemctl start pacemaker
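SAP HANA can take several minutes to start, so you may want to watch the cluster bring the resources back online; an optional check using crm_mon:
# crm_mon -r
The -r option also lists inactive resources; press Ctrl+C to exit the monitor.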
The former secondary node is now the new primary node. The current HAE status is as follows:
# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 15:10:58 2019
Last change: Mon Apr 15 15:09:56 2019 by root via crm_attribute on saphana-02
2 nodes configured
6 resources configured
Online: [ saphana-01 saphana-02 ]
Full list of resources:
rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
Masters: [ saphana-02 ]
Slaves: [ saphana-01 ]
Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
Started: [ saphana-01 saphana-02 ]
1.5 Check the SAP HANA HSR status
Check using the Python script shipped with SAP HANA
Log on to the primary node as the SAP HANA instance user and make sure the Replication Status of every SAP HANA process is ACTIVE.
saphana-02:~ # su - h01adm
h01adm@saphana-02:/usr/sap/H01/HDB00> cdpy
h01adm@saphana-02:/usr/sap/H01/HDB00/exe/python_support> python systemReplicationStatus.py
| Database | Host       | Port  | Service Name | Volume ID | Site ID | Site Name  | Secondary  | Secondary | Secondary | Secondary  | Secondary     | Replication | Replication | Replication    |
|          |            |       |              |           |         |            | Host       | Port      | Site ID   | Site Name  | Active Status | Mode        | Status      | Status Details |
| -------- | ---------- | ----- | ------------ | --------- | ------- | ---------- | ---------- | --------- | --------- | ---------- | ------------- | ----------- | ----------- | -------------- |
| SYSTEMDB | saphana-02 | 30001 | nameserver   | 1         | 2       | saphana-02 | saphana-01 | 30001     | 1         | saphana-01 | YES           | SYNCMEM     | ACTIVE      |                |
| H01      | saphana-02 | 30007 | xsengine     | 3         | 2       | saphana-02 | saphana-01 | 30007     | 1         | saphana-01 | YES           | SYNCMEM     | ACTIVE      |                |
| H01      | saphana-02 | 30003 | indexserver  | 2         | 2       | saphana-02 | saphana-01 | 30003     | 1         | saphana-01 | YES           | SYNCMEM     | ACTIVE      |                |

status system replication site "1": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 2
site name: saphana-02
Check the replication status using the SAPHanaSR tool provided by SUSE, and make sure the sync_state of the secondary node is SOK.
saphana-02:~ # SAPHanaSR-showAttr
Global cib-time
--------------------------------
global Mon Apr 15 15:17:12 2019

Hosts      clone_state lpa_h01_lpt node_state op_mode   remoteHost roles                            site       srmode  standby sync_state version                vhost
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
saphana-01 DEMOTED     30          online     logreplay saphana-02 4:S:master1:master:worker:master saphana-01 syncmem         SOK        2.00.020.00.1500920972 saphana-01
saphana-02 PROMOTED    1555312632  online     logreplay saphana-01 4:P:master1:master:worker:master saphana-02 syncmem off     PRIM       2.00.020.00.1500920972 saphana-02
1.6 (Optional) Reset the failure count
If a resource fails, it is restarted automatically, but each failure increases the resource's failure count. If a migration-threshold is set for the resource, the node is no longer allowed to run it once the failure count reaches the threshold, so the failure count needs to be cleaned up manually.
The command to clean up the failure count is as follows:
# crm resource cleanup [resource name] [node]
For example, once the rsc_SAPHana_HDB resource on node saphana-01 has been repaired, clean up the monitor alert with the following command:
# crm resource cleanup rsc_SAPHana_HDB saphana-01
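To inspect the current failure count of a resource on a node (for example, to confirm it is back to zero after cleanup), crmsh offers a failcount subcommand; a minimal sketch:
# crm resource failcount rsc_SAPHana_HDB show saphana-01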
2. Handling a Secondary Node Failure
When the secondary node fails, the primary node is not affected and no takeover is triggered. After the secondary node has recovered, starting the pacemaker service automatically brings up the SAP HANA service; the primary/secondary roles remain unchanged and no manual intervention is required.
In this example, the initial primary node is saphana-02 and the secondary node is saphana-01.
2.1 Check the normal status of HAE
With SUSE HAE in a normal state, log on to either node and run the crm status command to check the status.
# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 15:34:52 2019
Last change: Mon Apr 15 15:33:50 2019 by root via crm_attribute on saphana-02
2 nodes configured
6 resources configured
Online: [ saphana-01 saphana-02 ]
Full list of resources:
rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
Masters: [ saphana-02 ]
Slaves: [ saphana-01 ]
Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
Started: [ saphana-01 saphana-02 ]
2.2 Restart the pacemaker service
After the secondary node has recovered from the fault, check the SBD status first (see 1.3), then start the pacemaker service.
# systemctl start pacemaker
HSR keeps the original primary/secondary relationship. The current HAE status is as follows:
# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 15:43:28 2019
Last change: Mon Apr 15 15:43:25 2019 by root via crm_attribute on saphana-01
2 nodes configured
6 resources configured
Online: [ saphana-01 saphana-02 ]
Full list of resources:
rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
Masters: [ saphana-02 ]
Slaves: [ saphana-01 ]
Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
Started: [ saphana-01 saphana-02 ]
2.3 Check the SAP HANA HSR status
For details, see 1.5 Check the SAP HANA HSR status.
2.4 (Optional) Reset the failure count
For details, see 1.6 (Optional) Reset the failure count.
3. Planned Downtime Maintenance of Both the Primary and Secondary Nodes
Set the cluster to maintenance mode, then shut down the secondary node and the primary node in turn.
In this example, the initial primary node is saphana-02 and the secondary node is saphana-01.
3.1 Check the normal status of HAE
With SUSE HAE in a normal state, log on to either node and run the crm status command to check the status.
# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 15:34:52 2019
Last change: Mon Apr 15 15:33:50 2019 by root via crm_attribute on saphana-02
2 nodes configured
6 resources configured
Online: [ saphana-01 saphana-02 ]
Full list of resources:
rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
Masters: [ saphana-02 ]
Slaves: [ saphana-01 ]
Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
Started: [ saphana-01 saphana-02 ]
3.2 Set the cluster and the master/slave resource sets to maintenance mode
Log on to the primary node and set the cluster to maintenance mode.
# crm configure property maintenance-mode=true
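To confirm the property has taken effect, you can optionally dump the cluster configuration and filter for it:
# crm configure show | grep maintenance-mode
The output should contain maintenance-mode=true in the cluster's property section.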
Set the master/slave resource sets to maintenance mode. In this example the master/slave resource sets are msl_SAPHana_HDB and cln_SAPHanaTopology_HDB; the commands are run against their child resources rsc_SAPHana_HDB and rsc_SAPHanaTopology_HDB, and crmsh applies the update to the parent sets.
# crm resource maintenance rsc_SAPHana_HDB true
Performing update of 'maintenance' on 'msl_SAPHana_HDB', the parent of 'rsc_SAPHana_HDB'
Set 'msl_SAPHana_HDB' option: id=msl_SAPHana_HDB-meta_attributes-maintenance name=maintenance=true
# crm resource maintenance rsc_SAPHanaTopology_HDB true
Performing update of 'maintenance' on 'cln_SAPHanaTopology_HDB', the parent of 'rsc_SAPHanaTopology_HDB'
Set 'cln_SAPHanaTopology_HDB' option: id=cln_SAPHanaTopology_HDB-meta_attributes-maintenance name=maintenance=true
The current HAE status is as follows:
# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 16:02:13 2019
Last change: Mon Apr 15 16:02:11 2019 by root via crm_resource on saphana-02
2 nodes configured
6 resources configured
*** Resource management is DISABLED ***
The cluster will not attempt to start, stop or recover services
Online: [ saphana-01 saphana-02 ]
Full list of resources:
rsc_sbd (stonith:external/sbd): Started saphana-02 (unmanaged)
rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02 (unmanaged)
Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB] (unmanaged)
rsc_SAPHana_HDB (ocf::suse:SAPHana): Slave saphana-01 (unmanaged)
rsc_SAPHana_HDB (ocf::suse:SAPHana): Master saphana-02 (unmanaged)
Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB] (unmanaged)
rsc_SAPHanaTopology_HDB (ocf::suse:SAPHanaTopology): Started saphana-01 (unmanaged)
rsc_SAPHanaTopology_HDB (ocf::suse:SAPHanaTopology): Started saphana-02 (unmanaged)
3.3 Stop the SAP HANA services on the secondary node and then the primary node, and shut down the ECS instances
Log on to both nodes as the SAP HANA instance user. Stop the SAP HANA service on the secondary node first, then stop it on the primary node.
saphana-01:~ # su - h01adm
h01adm@saphana-01:/usr/sap/H01/HDB00> HDB stop
hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
Stopping instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function Stop 400
15.04.2019 16:46:42
Stop
OK
Waiting for stopped instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function WaitforStopped 600 2
15.04.2019 16:46:54
WaitforStopped
OK
hdbdaemon is stopped.
saphana-02:~ # su - h01adm
h01adm@saphana-02:/usr/sap/H01/HDB00> HDB stop
hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
Stopping instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function Stop 400
15.04.2019 16:47:05
Stop
OK
Waiting for stopped instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function WaitforStopped 600 2
15.04.2019 16:47:35
WaitforStopped
OK
hdbdaemon is stopped.
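Before powering off the ECS instances, you can optionally verify that all SAP HANA processes have stopped on each node using sapcontrol:
h01adm@saphana-02:/usr/sap/H01/HDB00> sapcontrol -nr 00 -function GetProcessList
All listed processes should be in state GRAY (Stopped) before shutdown.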
3.4 Start the primary and secondary SAP HANA ECS instances, and restore the cluster and resource sets to normal mode
Log on to the primary node and then the secondary node, and run the following command on each to start the pacemaker service.
# systemctl start pacemaker
Restore the cluster and resource sets to normal mode.
# crm configure property maintenance-mode=false
# crm resource maintenance rsc_SAPHana_HDB false
Performing update of 'maintenance' on 'msl_SAPHana_HDB', the parent of 'rsc_SAPHana_HDB'
Set 'msl_SAPHana_HDB' option: id=msl_SAPHana_HDB-meta_attributes-maintenance name=maintenance=false
# crm resource maintenance rsc_SAPHanaTopology_HDB false
Performing update of 'maintenance' on 'cln_SAPHanaTopology_HDB', the parent of 'rsc_SAPHanaTopology_HDB'
Set 'cln_SAPHanaTopology_HDB' option: id=cln_SAPHanaTopology_HDB-meta_attributes-maintenance name=maintenance=false
The SUSE HAE cluster automatically brings up the SAP HANA services on the primary and secondary nodes, keeping the original primary/secondary roles unchanged.
The current HAE status is as follows:
# crm status
Stack: corosync
Current DC: saphana-01 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 16:56:49 2019
Last change: Mon Apr 15 16:56:43 2019 by root via crm_attribute on saphana-01
2 nodes configured
6 resources configured
Online: [ saphana-01 saphana-02 ]
Full list of resources:
rsc_sbd (stonith:external/sbd): Started saphana-01
rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
Masters: [ saphana-02 ]
Slaves: [ saphana-01 ]
Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
Started: [ saphana-01 saphana-02 ]
3.5 Check the SAP HANA HSR status
For details, see 1.5 Check the SAP HANA HSR status.
3.6 (Optional) Reset the failure count
For details, see 1.6 (Optional) Reset the failure count.
4. Planned Downtime Maintenance of the Primary Node
The primary node is set to standby mode, which triggers a takeover by the cluster.
In this example, the initial primary node is saphana-02 and the secondary node is saphana-01.
4.1 Check the normal status of SUSE HAE
Log on to either node and run the crm status command to check that HAE is in a normal state.
# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 15:34:52 2019
Last change: Mon Apr 15 15:33:50 2019 by root via crm_attribute on saphana-02
2 nodes configured
6 resources configured
Online: [ saphana-01 saphana-02 ]
Full list of resources:
rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
Masters: [ saphana-02 ]
Slaves: [ saphana-01 ]
Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
Started: [ saphana-01 saphana-02 ]
4.2 Set the primary node to standby mode
In this example, the primary node is saphana-02.
# crm node standby saphana-02
The cluster stops SAP HANA on node saphana-02 and promotes SAP HANA on node saphana-01 to primary.
The current HAE status is as follows:
# crm status
Stack: corosync
Current DC: saphana-01 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 17:07:56 2019
Last change: Mon Apr 15 17:07:38 2019 by root via crm_attribute on saphana-02
2 nodes configured
6 resources configured
Node saphana-02: standby
Online: [ saphana-01 ]
Full list of resources:
rsc_sbd (stonith:external/sbd): Started saphana-01
rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-01
Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
Masters: [ saphana-01 ]
Stopped: [ saphana-02 ]
Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
Started: [ saphana-01 ]
Stopped: [ saphana-02 ]
4.3 Shut down the ECS instance and perform the downtime maintenance tasks
4.4 Start the maintained node and re-register HSR
Log on to the maintained node as the SAP HANA instance user and register HSR.
# hdbnsutil -sr_register --remoteHost=saphana-01 --remoteInstance=00 --replicationMode=syncmem --name=saphana-02 --operationMode=logreplay
4.5 Start the pacemaker service and restore the standby node to online mode
# systemctl start pacemaker
# crm node online saphana-02
The SUSE HAE cluster automatically brings up the SAP HANA service on the secondary node.
The current HAE status is as follows:
# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 18:02:33 2019
Last change: Mon Apr 15 18:01:31 2019 by root via crm_attribute on saphana-02
2 nodes configured
6 resources configured
Online: [ saphana-01 saphana-02 ]
Full list of resources:
rsc_sbd (stonith:external/sbd): Started saphana-01
rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-01
Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
Masters: [ saphana-01 ]
Slaves: [ saphana-02 ]
Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
Started: [ saphana-01 saphana-02 ]
4.6 Check the SAP HANA HSR status
For details, see 1.5 Check the SAP HANA HSR status.
4.7 (Optional) Reset the failure count
For details, see 1.6 (Optional) Reset the failure count.
5. Planned Downtime Maintenance of the Secondary Node
Set the secondary node to maintenance mode.
In this example, the initial primary node is saphana-02 and the secondary node is saphana-01.
5.1 Check the normal status of HAE
With SUSE HAE in a normal state, log on to either node and run the crm status command to check the status.
# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 15:34:52 2019
Last change: Mon Apr 15 15:33:50 2019 by root via crm_attribute on saphana-02
2 nodes configured
6 resources configured
Online: [ saphana-01 saphana-02 ]
Full list of resources:
rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
Masters: [ saphana-02 ]
Slaves: [ saphana-01 ]
Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
Started: [ saphana-01 saphana-02 ]
5.2 Set the secondary node to maintenance mode
# crm node maintenance saphana-01
After the setting takes effect, the HAE status is as follows:
# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 18:18:10 2019
Last change: Mon Apr 15 18:17:49 2019 by root via crm_attribute on saphana-01
2 nodes configured
6 resources configured
Node saphana-01: maintenance
Online: [ saphana-02 ]
Full list of resources:
rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
rsc_SAPHana_HDB (ocf::suse:SAPHana): Slave saphana-01 (unmanaged)
Masters: [ saphana-02 ]
Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
rsc_SAPHanaTopology_HDB (ocf::suse:SAPHanaTopology): Started saphana-01 (unmanaged)
Started: [ saphana-02 ]
5.3 Stop the SAP HANA service on the secondary node, then shut down the ECS instance for downtime maintenance
Log on to the secondary node as the SAP HANA instance user and stop the SAP HANA service.
saphana-01:~ # su - h01adm
h01adm@saphana-01:/usr/sap/H01/HDB00> HDB stop
hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
Stopping instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function Stop 400
15.04.2019 16:47:05
Stop
OK
Waiting for stopped instance using: /usr/sap/H01/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function WaitforStopped 600 2
15.04.2019 16:47:35
WaitforStopped
OK
hdbdaemon is stopped.
5.4 Start the secondary SAP HANA ECS instance and restore the node to normal mode
Log on to the secondary node and start the pacemaker service.
# systemctl start pacemaker
Restore the secondary node to normal mode.
saphana-02:~ # crm node ready saphana-01
The SUSE HAE cluster automatically brings up the SAP HANA service on the secondary node, keeping the original primary/secondary roles unchanged.
The current HAE status is as follows:
# crm status
Stack: corosync
Current DC: saphana-02 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Mon Apr 15 18:02:33 2019
Last change: Mon Apr 15 18:01:31 2019 by root via crm_attribute on saphana-02
2 nodes configured
6 resources configured
Online: [ saphana-01 saphana-02 ]
Full list of resources:
rsc_sbd (stonith:external/sbd): Started saphana-02
rsc_vip (ocf::heartbeat:IPaddr2): Started saphana-02
Master/Slave Set: msl_SAPHana_HDB [rsc_SAPHana_HDB]
Masters: [ saphana-02 ]
Slaves: [ saphana-01 ]
Clone Set: cln_SAPHanaTopology_HDB [rsc_SAPHanaTopology_HDB]
Started: [ saphana-01 saphana-02 ]
5.5 Check the SAP HANA HSR status
For details, see 1.5 Check the SAP HANA HSR status.
5.6 (Optional) Reset the failure count
For details, see 1.6 (Optional) Reset the failure count.