Services or scripts may stop running because of program exceptions, instance restarts, or power outages. If they do not resume promptly, your online business may suffer losses. You can use the Cloud Assistant plug-in ecs-tool-servicekeepalive to quickly resume interrupted services or scripts, which helps ensure service reliability and continuity.
Solution overview
The solution is built on systemd, the service manager provided by the Linux operating system. When you use the ecs-tool-servicekeepalive plug-in, you only need to enter a command that starts a service or program, such as python /home/root/main.py. The plug-in automatically generates the systemd service configuration from the startup command that you enter, so the service or script starts automatically without requiring you to configure systemd manually.
The systemd service is a Linux component and can be used to automatically manage services. For example, the systemd service can start a service or script on instance startup or restart a service after an unexpected stop. For more information, see systemd documentation.
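This topic does not show the configuration that the plug-in generates. As a rough illustration of the two behaviors described above (start on boot, restart after an unexpected stop), the following sketch prints a minimal systemd unit for a given startup command. The render_unit function name, the Description text, and the Restart and RestartSec values are assumptions made for this example; the unit that the plug-in actually writes may differ.

```shell
#!/bin/bash
# Illustrative sketch only: prints a minimal systemd unit for a start command.
# The Description text and the Restart/RestartSec values are assumptions and
# are not taken from the plug-in's real output.
render_unit() {
  local exec_start="$1"
  cat <<EOF
[Unit]
Description=Keepalive sketch for ${exec_start}

[Service]
# Command that starts the service or script
ExecStart=${exec_start}
# Restart the process whenever it exits, including after kill -9
Restart=always
RestartSec=1

[Install]
# When the unit is enabled, start it during normal multi-user boot
WantedBy=multi-user.target
EOF
}

render_unit "/bin/bash /home/work/debug/debug.sh"
```

In a unit like this, Restart=always covers the unexpected-stop case, and enabling the unit against multi-user.target covers the instance-restart case.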
Procedure
After services or programs are deployed, start the ecs-tool-servicekeepalive plug-in of Cloud Assistant as the root user.
Run a service or script as the root user
sudo acs-plugin-manager --exec --plugin ecs-tool-servicekeepalive --params "start,'cmd'"
cmd: Replace the parameter with a command that starts a service or script. For example, enter /bin/bash /home/work/debug/debug.sh to run a script, or python /home/root/main.py to run a program.
Important: The path of the script or program file must be an absolute path, that is, a path that starts from the root directory (/).
Run a service or script by specifying a username
sudo acs-plugin-manager --exec --plugin ecs-tool-servicekeepalive --params "start,execstart='cmd',user=user_name,group=group_name"
cmd: Replace the parameter with a command that starts a service or script. For example, enter /bin/bash /home/work/debug/debug.sh to run a script, or python /home/root/main.py to run a program.
Important: The path of the script or program file must be an absolute path, that is, a path that starts from the root directory (/).
user_name: Replace the parameter with the username that you want to use to run the service. To view the existing users, run the cut -d: -f1 /etc/passwd command.
group_name: Replace the parameter with the name of the user group under which the service runs. To view the existing user groups, run the cut -d: -f1 /etc/group command.
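The parameter string for the username form is easy to mistype, so it can help to assemble it in a small helper. The following sketch checks that the user and group appear in /etc/passwd and /etc/group (the same files that the commands above list) and then prints the --params value. The build_params function and the work user and group in the usage comment are hypothetical names for this example and are not part of the plug-in.

```shell
#!/bin/bash
# Hypothetical helper: build the --params value for the user/group form.
# Fails if the user or group is not listed in /etc/passwd or /etc/group.
build_params() {
  local cmd="$1" user="$2" group="$3"
  if ! cut -d: -f1 /etc/passwd | grep -qxF -- "$user"; then
    echo "unknown user: $user" >&2
    return 1
  fi
  if ! cut -d: -f1 /etc/group | grep -qxF -- "$group"; then
    echo "unknown group: $group" >&2
    return 1
  fi
  printf "start,execstart='%s',user=%s,group=%s\n" "$cmd" "$user" "$group"
}

# Example (not executed here; "work" is a placeholder user and group):
# sudo acs-plugin-manager --exec --plugin ecs-tool-servicekeepalive \
#   --params "$(build_params '/bin/bash /home/work/debug/debug.sh' work work)"
```

Accounts that come from a directory service rather than the local files would not pass this check, so treat it as a convenience and not as a definitive validation.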
Run the following command to check whether automatic restoration is enabled for the service:
sudo acs-plugin-manager --exec --plugin ecs-tool-servicekeepalive --params "status"
If the configuration is successful, the response shown in the following figure is returned.
(Optional) To disable automatic restoration for a service or script, run the following command:
sudo acs-plugin-manager --exec --local --plugin ecs-tool-servicekeepalive --params "stop service_name"
service_name: Replace the parameter with the name of the service. You can obtain the service name displayed in the service_name column of the command output in Step 2.
Example
Prepare the environment.
Create the /home/work/debug folder and then create the debug.sh script in the folder. The script appends one log line per second to the log file that is passed as its first argument.
sudo mkdir -p /home/work/debug && \
sudo tee /home/work/debug/debug.sh > /dev/null << 'EOF'
#!/bin/bash
# Append a timestamped heartbeat line to the log file given as $1, once per second.
while true
do
    echo "$(date '+%Y-%m-%d %H:%M:%S') progress is alive" >> "$1"
    sleep 1
done
EOF
Run the ps aux | grep debug.sh command. The command output shows that the script is not running.
Start the Cloud Assistant plug-in.
sudo acs-plugin-manager --exec --plugin ecs-tool-servicekeepalive --params "start,'/bin/bash /home/work/debug/debug.sh /home/work/debug/debug.log'"
Run the ps aux | grep debug.sh command. The command output shows that the script is running. The process number is 2572.
Check whether the script can automatically resume.
Restart the ECS instance and check whether the script resumes as expected
Restart the Elastic Compute Service (ECS) instance in the ECS console. After the ECS instance is restored, log on to the instance and run the following command:
ps aux |grep debug.sh
The debug.sh process is running as expected, and the process number has changed to 764, which indicates that the script was restarted.
Kill the process and check whether the script resumes as expected
Run the following command to find the process number of the debug.sh process:
ps aux | grep debug.sh
In the command output, the process number of the debug.sh process is 2572.
Run the following command to print the current time and kill the debug.sh process:
date && sudo kill -9 <Process number>
Run the following command. The command output shows that the debug.sh process is still running and the process number has changed to 4220, which indicates that the script was restarted.
ps aux | grep debug.sh
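Because debug.sh appends one line per second to its log file, the log offers a second way to confirm that the script is alive after a restart or a kill, in addition to checking the process list. The following sketch compares the line count of the log before and after a short wait. The log_is_growing function name and the default wait of 3 seconds are assumptions for this example; the log path is the one used in the start command above.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the log file gains lines within wait_s seconds.
log_is_growing() {
  local file="$1" wait_s="${2:-3}" before after
  before="$(wc -l < "$file")" || return 2
  sleep "$wait_s"
  after="$(wc -l < "$file")" || return 2
  [ "$after" -gt "$before" ]
}

log_file=/home/work/debug/debug.log
if [ -f "$log_file" ]; then
  if log_is_growing "$log_file"; then
    echo "debug.sh is writing to $log_file"
  else
    echo "no new log lines in $log_file"
  fi
fi
```

A growing log shows only that some debug.sh process is currently writing. To confirm that a restart actually happened, still compare the process numbers as described above.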
References
As your business grows, the number of data requests and concurrent page views increases. You can deploy multiple ECS instances to implement zone-level disaster recovery and ensure data availability and continuity. For more information, see Deploy a highly available architecture.