This topic describes known security vulnerabilities and configuration issues of public images. Understanding these issues helps you assess potential security risks and take measures to locate and resolve problems promptly.
Known issues of Windows images
Specific features do not work as expected when a Windows operating system is used on an instance type that has 512 MB of memory
Problem description
When an Elastic Compute Service (ECS) instance that has 512 MB of memory runs the Windows Server Version 2004 Datacenter 64-bit (Simplified Chinese, Without UI) operating system, some features do not work as expected. For example, the password configured during instance creation does not take effect, the instance password cannot be changed, and commands fail to run.
Cause
Virtual memory cannot be allocated because paging file management is disabled. As a result, exceptions occur when programs are run.
Solution
Because the memory of the affected instance is small, you cannot attach Pre-installation Environment (PE) disks to the instance. You also cannot log on to the instance, because the password that you configured during instance creation does not take effect. Therefore, the only way to enable paging file management for the instance is to use Cloud Assistant.
You can use one of the following methods to run commands by using Cloud Assistant:
Use Session Manager to connect to the instance without a password and then run commands. For more information, see Connect to an instance by using Session Manager.
Use Cloud Assistant to send commands to the instance. For more information, see Send remote commands.
Run the following command to enable paging file management:
Wmic ComputerSystem set AutomaticManagedPagefile=True
Note: The preceding command may fail. If it does, run it again until it succeeds.
To check whether paging file management is enabled, run the following command:
Wmic ComputerSystem get AutomaticManagedPagefile
If paging file management is enabled, the following output is returned:
AutomaticManagedPagefile
TRUE
Restart the instance for the changes to take effect.
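If you run this check through Cloud Assistant from an automation script, the returned text can be evaluated programmatically. The following minimal Python sketch assumes the usual wmic layout (a header line followed by a value line, padded with spaces and extra line breaks); the function name is hypothetical and is not part of any Alibaba Cloud or Windows API.

```python
from typing import Optional


def pagefile_auto_managed(wmic_output: str) -> Optional[bool]:
    """Interpret the output of `wmic ComputerSystem get AutomaticManagedPagefile`.

    Returns True for TRUE, False for FALSE, and None if the output is not
    in the expected header-plus-value layout (for example, an error message).
    """
    # wmic pads columns with spaces and often emits "\r\r\n" line endings,
    # so strip every line and drop the empty ones.
    lines = [line.strip() for line in wmic_output.splitlines() if line.strip()]
    if len(lines) < 2 or lines[0].lower() != "automaticmanagedpagefile":
        return None
    value = lines[1].upper()
    if value == "TRUE":
        return True
    if value == "FALSE":
        return False
    return None
```

A result of None usually means the command itself failed, in which case the note above applies and the command can be run again.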
Windows Server 2016: The operating system does not respond when a software installation package is run
Problem description
When a software installation package is downloaded and run within Windows Server 2016, the operating system does not respond.
Cause
For security purposes, the ProtectYourPC option is configured during the Sysprep phase to apply the Express settings when the operating system starts. As a result, the SmartScreen system process runs after the operating system starts. SmartScreen is generally used to protect the operating system from being redirected to malicious websites and from unsafe downloads.
When you download or run a software installation package from the Internet, the web identifier (Mark of the Web) that the package carries triggers SmartScreen. SmartScreen detects that the software comes from the Internet and may lack reputation information, and therefore blocks the software.
Solutions
Use one of the following solutions to resolve the issue:
Unblock the software installation package
Right-click the software installation package and select Properties. On the General tab, select Unblock and click OK.
Rerun the software installation package.
Disable SmartScreen Filter
Go to the C:\Windows\System32 directory and double-click the SmartScreenSettings.exe file.
In the Windows SmartScreen dialog box, select Don't do anything (turn off Windows SmartScreen) and click OK.
Rerun the software installation package.
Modify Group Policy settings
Open the Run dialog box, enter gpedit.msc, and then click OK.
In the Local Group Policy Editor window, choose Computer Configuration > Windows Settings > Security Settings > Local Policies > Security Options.
Right-click the User Account Control: Admin Approval Mode for the Built-in Administrator account policy and select Properties.
On the Local Security Setting tab, select Enabled and click OK.
Restart the operating system for the configurations to take effect.
Rerun the software installation package.
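The Unblock option in the first solution removes the Zone.Identifier alternate data stream, which is the web identifier that Windows attaches to files downloaded from the Internet. The following Python sketch shows how that stream could be inspected and removed from a script. It assumes Windows and an NTFS volume; the helper names are hypothetical, and ZoneId 3 denotes the Internet zone.

```python
import configparser
import os
import sys
from typing import Optional

# URL security zone ID recorded in Zone.Identifier; 3 is the Internet zone.
INTERNET_ZONE = 3


def zone_id(stream_text: str) -> Optional[int]:
    """Return ZoneId from the text of a Zone.Identifier stream, or None."""
    # HostUrl/ReferrerUrl values may contain '%', so disable interpolation.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(stream_text)
        return parser.getint("ZoneTransfer", "ZoneId")
    except (configparser.Error, ValueError):
        return None


def unblock(path: str) -> bool:
    """Delete the Zone.Identifier stream of `path` (Windows and NTFS only).

    Returns True if a stream was removed and False if the file had none.
    """
    if sys.platform != "win32":
        raise OSError("alternate data streams are only available on Windows")
    try:
        os.remove(path + ":Zone.Identifier")
        return True
    except FileNotFoundError:
        return False
```

For example, reading path + ":Zone.Identifier" and passing its text to zone_id should return 3 for a package downloaded from the Internet, and unblock(path) is intended to have the same effect as selecting Unblock.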
Windows Server 2022: The KB5034439 patch fails to be installed
Problem description
The KB5034439 patch fails to be installed in the Windows Server 2022 operating system.
Cause
The KB5034439 patch, released by Microsoft in January 2024, updates the Windows Recovery Environment (WinRE). By default, images use the Alibaba Cloud internal Windows Server Update Services (WSUS) server as the update repository, and this server does not provide the patch. If you configure Microsoft Windows Update as the update repository and check for updates, the system finds the patch, but the installation fails. This is expected behavior and does not affect the normal use of the operating system. For more information, see KB5034439: Windows Recovery Environment update for Windows Server 2022: January 9, 2024.
A patch released by Microsoft in June 2022 causes RRAS issues on servers for which NAT is enabled
Problem description: According to an announcement from Microsoft on June 23, 2022, the installation of a security patch released by Microsoft in June 2022 may pose the following risks: A Windows server that is using the Routing and Remote Access Service (RRAS) might lose connection to the Internet, and devices that connect to the server might be unable to connect to the Internet.
Affected versions of Windows Server:
Windows Server 2022
Windows Server 2019
Windows Server 2016
Windows Server 2012 R2
Windows Server 2012
When you check for system updates on Windows Server 2012 R2 or Windows Server 2012, click the Check for updates option marked ① in the following figure. Option ① uses the Alibaba Cloud internal WSUS server as the update repository, and option ② uses the official Microsoft Windows Update server. Security updates can occasionally cause issues. To reduce this risk, Alibaba Cloud checks the Windows security updates released by Microsoft and publishes only the updates that pass the check to the internal WSUS server.
Solution: The relevant patch has been removed from Alibaba Cloud WSUS. To prevent your Windows Server operating system from being affected by the issue, we recommend that you check whether the patch is installed on the operating system. Run one of the following commands based on the version of your operating system:
Windows Server 2012 R2: wmic qfe get hotfixid | find "5014738"
Windows Server 2019: wmic qfe get hotfixid | find "5014692"
Windows Server 2016: wmic qfe get hotfixid | find "5014702"
Windows Server 2012: wmic qfe get hotfixid | find "5014747"
Windows Server 2022: wmic qfe get hotfixid | find "5014678"
If the command output indicates that the patch is installed and you are experiencing RRAS issues on the Windows Server operating system, we recommend that you uninstall the patch to restore functionality to the Windows server. Run one of the following commands based on the version of your operating system to uninstall the patch:
Windows Server 2012 R2: wusa /uninstall /kb:5014738
Windows Server 2019: wusa /uninstall /kb:5014692
Windows Server 2016: wusa /uninstall /kb:5014702
Windows Server 2012: wusa /uninstall /kb:5014747
Windows Server 2022: wusa /uninstall /kb:5014678
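If you manage several instances, the check and the uninstall decision can be combined in a script. The following minimal Python sketch assumes that you have already captured the output of wmic qfe get hotfixid as text, for example through Cloud Assistant; the helper names are hypothetical. Matching the whole KB token with a regular expression avoids the partial matches that a plain find substring search can produce.

```python
import re
from typing import Optional, Set

# June 2022 patches listed in this section, keyed by OS version.
JUNE_2022_KB = {
    "Windows Server 2012 R2": "5014738",
    "Windows Server 2019": "5014692",
    "Windows Server 2016": "5014702",
    "Windows Server 2012": "5014747",
    "Windows Server 2022": "5014678",
}


def installed_kbs(qfe_output: str) -> Set[str]:
    """Extract the numeric KB IDs from `wmic qfe get hotfixid` output."""
    return set(re.findall(r"KB(\d+)", qfe_output, flags=re.IGNORECASE))


def uninstall_command(os_version: str, qfe_output: str) -> Optional[str]:
    """Return the wusa command for this OS if its June 2022 patch is installed."""
    kb = JUNE_2022_KB.get(os_version)
    if kb is None or kb not in installed_kbs(qfe_output):
        return None
    return f"wusa /uninstall /kb:{kb}"
```

Uninstall the patch only if it is installed and you are actually experiencing the RRAS issue.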
Note: For further updates and operational guidance on this issue, see the official Microsoft documentation RRAS Servers can lose connectivity if NAT is enabled on the public interface.
A patch released in January 2022 causes abnormal behavior on Windows Server domain controllers (DCs)
Problem description: According to an announcement from Microsoft on January 13, 2022, the installation of a security patch released by Microsoft in January 2022 may pose the following risks: Virtual machines in Hyper-V cannot start, Windows Server DCs cannot restart or fall into a restart loop, and IP security (IPSec) virtual private network (VPN) connections fail.
Affected versions of Windows Server:
Windows Server 2022
Windows Server, version 20H2
Windows Server 2019
Windows Server 2016
Windows Server 2012 R2
Windows Server 2012
When you check for system updates on Windows Server 2012 R2 or Windows Server 2012, click the Check for updates option marked ① in the following figure. Option ① uses the Alibaba Cloud internal WSUS server as the update repository, and option ② uses the official Microsoft Windows Update server. Security updates can occasionally cause issues. To reduce this risk, Alibaba Cloud checks the Windows security updates released by Microsoft and publishes only the updates that pass the check to the internal WSUS server.
Solution: The relevant patch has been removed from Alibaba Cloud WSUS. To prevent your Windows Server operating system from being affected by the issue, we recommend that you check whether the patch is installed on your operating system. Run one of the following commands based on the version of your operating system:
Windows Server 2012 R2: wmic qfe get hotfixid | find "5009624"
Windows Server 2019: wmic qfe get hotfixid | find "5009557"
Windows Server 2016: wmic qfe get hotfixid | find "5009546"
Windows Server 2012: wmic qfe get hotfixid | find "5009586"
Windows Server 2022: wmic qfe get hotfixid | find "5009555"
If the patch is already installed on your operating system and the DCs cannot be used or the virtual machines cannot start, we recommend that you uninstall the patch to restore the operating system. Run one of the following commands based on the version of your operating system to uninstall the patch:
Windows Server 2012 R2: wusa /uninstall /kb:5009624
Windows Server 2019: wusa /uninstall /kb:5009557
Windows Server 2016: wusa /uninstall /kb:5009546
Windows Server 2012: wusa /uninstall /kb:5009586
Windows Server 2022: wusa /uninstall /kb:5009555
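When the same check runs across instances with different Windows Server versions, the KB number can be selected from the operating system caption. The following minimal Python sketch assumes that the caption comes from a command such as wmic os get Caption and contains a string like Microsoft Windows Server 2012 R2 Datacenter; the helper names are hypothetical. The mapping reproduces the KB numbers above. Windows Server, version 20H2, returns None because this section lists no command for it.

```python
from typing import Optional, Tuple

# January 2022 patches listed in this section. Order matters: "2012 R2" must
# be tested before "2012" because the latter is a substring of the former.
JANUARY_2022_KB = [
    ("Windows Server 2012 R2", "5009624"),
    ("Windows Server 2019", "5009557"),
    ("Windows Server 2016", "5009546"),
    ("Windows Server 2012", "5009586"),
    ("Windows Server 2022", "5009555"),
]


def january_2022_kb(os_caption: str) -> Optional[str]:
    """Map an OS caption to the January 2022 KB number, or None if unlisted."""
    for name, kb in JANUARY_2022_KB:
        if name in os_caption:
            return kb
    return None


def commands_for(os_caption: str) -> Optional[Tuple[str, str]]:
    """Return the (check, uninstall) command pair for the given caption."""
    kb = january_2022_kb(os_caption)
    if kb is None:
        return None
    return (f'wmic qfe get hotfixid | find "{kb}"', f"wusa /uninstall /kb:{kb}")
```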
Note: For further updates and operational guidance on this issue, see the official Microsoft documentation for the January 2022 Windows updates.
.NET Framework 3.5 fails to be installed in Windows Server 2012 R2
Problem description: You cannot install .NET Framework 3.5 in Windows Server 2012 R2 images that contain one of the following patches: KB5027141 (released in June 2023), KB5028872 (released in July 2023), KB5028970 (released in August 2023), or KB5029915 (released in September 2023).
Important: If you want to continue using Windows Server 2012 R2, we recommend that you create instances in the ECS console from one of the following Windows Server 2012 R2 community images, which have .NET Framework 3.5 installed: win2012r2_9600_x64_dtc_zh-cn_40G_.Net3.5_alibase_20231204.vhd and win2012r2_9600_x64_dtc_en-us_40G_.Net3.5_alibase_20231204.vhd. For information about how to search for an image, see Find an image.
Solution:
In Control Panel of the ECS instance, find the KB5027141, KB5028872, KB5028970, or KB5029915 patch in the list of installed updates, right-click the patch, and then select Uninstall. For example, the following figure shows how to uninstall the KB5029915 patch.
Restart the ECS instance.
For more information, see Restart an instance.
Install .NET Framework 3.5 by using one of the following methods.
Installation by using Server Manager
In the Server Manager window, click Add roles and features.
Keep the default settings of the wizard, click Features in the left-side navigation pane, and then select .NET Framework 3.5 Features.
Follow the wizard to confirm the settings until the installation is complete.
Installation by running PowerShell commands
Run one of the following commands:
Dism /Online /Enable-Feature /FeatureName:NetFX3 /All
Install-WindowsFeature -Name NET-Framework-Features
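To find out which of the four blocking patches are present before you follow the preceding steps, the hotfix list can be filtered in a script. The following minimal Python sketch assumes that the output of wmic qfe get hotfixid is available as text; the helper names are hypothetical. The generated wusa /uninstall commands are an assumed command-line alternative to uninstalling through Control Panel and use the same syntax as the other uninstall commands in this topic.

```python
import re
from typing import List

# Patches that block .NET Framework 3.5 installation on Windows Server 2012 R2.
BLOCKING_KBS = ["5027141", "5028872", "5028970", "5029915"]


def blocking_patches(qfe_output: str) -> List[str]:
    """Return the blocking KB numbers found in `wmic qfe get hotfixid` output."""
    installed = set(re.findall(r"KB(\d+)", qfe_output, flags=re.IGNORECASE))
    return [kb for kb in BLOCKING_KBS if kb in installed]


def uninstall_commands(qfe_output: str) -> List[str]:
    """Build one wusa uninstall command per blocking patch that is installed."""
    return [f"wusa /uninstall /kb:{kb}" for kb in blocking_patches(qfe_output)]
```

Restart the instance after the patches are removed, as described in the steps above.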
Known issues of Linux images
CentOS
CentOS 8.0: The image version numbers of created instances change after the public image is updated
Problem description: After you connect to an instance created from the centos_8_0_x64_20G_alibase_20200218.vhd public image, you find that the operating system version of the instance is CentOS 8.1.
testuser@ecshost:~$ lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 8.1.1911 (Core)
Release:        8.1.1911
Codename:       Core
Cause: The centos_8_0_x64_20G_alibase_20200218.vhd image is a public image that was updated by using the latest community update package. The version of CentOS in the image is upgraded to 8.1. Therefore, the actual operating system version is CentOS 8.1.
Affected image: centos_8_0_x64_20G_alibase_20200218.vhd.
Solution: To create an instance whose operating system version is CentOS 8.0, call an API operation such as RunInstances and set the ImageId parameter to centos_8_0_x64_20G_alibase_20191225.vhd.
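For automated O&M, the minor release that an instance actually runs is more reliable than the version implied by the image name. The following minimal Python sketch reads the Release field of lsb_release -a output; the function name is hypothetical.

```python
from typing import Optional, Tuple


def centos_release(lsb_output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the Release field of `lsb_release -a` output."""
    for line in lsb_output.splitlines():
        # Split only at the first colon; "LSB Version" values contain colons too.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Release":
            parts = value.strip().split(".")
            if len(parts) >= 2 and parts[0].isdigit() and parts[1].isdigit():
                return int(parts[0]), int(parts[1])
    return None
```

For the output shown above, the function returns (8, 1), which reveals the mismatch with the 8_0 image name.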
CentOS 7: An issue may be caused by updates of specific image IDs
Problem description: The IDs of specific CentOS 7 public images were updated, which may affect the policies for obtaining image IDs during automated O&M.
Affected images: CentOS 7.5 and CentOS 7.6.
Cause: The image IDs used by the latest versions of CentOS 7.5 and CentOS 7.6 public images are in the following format:
%<OS type>%_%<Major version number>%_%<Minor version number>%_%<Special field>%_alibase_%<Date>%.%<Format>%
For example, the image ID prefix of CentOS 7.5 public images is changed from centos_7_05_64 to centos_7_5_x64. In this case, you must modify the automated O&M policies that may be affected by the image ID changes. For information about image IDs, see Release notes for 2023.
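Automated O&M policies that match image IDs by a literal prefix break when the prefix changes from centos_7_05_64 to centos_7_5_x64. One option is to parse the ID according to the format above and compare version numbers instead of strings. The following minimal Python sketch shows this approach; the regular expression is an assumption derived from the documented format and only covers IDs that contain _alibase_, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for: <OS type>_<Major>_<Minor>_<Special field>_alibase_<Date>.<Format>
IMAGE_ID = re.compile(
    r"^(?P<os>[a-z]+)_(?P<major>\d+)_(?P<minor>\d+)_(?P<special>.+)"
    r"_alibase_(?P<date>\d{8})\.(?P<fmt>\w+)$"
)


class ImageInfo(NamedTuple):
    os: str
    major: int
    minor: int
    date: str


def parse_image_id(image_id: str) -> Optional[ImageInfo]:
    """Split an image ID into its fields, or return None if it does not match."""
    m = IMAGE_ID.match(image_id)
    if not m:
        return None
    # int() normalizes zero-padded minor versions such as "05" to 5.
    return ImageInfo(m["os"], int(m["major"]), int(m["minor"]), m["date"])


def matches(image_id: str, os_name: str, major: int, minor: int) -> bool:
    """True if the image ID refers to the given OS and major.minor version."""
    info = parse_image_id(image_id)
    return info is not None and (info.os, info.major, info.minor) == (os_name, major, minor)
```

With this approach, IDs that use either the old centos_7_05_64 prefix or the new centos_7_5_x64 prefix both match CentOS 7.5.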
CentOS 7: The hostname changes from uppercase letters to lowercase letters after an instance is restarted
Problem description: The first time some instances that run CentOS 7 are restarted, their hostnames change from uppercase letters to lowercase letters. The following table provides examples.
Hostname | Hostname after the first restart | Remains lowercase after subsequent restarts
iZm5e1qe*****sxx1ps5zX | izm5e1qe*****sxx1ps5zx | Yes
ZZHost | zzhost | Yes
NetworkNode | networknode | Yes
The following CentOS public images and custom images derived from these public images are affected:
centos_7_2_64_40G_base_20170222.vhd
centos_7_3_64_40G_base_20170322.vhd
centos_7_03_64_40G_alibase_20170503.vhd
centos_7_03_64_40G_alibase_20170523.vhd
centos_7_03_64_40G_alibase_20170625.vhd
centos_7_03_64_40G_alibase_20170710.vhd
centos_7_02_64_20G_alibase_20170818.vhd
centos_7_03_64_20G_alibase_20170818.vhd
centos_7_04_64_20G_alibase_201701015.vhd
Affected hostnames: If the hostnames of your applications deployed on the instances are case-sensitive, services may be affected when you restart these instances. The following table describes whether the hostname changes after an instance is restarted.
Current state of hostname | Hostname changes after restart | When the hostname changes | Continue to read this section
The hostname contains uppercase letters when you create the instance in the ECS console or by calling ECS API operations. | Yes | The first time the instance restarts. | Yes
The hostname contains only lowercase letters when you create the instance in the ECS console or by calling ECS API operations. | No | N/A | No
The hostname contains uppercase letters, and you modify the hostname after you log on to the instance. | No | N/A | Yes
Solution: To retain uppercase letters in the hostname of an instance after you restart the instance, perform the following operations:
Connect to an instance.
For more information, see Methods for connecting to an ECS instance.
View the existing hostname.
[testuser@izbp193*****3i161uynzzx ~]# hostname
izbp193*****3i161uynzzx
Run the following command to make the hostname static:
hostnamectl set-hostname --static iZbp193*****3i161uynzzX
Run the following command to view the updated hostname:
[testuser@izbp193*****3i161uynzzx ~]# hostname
iZbp193*****3i161uynzzX
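To detect this symptom across many instances, compare the hostname that you expect, for example the name that you specified at creation time and recorded yourself, with the current hostname. The following minimal Python sketch shows one way to do this; the helper names are hypothetical, and the expected hostname must come from your own records.

```python
import socket
from typing import Optional


def lowercased_by_restart(expected: str, current: str) -> bool:
    """True if `current` equals `expected` with its uppercase letters lowercased."""
    return expected != current and expected.lower() == current


def fix_command(expected: str, current: Optional[str] = None) -> Optional[str]:
    """Return the hostnamectl command that restores `expected`, or None if not needed."""
    if current is None:
        current = socket.gethostname()
    if lowercased_by_restart(expected, current):
        return f"hostnamectl set-hostname --static {expected}"
    return None
```

For example, fix_command("NetworkNode", "networknode") returns the same hostnamectl command that is used in the preceding steps.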
What to do next: If you use an affected custom image, we recommend that you update cloud-init to the latest version and then create another custom image. To prevent this issue, you can use the new custom image to create instances. For more information, see Install cloud-init and Create a custom image from an instance.
CentOS 6.8: An instance on which the NFS client is installed does not respond
Problem description: A CentOS 6.8 instance on which the NFS client is installed does not respond and must be restarted.
Cause: When you use the NFS service on instances whose operating system kernel versions range from 2.6.32-696 to 2.6.32-696.10, the NFS client attempts to end a TCP connection if a glitch occurs due to communication latency. If the NFS server is slow in responding to NFS requests, the connection initiated by the NFS client may remain in the FIN_WAIT2 state for an extended period of time. In most cases, the connection times out and is closed 1 minute after the connection enters the FIN_WAIT2 state. Then, the NFS client can initiate a new connection. However, kernel versions 2.6.32-696 to 2.6.32-696.10 have issues with establishing TCP connections. As a result, the connection remains in the FIN_WAIT2 state, the NFS client is unable to recover the TCP connection, and a new TCP connection cannot be initiated. This causes the requests to freeze, and the only way to fix the issue is to restart the instance.
Affected images: centos_6_08_32_40G_alibase_20170710.vhd and centos_6_08_64_20G_alibase_20170824.vhd.
Solution: Run the yum update command to update the kernel to 2.6.32-696.11 or later.
Important: Before you perform operations on the instance, create a snapshot to back up your data. For more information, see Create a snapshot.
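To decide whether an instance needs the kernel update, its running kernel (the output of uname -r) can be compared with the affected range. The following minimal Python sketch assumes el6-style release strings such as 2.6.32-696.10.1.el6.x86_64; the function names are hypothetical.

```python
import re
from typing import Optional, Tuple


def el6_release(kernel: str) -> Optional[Tuple[int, ...]]:
    """Parse the numeric release of a 2.6.32 kernel, e.g. (696, 10, 1)."""
    m = re.match(r"^2\.6\.32-(\d+(?:\.\d+)*)", kernel)
    if not m:
        return None
    return tuple(int(p) for p in m.group(1).split("."))


def nfs_hang_affected(kernel: str) -> bool:
    """True for kernels from 2.6.32-696 up to, but not including, 2.6.32-696.11."""
    rel = el6_release(kernel)
    if rel is None or rel[0] != 696:
        return False
    # 2.6.32-696 itself has no second component; 696.1 to 696.10 are affected.
    return len(rel) == 1 or rel[1] <= 10
```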
Debian
Debian 9.6: Instances in the classic network have network configuration issues
Problem description: Instances in the classic network that were created from Debian 9 public images cannot be pinged.
Cause: By default, the systemd-networkd service is disabled in Debian 9. As a result, instances in the classic network that were created from Debian 9 public images cannot obtain IP addresses automatically by using the Dynamic Host Configuration Protocol (DHCP).
Affected image: debian_9_06_64_20G_alibase_20181212.vhd.
Solution: Run the following commands in sequence:
systemctl enable systemd-networkd
systemctl start systemd-networkd
Fedora CoreOS
The hostnames of instances created from Fedora CoreOS custom images do not take effect
Problem description: After you use a Fedora CoreOS image to create Instance A, you create a Fedora CoreOS custom image from Instance A and use the custom image to create Instance B. The hostname of Instance B remains the same as that of Instance A and the hostname specified for Instance B does not take effect.
For example, you create Instance A from a Fedora CoreOS image and set its hostname to test001. You then create a custom image from Instance A, create Instance B from the custom image, and set the hostname of Instance B to test002. After Instance B is created and you connect to it, the hostname of Instance B is still test001.
Cause: Fedora CoreOS public images provided by Alibaba Cloud use Ignition, which is offered by Fedora CoreOS, to initialize instance configurations. Ignition is a utility that Fedora CoreOS and RHEL CoreOS use to manage disks in the initramfs during startup. The first time a Fedora CoreOS instance starts, coreos-ignition-firstboot-complete.service in Ignition checks whether the /boot/ignition.firstboot file exists to determine whether to initialize instance configurations. If the file exists, the system initializes instance configurations, including the hostname, and then deletes the /boot/ignition.firstboot file.
A Fedora CoreOS instance must have been started at least once before you can create a custom image from it. Because the /boot/ignition.firstboot file is deleted during that first startup, the custom image created from the instance does not contain the file. As a result, when instances created from the custom image start for the first time, the system does not initialize their configurations, and their hostnames remain unchanged.
Solution:
Note: To ensure the security of data stored on the instance, we recommend that you create snapshots for the instance. If data exceptions occur on the instance, you can use the snapshots to roll back the disks of the instance to a normal state. For more information, see Create a snapshot.
Before you create a custom image from the Fedora CoreOS instance, use root (administrator) permissions to create the ignition.firstboot file in the /boot directory. Perform the following operations:
Run the following command to re-mount /boot in read/write mode:
sudo mount /boot -o rw,remount
Run the following command to create the /ignition.firstboot file:
sudo touch /boot/ignition.firstboot
Run the following command to re-mount /boot in read-only mode:
sudo mount /boot -o ro,remount
For information about how to configure Ignition, see Change /boot/ignition/config.ign permissions to 0600 and delete it after provisioning.
openSUSE
openSUSE 15: Kernel updates may cause the system to freeze during startup
Problem description: After the openSUSE kernel is updated to 4.12.14-lp151.28.52-default, instances that use specific CPU types may freeze during startup. The known affected CPU type is Intel® Xeon® CPU E5-2682 v4 @ 2.50GHz. The following call trace was captured during debugging:
[ 0.901281] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.901281] CR2: ffffc90000d68000 CR3: 000000000200a001 CR4: 00000000003606e0
[ 0.901281] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.901281] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.901281] Call Trace:
[ 0.901281] cpuidle_enter_state+0x6f/0x2e0
[ 0.901281] do_idle+0x183/0x1e0
[ 0.901281] cpu_startup_entry+0x5d/0x60
[ 0.901281] start_secondary+0x1b0/0x200
[ 0.901281] secondary_startup_64+0xa5/0xb0
[ 0.901281] Code: 6c 01 00 0f ae 38 0f ae f0 0f 1f 84 00 00 00 00 00 0f 1f 84 00 00 00 00 00 90 31 d2 65 48 8b 34 25 40 6c 01 00 48 89 d1 48 89 f0 <0f> 01 c8 0f 1f 84 00 00 00 00 00 0f 1f 84 00 00 00 00 00
Cause: The new kernel version is incompatible with the CPU microcode. For more information, see Issues of freezing during startup.
Affected image: opensuse_15_1_x64_20G_alibase_20200520.vhd.
Solution: In the /boot/grub2/grub.cfg file, add the idle kernel parameter to the line that starts with linux and set the parameter to nomwait. The following example shows the modified menu entry:
menuentry 'openSUSE Leap 15.1' --class opensuse --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-20f5f35a-fbab-4c9c-8532-bb6c66ce****' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_msdos
        insmod ext2
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint='hd0,msdos1' 20f5f35a-fbab-4c9c-8532-bb6c66ce****
        else
          search --no-floppy --fs-uuid --set=root 20f5f35a-fbab-4c9c-8532-bb6c66ce****
        fi
        echo 'Loading Linux 4.12.14-lp151.28.52-default ...'
        linux /boot/vmlinuz-4.12.14-lp151.28.52-default root=UUID=20f5f35a-fbab-4c9c-8532-bb6c66ce**** net.ifnames=0 console=tty0 console=ttyS0,115200n8 splash=silent mitigations=auto quiet idle=nomwait
        echo 'Loading initial ramdisk ...'
        initrd /boot/initrd-4.12.14-lp151.28.52-default
}
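If you apply this change to many instances, the edit can be scripted. The following minimal Python sketch appends idle=nomwait to every line that starts with linux in the text of grub.cfg and leaves lines that already set idle unchanged; the function name is hypothetical. Back up grub.cfg before you write the result back. Note that grub.cfg is generated by grub2-mkconfig, so regenerating it later can discard manual edits.

```python
def add_kernel_param(grub_cfg: str, param: str = "idle=nomwait") -> str:
    """Append `param` to each `linux` line of grub.cfg text, at most once per line."""
    key = param.split("=", 1)[0]
    out = []
    for line in grub_cfg.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        tokens = body.split()
        # Only touch kernel lines, and skip them if the key (e.g. idle=) is already set.
        already_set = any(t == key or t.startswith(key + "=") for t in tokens[1:])
        if tokens and tokens[0] == "linux" and not already_set:
            body = f"{body} {param}"
        out.append(body + ending)
    return "".join(out)
```

Because the function is idempotent, running it twice on the same file does not add a second idle parameter.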
Red Hat Enterprise Linux
Red Hat Enterprise Linux 8 64-bit: The kernel version cannot be updated by running the yum update command
Problem description: After you run the yum update command on an ECS instance that runs a RHEL 8 64-bit operating system to update its kernel version, the kernel version of the instance operating system remains unchanged even after the instance is restarted.
Cause: GRUB2 requires the /boot/grub2/grubenv file, which stores GRUB2 environment variables, to be exactly 1,024 bytes in size. In the RHEL 8 64-bit operating system, the size of this file is not 1,024 bytes. As a result, the kernel version cannot be updated.
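The size requirement can be verified before and after the following procedure. The following minimal Python sketch assumes the standard grubenv layout: a # GRUB Environment Block header line, key=value lines, and # padding up to 1,024 bytes. The function name is hypothetical. The returned dictionary also exposes kernelopts, which is the value that grub2-editenv list | grep kernelopts reports.

```python
from typing import Dict

GRUBENV_SIZE = 1024
GRUBENV_HEADER = b"# GRUB Environment Block\n"


def check_grubenv(data: bytes) -> Dict[str, str]:
    """Validate a grubenv block and return its variables.

    Raises ValueError if the block is not exactly 1,024 bytes or lacks the header.
    """
    if len(data) != GRUBENV_SIZE:
        raise ValueError(f"grubenv is {len(data)} bytes, expected {GRUBENV_SIZE}")
    if not data.startswith(GRUBENV_HEADER):
        raise ValueError("missing '# GRUB Environment Block' header")
    variables: Dict[str, str] = {}
    for line in data[len(GRUBENV_HEADER):].decode("utf-8", "replace").splitlines():
        # The padding is a run of '#' characters; skip it and any blank lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            variables[key] = value
    return variables
```

On an affected system, passing the bytes of /boot/grub2/grubenv to check_grubenv is expected to raise ValueError; after the file is recreated, the call should succeed.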
Solution: After you update the kernel version, set the new kernel version to the default startup version. Perform the following operations:
Run the following command to update the kernel version:
yum update kernel -y
Run the following command to obtain the kernel startup parameter of the operating system:
grub2-editenv list | grep kernelopts
Run the following command to back up the existing /boot/grub2/grubenv file:
mv /boot/grub2/grubenv /home/grubenv.bak
Run the following command to create a new /boot/grub2/grubenv file:
grub2-editenv /boot/grub2/grubenv create
Run the following command to set the new kernel version as the default startup version. In this example, the new kernel is /boot/vmlinuz-4.18.0-305.19.1.el8_4.x86_64.
grubby --set-default /boot/vmlinuz-4.18.0-305.19.1.el8_4.x86_64
Run the following command to set the kernel startup parameter. Set kernelopts to the value that you obtained earlier by running grub2-editenv list | grep kernelopts. Example:
grub2-editenv - set kernelopts="root=UUID=0dd6268d-9bde-40e1-b010-0d3574b4**** ro crashkernel=auto net.ifnames=0 vga=792 console=tty0 console=ttyS0,115200n8 noibrs nosmt"
Run the following command to restart the instance for the new kernel version to take effect:
reboot
Warning: The restart operation stops the instance for a short period of time and may interrupt services that are running on the instance. We recommend that you restart the instance during off-peak hours.
SUSE Linux Enterprise Server
SUSE Linux Enterprise Server: The SMT server cannot be connected
Problem description: When you use a paid Alibaba Cloud image of SUSE Linux Enterprise Server or SUSE Linux Enterprise Server for SAP, connection errors such as timeouts may occur when the system connects to the Subscription Management Tool (SMT) server. When you download or update components from the SMT server, error messages similar to the following ones are returned:
Registration server returned 'This server could not verify that you are authorized to access this service.' (500)
Problem retrieving the repository index file for service 'SMT-http_mirrors_cloud_aliyuncs_com' location ****
Affected images: SUSE Linux Enterprise Server and SUSE Linux Enterprise Server for SAP.
Solution: Register and activate SMT again.
Run the following commands in sequence to register and activate SMT:
SUSEConnect -d
SUSEConnect --cleanup
systemctl restart guestregister
Run the following command to verify whether SMT is activated:
SUSEConnect -s
If SMT is activated, a command output similar to the following one is returned:
[{"identifier":"SLES_SAP","version":"12.5","arch":"x86_64","status":"Registered"}]
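To verify the result in a script, the JSON output of SUSEConnect -s can be parsed. The following minimal Python sketch treats any status other than Registered as a failure; the function name is hypothetical.

```python
import json
from typing import List


def unregistered_products(status_output: str) -> List[str]:
    """Return 'identifier version' for each product whose status is not Registered."""
    products = json.loads(status_output)
    return [
        f'{p.get("identifier")} {p.get("version")}'
        for p in products
        if p.get("status") != "Registered"
    ]
```

An empty list means that every product in the output is registered, as in the example output above.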
SLES 12 SP5: Kernel updates may cause the system to freeze during startup
Problem description: When an earlier kernel version is updated to SLES 12 SP5, or when you update the kernel of SLES 12 SP5, instances that use specific CPU types may freeze during startup. The known affected CPU types are Intel® Xeon® CPU E5-2682 v4 @ 2.50GHz and Intel® Xeon® CPU E7-8880 v4 @ 2.20GHz. The following call trace was captured during debugging:
[ 0.901281] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.901281] CR2: ffffc90000d68000 CR3: 000000000200a001 CR4: 00000000003606e0
[ 0.901281] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.901281] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.901281] Call Trace:
[ 0.901281] cpuidle_enter_state+0x6f/0x2e0
[ 0.901281] do_idle+0x183/0x1e0
[ 0.901281] cpu_startup_entry+0x5d/0x60
[ 0.901281] start_secondary+0x1b0/0x200
[ 0.901281] secondary_startup_64+0xa5/0xb0
[ 0.901281] Code: 6c 01 00 0f ae 38 0f ae f0 0f 1f 84 00 00 00 00 00 0f 1f 84 00 00 00 00 00 90 31 d2 65 48 8b 34 25 40 6c 01 00 48 89 d1 48 89 f0 <0f> 01 c8 0f 1f 84 00 00 00 00 00 0f 1f 84 00 00 00 00 00
Cause: The new kernel version is incompatible with the CPU microcode.
Solution: In the /boot/grub2/grub.cfg file, add the idle kernel parameter to the line that starts with linux and set the parameter to nomwait. The following example shows the modified menu entry:
menuentry 'SLES 12-SP5' --class sles --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-fd7bda55-42d3-4fe9-a2b0-45efdced****' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_msdos
        insmod ext2
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint='hd0,msdos1' fd7bda55-42d3-4fe9-a2b0-45efdced****
        else
          search --no-floppy --fs-uuid --set=root fd7bda55-42d3-4fe9-a2b0-45efdced****
        fi
        echo 'Loading Linux 4.12.14-122.26-default ...'
        linux /boot/vmlinuz-4.12.14-122.26-default root=UUID=fd7bda55-42d3-4fe9-a2b0-45efdced**** net.ifnames=0 console=tty0 console=ttyS0,115200n8 mitigations=auto splash=silent quiet showopts idle=nomwait
        echo 'Loading initial ramdisk ...'
        initrd /boot/initrd-4.12.14-122.26-default
}
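After the instance restarts, you can confirm that the parameter took effect by checking the running kernel command line in /proc/cmdline. The following minimal Python sketch shows this check; the function names are hypothetical.

```python
def has_kernel_param(cmdline: str, key: str, value: str) -> bool:
    """True if `key=value` appears as a whole token on the kernel command line."""
    return f"{key}={value}" in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line that the running kernel was booted with."""
    with open(path, encoding="utf-8") as f:
        return f.read()
```

For example, has_kernel_param(read_cmdline(), "idle", "nomwait") should return True once the instance has booted from the modified menu entry.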
Other issues
A call trace may occur when instances of specific instance types that run operating systems with more recent kernel versions are started
Problem description: If an instance of a specific instance type, such as ecs.i2.4xlarge, runs an operating system with a recent kernel version, such as Red Hat Enterprise Linux (RHEL) 8.3 or CentOS 8.3 with the 4.18.0-240.1.1.el8_3.x86_64 kernel, a call trace may occur when the instance starts. Example call trace:
Dec 28 17:43:45 localhost SELinux: Initializing.
Dec 28 17:43:45 localhost kernel: Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes)
Dec 28 17:43:45 localhost kernel: Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes)
Dec 28 17:43:45 localhost kernel: Mount-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Dec 28 17:43:45 localhost kernel: Mountpoint-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Dec 28 17:43:45 localhost kernel: unchecked MSR access error: WRMSR to 0x3a (tried to write 0x000000000000****) at rIP: 0xffffffff8f26**** (native_write_msr+0x4/0x20)
Dec 28 17:43:45 localhost kernel: Call Trace:
Dec 28 17:43:45 localhost kernel: init_ia32_feat_ctl+0x73/0x28b
Dec 28 17:43:45 localhost kernel: init_intel+0xdf/0x400
Dec 28 17:43:45 localhost kernel: identify_cpu+0x1f1/0x510
Dec 28 17:43:45 localhost kernel: identify_boot_cpu+0xc/0x77
Dec 28 17:43:45 localhost kernel: check_bugs+0x28/0xa9a
Dec 28 17:43:45 localhost kernel: ? __slab_alloc+0x29/0x30
Dec 28 17:43:45 localhost kernel: ? kmem_cache_alloc+0x1aa/0x1b0
Dec 28 17:43:45 localhost kernel: start_kernel+0x4fa/0x53e
Dec 28 17:43:45 localhost kernel: secondary_startup_64+0xb7/0xc0
Dec 28 17:43:45 localhost kernel: Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
Dec 28 17:43:45 localhost kernel: Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
Dec 28 17:43:45 localhost kernel: FEATURE SPEC_CTRL Present
Dec 28 17:43:45 localhost kernel: FEATURE IBPB_SUPPORT Present
Cause: The kernel has been updated by using the latest community update package, which includes patches that write to Model-Specific Registers (MSRs). However, some instance types, such as ecs.i2.4xlarge, do not support MSR writes because of limits imposed by virtualization.
Solution: The call trace does not affect system operation or stability. You can ignore this issue.
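To confirm that a trace you see is the harmless one described here and not a different fault, you can search the kernel log of the current boot for the WRMSR error at MSR 0x3a together with the init_ia32_feat_ctl frame. This is a minimal sketch that assumes systemd's journalctl is available, as it is on RHEL 8 and CentOS 8. The function name is_known_msr_trace is hypothetical, and matching on these two strings is a heuristic chosen for this example.

```shell
#!/bin/bash
# Return 0 if the given kernel log text contains both the WRMSR-to-0x3a error
# and the init_ia32_feat_ctl frame shown in the example call trace.
is_known_msr_trace() {
  grep -qF 'unchecked MSR access error: WRMSR to 0x3a' <<< "$1" &&
    grep -qF 'init_ia32_feat_ctl' <<< "$1"
}

# -k limits output to kernel messages and -b to the current boot.
if command -v journalctl >/dev/null 2>&1; then
  if is_known_msr_trace "$(journalctl -k -b --no-pager 2>/dev/null)"; then
    echo "Found the known WRMSR 0x3a call trace; it can be ignored."
  else
    echo "The known WRMSR 0x3a call trace was not found in the kernel log of this boot."
  fi
fi
```

Reading the full kernel journal may require root privileges; without them the log can appear empty.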
Compatibility issues between specific Linux kernel versions and the hfg6 general-purpose instance family with high clock speeds may cause kernel panics
Problem description: When the kernels of some open source Linux distributions, such as CentOS 8, SUSE Linux Enterprise Server (SLES) 15 SP2, and openSUSE 15.2, are updated to the latest versions on hfg6 instances, a kernel panic may occur. The following figure shows an example of the resulting call trace.
Cause: Some Linux kernel versions are incompatible with the hfg6 general-purpose instance family with high clock speeds.
Solution:
The compatibility issue is fixed in the latest kernel versions of SLES 15 SP2 and openSUSE 15.2. The fix consists of the following commits. If your kernel includes these commits, it is compatible with the hfg6 instance family.
commit 1e33d5975b49472e286bd7002ad0f689af33fab8
Author: Giovanni Gherdovich <ggherdovich@suse.cz>
Date:   Thu Sep 24 16:51:09 2020 +0200

    x86, sched: Bail out of frequency invariance if turbo_freq/base_freq gives 0 (bsc#1176925).

    suse-commit: a66109f44265ff3f3278fb34646152bc2b3224a5

commit dafb858aa4c0e6b0ce6a7ebec5e206f4b3cfc11c
Author: Giovanni Gherdovich <ggherdovich@suse.cz>
Date:   Thu Sep 24 16:16:50 2020 +0200

    x86, sched: Bail out of frequency invariance if turbo frequency is unknown (bsc#1176925).

    suse-commit: 53cd83ab2b10e7a524cb5a287cd61f38ce06aab7

commit 22d60a7b159c7851c33c45ada126be8139d68b87
Author: Giovanni Gherdovich <ggherdovich@suse.cz>
Date:   Thu Sep 24 16:10:30 2020 +0200

    x86, sched: check for counters overflow in frequency invariant accounting (bsc#1176925).
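All three commits reference SUSE bug bsc#1176925, so one way to check an installed kernel is to search its RPM changelog for that reference. The following minimal sketch assumes that the kernel package changelog records the bug reference, as SUSE kernel changelogs generally do, and that the running kernel image /boot/vmlinuz-$(uname -r) is owned by an RPM package. The function names are hypothetical.

```shell
#!/bin/bash
# Bug reference shared by the three fix commits listed above.
FIX_REF='bsc#1176925'

# Return 0 if the changelog text mentions the bug reference of the fix.
changelog_has_fix() {
  grep -qF "$FIX_REF" <<< "$1"
}

# Look up the package that owns the running kernel image and search its changelog.
check_running_kernel() {
  local image="/boot/vmlinuz-$(uname -r)" pkg
  if ! pkg="$(rpm -qf "$image" 2>/dev/null)"; then
    echo "Cannot determine the package that owns $image" >&2
    return 2
  fi
  if changelog_has_fix "$(rpm -q --changelog "$pkg")"; then
    echo "$pkg mentions $FIX_REF"
  else
    echo "$pkg does not mention $FIX_REF"
    return 1
  fi
}
```

Source the file and call check_running_kernel. It returns 1 if the reference is missing and 2 if the owning package cannot be determined.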
If you run the yum update command to update the kernel of CentOS 8 to kernel-4.18.0-240 or later on hfg6 instances, a kernel panic may occur. If this happens, roll the kernel back to the previous version.
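For CentOS 8, the version check and the rollback can be sketched as follows. This is a minimal sketch, assuming GNU sort -V for version ordering and grubby, which CentOS 8 typically uses to manage boot entries. The function names are hypothetical, and this is not an Alibaba Cloud tool.

```shell
#!/bin/bash
# First kernel release reported in this document as possibly panicking on hfg6.
AFFECTED_FROM='4.18.0-240'

# Return 0 if kernel release $1 is the same as or later than $AFFECTED_FROM.
# sort -V orders version strings component by component.
is_affected_kernel() {
  local lowest
  lowest="$(printf '%s\n' "$AFFECTED_FROM" "$1" | sort -V | head -n 1)"
  [[ "$lowest" == "$AFFECTED_FROM" ]]
}

# Make an older, already installed kernel the default boot entry.
# $1 is a kernel release, for example one listed by: grubby --info=ALL
rollback_to() {
  local image="/boot/vmlinuz-$1"
  if [[ ! -e "$image" ]]; then
    echo "$image not found; list installed kernels with: grubby --info=ALL" >&2
    return 1
  fi
  grubby --set-default="$image" && echo "Default kernel set to $image; restart the instance to apply."
}
```

For example, is_affected_kernel "$(uname -r)" tells you whether the running kernel falls in the affected range, and rollback_to followed by a release taken from grubby --info=ALL makes that kernel the default at the next startup.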
Pip requests time out
Problem description: Pip requests occasionally time out or fail.
Affected images: CentOS, Debian, Ubuntu, SUSE, openSUSE, and Alibaba Cloud Linux.
Cause: Alibaba Cloud provides three pip repository addresses. The default address is mirrors.aliyun.com. To access this address, instances must be able to access the Internet. If your instance is not assigned a public IP address, pip requests time out.
Default public repository address: mirrors.aliyun.com
Internal repository address in virtual private clouds (VPCs): mirrors.cloud.aliyuncs.com
Internal repository address in the classic network: mirrors.aliyuncs.com
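When you are unsure which address an instance can reach, you can probe the simple index of each mirror directly. The following minimal sketch maps a network type to the addresses listed above and tests reachability with curl. The labels public, vpc, and classic and all function names are hypothetical, and the timeouts are arbitrary choices.

```shell
#!/bin/bash
# Map a network type to the pip repository address listed above.
# The labels "public", "vpc", and "classic" are hypothetical names for this sketch.
pypi_mirror_for() {
  case "$1" in
    public)  echo "mirrors.aliyun.com" ;;
    vpc)     echo "mirrors.cloud.aliyuncs.com" ;;
    classic) echo "mirrors.aliyuncs.com" ;;
    *)       echo "unknown network type: $1" >&2; return 1 ;;
  esac
}

# Return 0 if the mirror's simple index answers with a successful HTTP status.
mirror_reachable() {
  curl -fsS --connect-timeout 5 --max-time 10 -o /dev/null "http://$1/pypi/simple/"
}

# Print the first reachable mirror among the given network types.
first_reachable_mirror() {
  local net host
  for net in "$@"; do
    host="$(pypi_mirror_for "$net")" || continue
    if mirror_reachable "$host"; then
      echo "$host"
      return 0
    fi
  done
  return 1
}
```

For example, first_reachable_mirror vpc classic public prints the first address that responds, which you can then pass to the fix_pypi.sh script.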
Solution: Use one of the following methods to resolve the issue:
Method 1
Associate an elastic IP address (EIP) with the instance. For more information, see Associate an EIP with an ECS instance.
You can also assign a public IP address to a subscription instance when you change its configurations. For more information, see Upgrade the instance types of subscription instances.
Method 2
If a pip request fails, you can run the fix_pypi.sh script in your instance and retry the pip operation. Perform the following steps:
Connect to the instance.
For more information, see Connect to an instance by using VNC.
Run the following command to obtain the script file:
wget http://image-offline.oss-cn-hangzhou.aliyuncs.com/fix/fix_pypi.sh
Run one of the following commands based on the network type of the instance:
If your instance resides in a VPC, run the following command:
bash fix_pypi.sh "mirrors.cloud.aliyuncs.com"
If your instance resides in the classic network, run the following command:
bash fix_pypi.sh "mirrors.aliyuncs.com"
Retry the pip operation.
The following sample code describes the fix_pypi.sh script:
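#!/bin/bash
# Point easy_install and pip at the Alibaba Cloud PyPI mirror passed as the first argument.
function config_pip() {
    local pypi_source="$1"

    # easy_install reads its package index from ~/.pydistutils.cfg.
    if [[ ! -f ~/.pydistutils.cfg ]]; then
        cat > ~/.pydistutils.cfg << EOF
[easy_install]
index-url=http://$pypi_source/pypi/simple/
EOF
    else
        sed -i "s#index-url.*#index-url=http://$pypi_source/pypi/simple/#" ~/.pydistutils.cfg
    fi

    # pip also reads the legacy per-user configuration file ~/.pip/pip.conf.
    if [[ ! -f ~/.pip/pip.conf ]]; then
        mkdir -p ~/.pip
        cat > ~/.pip/pip.conf << EOF
[global]
index-url=http://$pypi_source/pypi/simple/
[install]
trusted-host=$pypi_source
EOF
    else
        # For an existing file, only lines that are already present are rewritten.
        sed -i "s#index-url.*#index-url=http://$pypi_source/pypi/simple/#" ~/.pip/pip.conf
        sed -i "s#trusted-host.*#trusted-host=$pypi_source#" ~/.pip/pip.conf
    fi
}

config_pip "$1"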
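If you prefer not to modify configuration files, for example for a one-off installation, pip also accepts the mirror on the command line through its --index-url and --trusted-host options. The following is a minimal sketch. The helper names pip_mirror_args and install_via_mirror are hypothetical, and the sketch assumes that python3 and pip are installed.

```shell
#!/bin/bash
# Print, one per line, the pip options that point a single command at a mirror host.
pip_mirror_args() {
  printf '%s\n' "--index-url" "http://$1/pypi/simple/" "--trusted-host" "$1"
}

# Install packages through the given mirror without touching ~/.pip/pip.conf.
# $1 is the mirror host; the remaining arguments are passed to pip install.
install_via_mirror() {
  local host="$1"
  shift
  local -a args
  mapfile -t args < <(pip_mirror_args "$host")
  python3 -m pip install "${args[@]}" "$@"
}
```

For example, install_via_mirror mirrors.cloud.aliyuncs.com requests installs the requests package through the VPC mirror.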