Fans running too fast on an HPE DL380 Gen10 after deploying ESXi 8.0 U2

This week I picked up a DL380 Gen10 with the following configuration:

  • 2 x Intel Xeon Silver 4110
  • 2 x 64G DDR4 LRDIMM 2133 MHz
  • 8 x 2.5″ SFF
  • 8 x 2.5″ NVMe
  • 2 x 500W 80 Plus Platinum

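I installed two 240G HPE SSDs and two 400G HPE NVMe drives. After installing the HPE custom ESXi 8.0 U2 image to the SD card, the fans stayed above 30% at all times. On the iLO System Information – Health Summary page, AMS (Agentless Management Service) was reported as Not available, and the Power & Thermal – Temperature Information page showed no temperature readings for the SSD and NVMe drives, although the temperatures of all other hardware were normal. My guess was that iLO could not receive system health information from ESXi, so it kept the fans at a higher speed to hold system temperatures within a safe range.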

After some searching online, and based on this HPE Community thread (https://community.hpe.com/t5/proliant-servers-ml-dl-sl/ams-not-available-in-ilo/td-p/7181613), I suspected that AMS was either not installed or not running.

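I logged in to the ESXi host over SSH and ran esxcli software component list | grep 'amsd', which confirmed that amsd was indeed not installed. I then downloaded and installed amsd from the official HPE Agentless Management Bundle for ESXi for HPE Gen10 and Gen10 Plus Servers page and rebooted ESXi. After logging in over SSH again and running /etc/init.d/amsd status, I found that all four amsd services were in a not started state, and my attempts to start them manually failed. Checking /var/log/amsd.log showed the following entries: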

2024-06-19T16:41:18.472Z In(30) amsd[1050527]: smad: ERROR: Missing ilo driver.
2024-06-19T16:41:28.716Z In(30) amsd[1050705]: amsd: ERROR: Missing ilo driver.
2024-06-19T16:41:38.924Z In(30) amsd[1050957]: ahsd: ERROR: Missing ilo driver.
2024-06-19T16:41:49.180Z In(30) amsd[1051205]: smarev: ERROR: Missing ilo driver.
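
The log check above can be sketched as a small POSIX shell helper. This is only an illustration: the function name missing_ilo_driver_services is hypothetical, and it assumes the log lines have the same shape as the four entries shown above.

```shell
# Hypothetical helper (not part of amsd or ESXi): print the names of the
# amsd sub-services that logged "Missing ilo driver", one per line.
# Assumes log lines shaped like: "... amsd[PID]: <service>: ERROR: Missing ilo driver."
missing_ilo_driver_services() {
    # $1: path to the amsd log; defaults to the path used in this post
    log_file="${1:-/var/log/amsd.log}"
    grep 'ERROR: Missing ilo driver' "$log_file" \
        | sed -n 's/.*\]: \([a-z]*\): ERROR: Missing ilo driver.*/\1/p' \
        | sort -u
}
```

On the host this could be called as missing_ilo_driver_services /var/log/amsd.log. If it prints any service names, the iLO driver is probably the missing piece rather than amsd itself.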

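I downloaded the latest iLO driver for ESXi from the official HPE iLO Native Driver for ESXi 7.0 page (it also works with ESXi 8.0), installed it, and rebooted ESXi. After logging in over SSH again, I confirmed that the amsd services were running. A few minutes later the iLO page showed AMS as OK, the drive temperatures were displayed correctly, and the fan speed dropped to 11%.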

Some of the commands used:

# install amsd
[root@dl380-n1:/] esxcli software component apply -d /vmfs/volumes/66730501-6c4754a2-f480-08f1ea8cd12c/amsdComponent_701.11.10.0.4-1_23433471.zip
Installation Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   Components Installed: 701.11.10.0.4-1OEM.701.0.0.16850804
   Components Removed:
   Components Skipped:
   Reboot Required: true
   DPU Results:
# install iLO driver
[root@dl380-n1:/] esxcli software component apply -d /vmfs/volumes/66730501-6c4754a2-f480-08f1ea8cd12c/ilo-driver_700.10.8.2.2-1OEM.700.1.0.15843807_22942561.zip
Installation Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   Components Installed: ilo-driver_700.10.8.2.2-1OEM.700.1.0.15843807
   Components Removed:
   Components Skipped:
   Reboot Required: true
   DPU Results:
# verify
[root@dl380-n1:/var/log] esxcli software component get -n amsdComponent
amsdComponent_701.11.10.0.4-1
   Name: amsdComponent
   Display Name: Agentless Management Service Daemon for Gen10 and Gen10Plus
   Version: 701.11.10.0.4-1
   Display Version: 701.11.10.0.4-1
   VIBs: HPE_bootbank_amsd_701.11.10.0.4-1OEM.701.0.0.16850804
   Vendor: HPE
   Summary: amsdComponent: Agentless Management Service for Gen10 and Gen10Plus
   Severity: general
   Urgency: important
   Category: enhancement
   Release Type: extension
   Kburl: http://www.hpe.com
   Description: Agentless Management Service for Gen10 and Gen10Plus
   Contact: HPEVMwareSupport@groups.ext.hpe.com
   ReleaseDate: 03-06-2024
   Platforms: host

[root@dl380-n1:~] /etc/init.d/amsd status
amsd-smarev is running 1718816466 1
amsd-ahsd is running 1718816465 1
amsd-amsd is running 1718816464 1
amsd-smad is running 1718816463 1
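
The final status check could be automated with a small POSIX shell helper. This is a rough sketch under assumptions: the function name all_amsd_running is hypothetical, and it assumes every healthy service is reported on a line that starts with "amsd-" and contains "is running", as in the output above. Any other "amsd-" line is treated as not running.

```shell
# Hypothetical helper: read "/etc/init.d/amsd status" output on stdin and
# succeed only if at least one amsd-* line is present and every amsd-* line
# reports "is running".
all_amsd_running() {
    awk '
        BEGIN { total = 0; ok = 0 }
        /^amsd-/ {
            total++
            if ($0 ~ / is running/) ok++
        }
        END { exit (total > 0 && ok == total) ? 0 : 1 }
    '
}
```

On the host it might be used as /etc/init.d/amsd status | all_amsd_running && echo "all amsd services running". Reading from stdin keeps the check independent of how the status text is obtained.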