Hi All
Hope someone out there can help.
I have 3 ESXi hosts running 6.7U3 (2 production and 1 development) connected via iSCSI to a Lenovo DE4000H storage array. On the storage array we have 3 volumes seen by all 3 ESXi servers. I recently rebooted the development ESXi host, and when it came back online it could no longer see the datastores over the iSCSI connection.
I have checked the iSCSI adapter in the ESXi web interface and it shows as online. I have verified the IQNs between the ESXi server and the storage array and all seems to be in order. I can even ping the IP address given to the SAN port on the storage array, but no matter what I try I am unable to see the datastores any more. This is only happening on the one server that was rebooted. The other 2 can see the volumes fine and we are able to browse the datastores if required.
In the log /var/log/vobd.log there are some errors:
5488265684us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device naa.6d039ea00014999e0000017b5xxxxxxx. Path vmhba64:C1:T0:L3 is down. Affected datastores: "dev_disk".
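To see which devices and paths are affected, the lost-connectivity events can be pulled out of vobd.log. The following is a minimal sketch, assuming every event has the same wording as the line quoted above; the function name lost_paths is made up for illustration.

```shell
#!/bin/sh
# Read vobd.log lines on stdin and print "<device> <path>" for every
# lost-connectivity event. Assumes the message wording quoted in the post.
lost_paths() {
    grep -F 'esx.problem.storage.connectivity.lost' |
        sed -n 's/.*storage device \([^ ]*\)\. Path \([^ ]*\) is down.*/\1 \2/p'
}

# Example on the ESXi host (not executed here):
#   lost_paths < /var/log/vobd.log | sort | uniq -c
```

Counting the unique pairs shows whether every path to the array is down or only some of them.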
I have followed a lot of suggestions found on the forums about removing the links, adding them back again and then rescanning, but nothing seems to help. I have shut down the ESXi host, replugged the iSCSI cables and powered the server up again, but running esxcfg-scsidevs -m only shows the local disk, so no VMs can be started, as all the images are on the external iSCSI disks.
Any suggestions would be great, as I am banging my head against a wall trying to get this working again.
Hello.
Are the physical servers Lenovo?
Is the connection between the servers and the DE4000H storage direct, or does it go through one or two Ethernet switches? If so, are the switches Lenovo?
Did you update the firmware of the servers and the storage as part of the installation?
Were the physical servers and storage purchased together as part of a solution?
Hi Enrique
Hello.
We are going to use the DSA tool to obtain the hardware logs. Here is the link:
https://datacentersupport.lenovo.com/us/en/downloads/DS539437
Download the latest version of the DSA tool for Windows or Linux to a PC with access to the ESXi host.
Open a CMD window as an admin user on the PC, change to the directory where the DSA tool is, and run it:
lnvgy_utl_dsa_dsala7k-10.5_portable_windows_x86-64.exe --vmware-esxi root:yyyyyy@xxx.xxx.xxx.xxx
yyyyyy - root password
xxx.xxx.xxx.xxx - IP address of the ESXi host
A Lenovo support directory will be created on the PC, where the logs will be written. Please package the complete directory and send it. It is relatively small, less than 1 MB.
Hello.
I did not get any information from the DSA.
Do you have access to the server's management controller, which is used to remotely manage and monitor the server? At Lenovo it is called the XClarity Controller (XCC).
It is the Ethernet port labeled XCC, to the left of the video connector on the back of the server.
If you have access, you can capture screenshots of the server's firmware levels and attach them to the post.
Here is a link showing how to get the service data from the Lenovo server:
https://www.youtube.com/watch?v=wqDqQZS6eRM
Default credentials:
User: USERID
Password: PASSW0RD (the 0 is a zero)
Run the following commands to verify the network and iSCSI configuration:
esxcfg-vswitch -l
esxcfg-vmknic -l
esxcli iscsi adapter list
esxcli network nic list
Run the following command for the vmnicX used for the iSCSI connection to get its driver and firmware versions:
esxcli network nic get -n vmnicX
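The verification commands above can be collected into one pass so that the output is easy to paste into a reply. This is only a sketch: the run helper, the DRY_RUN switch and the function name are illustrative, and vmnicX remains a placeholder for the real adapter name. By default it only prints the commands; set DRY_RUN=0 on the ESXi host to execute them.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
NIC="${NIC:-vmnicX}"   # placeholder: the vmnic used for iSCSI

run() {
    printf '%s\n' "### $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# $1 = vmnic whose driver and firmware details should be shown
collect_iscsi_diag() {
    run esxcfg-vswitch -l
    run esxcfg-vmknic -l
    run esxcli iscsi adapter list
    run esxcli network nic list
    run esxcli network nic get -n "$1"
}

collect_iscsi_diag "$NIC"
```

For example, DRY_RUN=0 NIC=vmnic4 sh collect.sh > iscsi-diag.txt would save the real output to a file (collect.sh being whatever name the script is saved under).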
SE350 (7Z46 / 7D1X) | Supported | Most models** | Some models** |
ST50 (7Y48 / 7Y50) | Not supported | Not supported | Not supported |
ST250 (7Y45 / 7Y46) | Most models* | Upgrade | Upgrade |
SR150 (7Y54) | Most models* | Upgrade | Upgrade |
SR250 (7Y51 / 7Y52) | Most models* | Upgrade | Upgrade |
ST550 (7X09 / 7X10) | Most models* | Upgrade | Upgrade |
SR530 (7X07 / 7X08) | Most models* | Upgrade | Upgrade |
SR550 (7X03 / 7X04) | Most models* | Upgrade | Upgrade |
SR570 (7Y02 / 7Y03) | Most models* | Upgrade | Upgrade |
SR590 (7X98 / 7X99) | Most models* | Upgrade | Upgrade |
SR630 (7X01 / 7X02) | Most models* | Upgrade | Upgrade |
SR635 (7Y98 / 7Y99) | Not supported | Not supported | Not supported |
Hello.
The network configuration, including iSCSI, looks normal, but there are a few details:
If you are not using IPv6, it is preferable to disable it.
For the iSCSI configuration, the recommended MTU is 9000.
For the adapters used for iSCSI, vmnic4 and vmnic5, the driver (1.8.6) and firmware (7.0) are among those recommended by VMware in its compatibility matrix for version 6.7 Update 3:
https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=37976&vcl=true
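The IPv6 and MTU suggestions above can also be applied from the command line. The sketch below assumes a standard (not distributed) vSwitch; the names vSwitch1 and vmk1 are hypothetical and must be replaced with the vSwitch and VMkernel interface that carry the iSCSI traffic. It prints the commands by default; set DRY_RUN=0 to run them. Jumbo frames only help if the physical switches and the array ports are set to the same MTU.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    printf '%s\n' "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# $1 = standard vSwitch name, $2 = VMkernel interface used for iSCSI
set_jumbo_frames() {
    run esxcli network vswitch standard set -v "$1" -m 9000
    run esxcli network ip interface set -i "$2" -m 9000
    run esxcfg-vmknic -l   # the listing includes the MTU of each vmk
}

disable_ipv6() {
    # Takes effect only after the host is rebooted.
    run esxcli network ip set --ipv6-enabled=false
}

# Hypothetical names; repeat set_jumbo_frames for each iSCSI vmk.
set_jumbo_frames vSwitch1 vmk1
disable_ipv6
```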
The Intel Ethernet Controller X710 cards included with some brand-name servers (HP, Dell, and others) have caused a lot of problems.
In these cases the first option is to use the native (VMware) driver instead of the manufacturer's (partner) driver.
Another option, which has been tested on versions 6.0 and 6.5, is to disable TSO and LRO.
The last option is to replace the adapters.
I recommend trying the second option. Details follow; the ESXi host needs a reboot for the changes to be applied.
To disable TSO:
Run this command to determine if the hardware TSO is enabled on the host:
esxcli system settings advanced list -o /Net/UseHwTSO
Run this command to disable TSO at the host level:
esxcli system settings advanced set -o /Net/UseHwTSO -i 0
(This command uses 0 (zero) to disable and 1 to enable.)
To disable LRO:
Run this command to determine if LRO is enabled for the VMkernel adapters on the host:
esxcli system settings advanced list -o /Net/TcpipDefLROEnabled
Run this command to disable LRO for all VMkernel adapters on a host:
esxcli system settings advanced set -o /Net/TcpipDefLROEnabled -i 0
(This command uses 0 (zero) to disable and 1 to enable)
You will need to schedule downtime for the ESXi host to make the changes and reboot it. Then rescan the HBAs and storage and check whether you have access to the iSCSI storage.
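The TSO and LRO steps, plus the rescan after the reboot, could be scripted in two phases. This is a sketch only: the run helper, the function names and the before/after argument are illustrative. The rescan and esxcfg-scsidevs commands are standard ESXi tools but are an addition, since the post does not name them. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    printf '%s\n' "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# Phase 1, before the reboot: show each setting, then set it to 0.
disable_offloads() {
    for opt in /Net/UseHwTSO /Net/TcpipDefLROEnabled; do
        run esxcli system settings advanced list -o "$opt"
        run esxcli system settings advanced set -o "$opt" -i 0
    done
}

# Phase 2, after the reboot: rescan all adapters, then list the
# VMFS datastore to device mappings.
post_reboot_check() {
    run esxcli storage core adapter rescan --all
    run esxcfg-scsidevs -m
}

case "${1:-before}" in
    before) disable_offloads ;;
    after)  post_reboot_check ;;
esac
```

Run it with no argument before the reboot and with the argument after once the host is back up.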
If the problem continues, we could do a remote session (free of charge) for a general check of the ESXi host and the storage. You will need to have access and the users/passwords available.
Hello.
About the Lenovo SR635: I was surprised that it does not have the XCC, which was standard on all Lenovo servers, but now I see that it does not.
According to the product guide, this server has the Lenovo XClarity Provisioning Manager Lite.
The user guide is here:
https://sysmgt.lenovofiles.com/help/topic/LXPML/LXPM_Lite_user_guide.pdf
If you have downtime on the server, it would be good to enter the BIOS (UEFI) and check and capture the server's firmware level. Also find out whether the server has Lenovo XClarity Provisioning Manager or Lenovo XClarity Provisioning Manager Lite.
Hi
Thank you for the time you are spending on this problem.
I have done what you suggested: changed the MTU to 9000 on both storage adapters (vmnic4 and vmnic5) and disabled IPv6. I have disabled TSO and LRO using the commands you supplied. I have rebooted the ESXi server and rescanned vmhba64, but unfortunately no joy. I am still not seeing any of the datastores that I should see.
When I check the iSCSI sessions on the Lenovo DE4000H storage array, there is no mention of the 3rd server. Could it be that I have a faulty 10 Gb network card in the server, even though I am able to ping the endpoint IP addresses (Controller A port 0e and Controller B port 0e)?
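A successful ping with default settings only shows that small packets get through. With the MTU at 9000, it may be worth testing full-size, unfragmented frames from the iSCSI VMkernel interface and then listing what the host itself reports for port bindings and sessions. The sketch below uses hypothetical values: vmk1 and the 192.0.2.x addresses stand in for the real VMkernel interface and the controller port IPs, while vmhba64 is the adapter named in this thread. It prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    printf '%s\n' "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# $1 = iSCSI adapter, $2 = VMkernel interface, remaining args = array port IPs
check_iscsi_path() {
    adapter="$1"
    vmk="$2"
    shift 2
    for ip in "$@"; do
        # 8972 bytes of payload plus 28 bytes of IP and ICMP headers = 9000;
        # -d sets "do not fragment" so an MTU mismatch shows up as a failure.
        run vmkping -I "$vmk" -d -s 8972 -c 3 "$ip"
    done
    run esxcli iscsi networkportal list -A "$adapter"
    run esxcli iscsi session list -A "$adapter"
}

# Hypothetical interface and addresses.
check_iscsi_path vmhba64 vmk1 192.0.2.10 192.0.2.11
```

If the large pings fail while normal pings succeed, an MTU mismatch somewhere on the path is a more likely suspect than a faulty card.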
Hi Enrique
Thought I would post the details here as well, to complete the diagnostics.
Hi Again
I have followed the instructions in the video at https://tv.netapp.com/detail/video/6062700336001/vmware-configuration-guide-for-e-series-integration... However, when I get to timestamp 00:08:08, where I need to save the static target devices, only the first one is saved. I have tried different browsers in case it was a browser caching issue, but I get the same outcome in both Firefox and Edge. Any idea why I am limited to one device? I see you added four links, but I only have two, one per controller.
If you would like to do another remote session, we can plan for tomorrow.
Hi Enrique
Thank you for all the assistance on this strange issue. It is finally resolved.
For any future visitors, here are the high-level steps taken.
1. IBM replaced the physical network SFP card, but I still had the same issue and was not able to detect any of the datastores. I am not sure this was necessary.
2. I reinstalled ESXi with a newer build of 6.7U3 that I obtained from the Lenovo website, not VMware, as this is a Lenovo-specific ESXi image with all the extra Lenovo goodies included.
3. I was now able to add 1 static target and could see the datastores. However, I was not able to get the second path to save. No matter how many times I added it, it would disappear once I clicked save.
4. Added the second iSCSI path using the command line: # esxcli iscsi adapter discovery statictarget add -A vmhbaXX -a ip_address:port_number -n iqn_of_storage_device
5. Rescanned for new devices from the CLI: # esxcli iscsi adapter discovery rediscover -A vmhbaXX
This solved my issues. Shout out to Enrique for sticking with the problem through to the end.
Note on step 5: the correct command is esxcli iscsi adapter discovery rediscover -A vmhbaXX (replace XX with the number of your adapter).
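For anyone repeating steps 4 and 5, they could be wrapped in a small function, using the rediscover command in its corrected form. This is a sketch, not the exact procedure used in this thread: the list and adapter rescan commands are additions, and the address and IQN in the example call are made-up placeholders (3260 is the standard iSCSI port). Commands are printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    printf '%s\n' "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# $1 = iSCSI adapter (vmhbaXX), $2 = target ip:port, $3 = target IQN
add_static_target() {
    run esxcli iscsi adapter discovery statictarget add -A "$1" -a "$2" -n "$3"
    # Confirm the target was saved, since the web client dropped it.
    run esxcli iscsi adapter discovery statictarget list -A "$1"
    run esxcli iscsi adapter discovery rediscover -A "$1"
    run esxcli storage core adapter rescan -A "$1"
}

# Placeholder address and IQN; use the values shown on the storage array.
add_static_target vmhba64 192.0.2.11:3260 iqn.2001-01.com.example:array
```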