I'm running ESXi 6.5 U1 on a Fujitsu PRIMERGY RX2530 M4 with the latest Fujitsu custom image for PRIMERGY servers. The problem is that once or twice a day the VMs and the host stop responding, and I have to restart the host locally, with a monitor and keyboard connected to the server, because I can't reach it any other way. Does anyone have the same or a similar problem?
Are you able to ping the host? Have you tried looking at the vmkernel log from the local console while the host is not responding?
While the host is not responding, I can't ping the host or the VMs. At the same time, other devices are still reachable, so the problem is not in the network equipment.
I haven't tried looking at the vmkernel log while the host is not responding, because when the server is down it stops the work of 200 employees and I'm in a hurry to restart the host.
Can you share the network configuration of the ESXi host: the vSwitch details and the physical network configuration?
There are 6 physical NICs and 3 vswitches.
vswitch1 - management and first local network - vmnic0 - 172.10.30.0/24; vmk0
vswitch2 - second local network - vmnic1 - 192.168.10.0/24; vmk1
vswitch3 - nfs storage communication - vmnic2-vmnic5 - 10.10.10.0/24; vmk2
I think more info is required. That server is certified for 6.5 U1, so it will have been through the testing process and everything should work. If you can't examine the vmkernel log at the time of the failure, you can look at the entries for the time the server failed once it is back up. If you are able to post them here, someone may be able to help further.
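If it helps with picking out the entries around the failure, here is a minimal Python sketch. It assumes the vmkernel log survived the reboot and has been copied off the host to a local file, and that each entry starts with a UTC timestamp in the form `2017-10-12T06:08:11.588Z`. The function names and the 10-minute window are my own choices for illustration, not anything from VMware.

```python
from datetime import datetime, timedelta

# Timestamp layout assumed from vmkernel entries such as
# "2017-10-12T06:08:11.588Z cpu24:72179)WARNING: ..."
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    token = line.split(" ", 1)[0]
    try:
        return datetime.strptime(token, TS_FORMAT)
    except ValueError:
        return None


def lines_before(lines, failure_time, minutes=10):
    """Yield entries stamped within `minutes` before failure_time (inclusive)."""
    start = failure_time - timedelta(minutes=minutes)
    for line in lines:
        ts = parse_timestamp(line)
        # Lines without a parsable timestamp (continuations) are skipped.
        if ts is not None and start <= ts <= failure_time:
            yield line.rstrip("\n")
```

To use it, open the copied log with `open(path, errors="replace")`, build `failure_time` with `datetime.strptime(..., TS_FORMAT)` from the approximate time the host stopped answering, and print whatever `lines_before` yields.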
What is the physical switch configuration? Are you using any teaming?
When the host stops responding, are you losing ping to the ESXi IP and the VMs?
Share the NIC details and the driver and firmware in use.
Command:
ethtool -i vmnic0
These are screenshots of the vmkernel log taken while the host was not responding.
https://drive.google.com/open?id=0B5UgzPsgN8nPbHlkak94VGd0dkk
https://drive.google.com/open?id=0B5UgzPsgN8nPT1I0LWJJTF9Hd1U
https://drive.google.com/open?id=0B5UgzPsgN8nPLTFUdmo5MVktdTg
https://drive.google.com/open?id=0B5UgzPsgN8nPZFBoSkFoSXNIV0k
When the host stops responding, I lose ping to the ESXi IP and the VMs.
There are three separate hardware network devices for the three networks, without any VLAN configuration.
Name     PCI           Driver  Link  Speed      Duplex  MAC Address        MTU   Description
vmnic0   0000:01:00.0  igbn    Up    1000Mbps   Full    90:1b:0e:d4:17:e4  1500  Intel Corporation I350 Gigabit Network Connection
vmnic1   0000:01:00.1  igbn    Up    1000Mbps   Full    90:1b:0e:d4:17:e5  1500  Intel Corporation I350 Gigabit Network Connection
vmnic10  0000:86:00.0  i40en   Down  0Mbps      Half    3c:fd:fe:a7:de:c8  1500  Intel(R) Ethernet Controller X710 for 10GbE SFP+
vmnic11  0000:86:00.1  i40en   Down  0Mbps      Half    3c:fd:fe:a7:de:c9  1500  Intel(R) Ethernet Controller X710 for 10GbE SFP+
vmnic12  0000:86:00.2  i40en   Up    10000Mbps  Full    3c:fd:fe:a7:de:ca  1500  Intel(R) Ethernet Controller X710 for 10GbE SFP+
vmnic13  0000:86:00.3  i40en   Up    10000Mbps  Full    3c:fd:fe:a7:de:cb  1500  Intel(R) Ethernet Controller X710 for 10GbE SFP+
vmnic2   0000:3d:00.0  i40en   Up    1000Mbps   Full    90:1b:0e:a9:b5:39  1500  Intel(R) Ethernet Connection X722 for 1GbE
vmnic3   0000:3d:00.1  i40en   Down  0Mbps      Half    90:1b:0e:a9:b5:3a  1500  Intel(R) Ethernet Connection X722 for 1GbE
vmnic4   0000:3d:00.2  i40en   Down  0Mbps      Half    90:1b:0e:a9:b5:3b  1500  Intel(R) Ethernet Connection X722 for 1GbE
vmnic5   0000:3d:00.3  i40en   Down  0Mbps      Half    90:1b:0e:a9:b5:3c  1500  Intel(R) Ethernet Connection X722 for 1GbE
vmnic6   0000:18:00.0  i40en   Down  0Mbps      Half    3c:fd:fe:a7:df:38  1500  Intel(R) Ethernet Controller X710 for 10GbE SFP+
vmnic7   0000:18:00.1  i40en   Down  0Mbps      Half    3c:fd:fe:a7:df:39  1500  Intel(R) Ethernet Controller X710 for 10GbE SFP+
vmnic8   0000:18:00.2  i40en   Up    10000Mbps  Full    3c:fd:fe:a7:df:3a  1500  Intel(R) Ethernet Controller X710 for 10GbE SFP+
vmnic9   0000:18:00.3  i40en   Up    10000Mbps  Full    3c:fd:fe:a7:df:3b  1500  Intel(R) Ethernet Controller X710 for 10GbE SFP+
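For anyone comparing NIC lists across several hosts, a rough Python sketch like the one below can summarize which uplinks are up per driver. It assumes the listing has been pasted in as whitespace-separated text with the same nine columns as above. The sample holds only four of the rows from this post, and the function names are made up for illustration. It only summarizes the listing and says nothing about whether a particular driver is at fault.

```python
from collections import defaultdict

# A few rows copied from the listing above; paste the full output as needed.
NIC_LIST = """\
Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic0 0000:01:00.0 igbn Up 1000Mbps Full 90:1b:0e:d4:17:e4 1500 Intel Corporation I350 Gigabit Network Connection
vmnic12 0000:86:00.2 i40en Up 10000Mbps Full 3c:fd:fe:a7:de:ca 1500 Intel(R) Ethernet Controller X710 for 10GbE SFP+
vmnic2 0000:3d:00.0 i40en Up 1000Mbps Full 90:1b:0e:a9:b5:39 1500 Intel(R) Ethernet Connection X722 for 1GbE
vmnic3 0000:3d:00.1 i40en Down 0Mbps Half 90:1b:0e:a9:b5:3a 1500 Intel(R) Ethernet Connection X722 for 1GbE
"""


def parse_nic_list(text):
    """Parse the NIC listing into a list of dicts, one per vmnic row."""
    nics = []
    for line in text.splitlines():
        # Eight leading columns, then the free-text description.
        fields = line.split(None, 8)
        if len(fields) < 9 or not fields[0].startswith("vmnic"):
            continue  # skip the header row and blank lines
        name, pci, driver, link, speed, duplex, mac, mtu, desc = fields
        nics.append({
            "name": name, "pci": pci, "driver": driver, "link": link,
            "speed": speed, "duplex": duplex, "mac": mac,
            "mtu": int(mtu), "description": desc,
        })
    return nics


def up_links_by_driver(nics):
    """Group the names of NICs whose link is Up by driver name."""
    grouped = defaultdict(list)
    for nic in nics:
        if nic["link"] == "Up":
            grouped[nic["driver"]].append(nic["name"])
    return dict(grouped)
```

Calling `up_links_by_driver(parse_nic_list(NIC_LIST))` on the sample rows groups vmnic0 under igbn and vmnic12 and vmnic2 under i40en.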
2017-10-12T06:08:11.588Z cpu24:72179)WARNING: Term: 1498: Unknown ANSI attribute 4
2017-10-12T06:08:11.588Z cpu24:72179)WARNING: Term: 1498: Unknown ANSI attribute 24
2017-10-12T06:08:11.589Z cpu24:72179)WARNING: Term: 1498: Unknown ANSI attribute 4
2017-10-12T06:08:11.589Z cpu24:72179)WARNING: Term: 1498: Unknown ANSI attribute 24
This is the last part of the vmkernel log before the host stopped responding.
The problem was solved by installing ESXi 6.0 U3!
Did you try installing ESXi 6.5 U1 again, and did that solve the problem or not?
I have a similar problem on a Fujitsu server:
Two NICs just stop responding. I use them only for management access. After restarting the management interface on the VMware console, it's back to normal. Three identical servers, three identical problems, but not at the same time. Sometimes it's one server per week, sometimes one server per month. Custom Fujitsu image. I think it's some bug in the Ethernet drivers.