Hi,
I have two IBM X3650 M4 servers running ESXi 5.5. Both are connected to an HP NAS through iSCSI.
One of these servers crashes periodically and displays a pink screen with exception 14.
IBM told me the following:
______________________________________________________________
You must add the following lines to the file /etc/multipath.conf:
****************************************
blacklist {
device {
vendor "IBM"
product "ServeRAID M5110e"
}
}
****************************************
________________________________________________________
But this did not help; the problem persists. In addition, the problem exists on only one of the two servers.
I have these lines in the vmkernel log:
2016-06-20T07:42:07.808Z cpu17:33481)VMK_PCI: 395: Device 0000:16:00.0 name: vmhba0
2016-06-20T07:42:07.808Z cpu17:33481)DMA: 612: DMA Engine 'vmhba0' created using mapper 'DMANull'.
2016-06-20T07:42:07.808Z cpu17:33481)ScsiScan: 976: Path 'vmhba0:C0:T0:L0': Vendor: 'IBM ' Model: 'ServeRAID M5110e' Rev: '3.19'
2016-06-20T07:42:07.808Z cpu17:33481)ScsiScan: 979: Path 'vmhba0:C0:T0:L0': Type: 0x0, ANSI rev: 5, TPGS: 0 (none)
2016-06-20T07:42:07.808Z cpu17:33481)megasas_slave_configure: do not export physical disk devices to upper layer.
2016-06-20T07:42:07.808Z cpu17:33481)WARNING: ScsiScan: 1408: Failed to add path vmhba0:C0:T0:L0 : Not found
2016-06-20T07:42:07.821Z cpu17:33481)ScsiScan: 976: Path 'vmhba0:C2:T0:L0': Vendor: 'IBM ' Model: 'ServeRAID M5110e' Rev: '3.19'
2016-06-20T07:42:07.821Z cpu17:33481)ScsiScan: 979: Path 'vmhba0:C2:T0:L0': Type: 0x0, ANSI rev: 5, TPGS: 0 (none)
2016-06-20T07:42:07.821Z cpu17:33481)ScsiScan: 1503: Add path: vmhba0:C2:T0:L0
2016-06-20T07:42:07.864Z cpu7:33421)<6>igb: vmnic3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
2016-06-20T07:42:07.888Z cpu18:33422)<6>igb: vmnic2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
2016-06-20T07:42:08.008Z cpu17:33481)PCI: driver megaraid_sas claimed device 0000:16:00.0
2016-06-20T07:42:08.008Z cpu17:33481)PCI: driver megaraid_sas claimed 1 device
2016-06-20T07:42:08.008Z cpu17:33481)ScsiNpiv: 1510: GetInfo for adapter vmhba0, [0x4108fa854bc0], max_vports=0, vports_inuse=0, linktype=0, state=0, failreason=0, sts=bad0020
2016-06-20T07:42:08.008Z cpu17:33481)Mod: 4780: Initialization of megaraid_sas succeeded with module ID 4146.
2016-06-20T07:42:08.008Z cpu17:33481)megaraid_sas loaded successfully.
2016-06-20T07:42:08.039Z cpu22:33482)Loading module ahci ...
2016-06-20T07:42:08.039Z cpu22:33482)Elf: 1861: module ahci has license GPL
2016-06-20T07:42:08.040Z cpu22:33482)module heap: Initial heap size: 1048576, max heap size: 9756672
2016-06-20T07:42:08.040Z cpu22:33482)vmklnx_module_mempool_init: Mempool max 9756672 being used for module: 4147
2016-06-20T07:42:08.040Z cpu22:33482)vmk_MemPoolCreate passed for 256 pages
2016-06-20T07:42:08.040Z cpu22:33482)module heap: using memType 2
2016-06-20T07:42:08.040Z cpu22:33482)module heap vmklnx_ahci: creation succeeded. id = 0x4109f87e1000
2016-06-20T07:42:08.040Z cpu22:33482)PCI: driver ahci is looking for devices
2016-06-20T07:42:08.040Z cpu22:33482)<7>ahci 0000:00:1f.2: version 3.0-17vmw
2016-06-20T07:42:08.040Z cpu22:33482)DMA: 612: DMA Engine 'vmklnxpci-0:0:31.2' created using mapper 'DMANull'.
2016-06-20T07:42:08.040Z cpu22:33482)DMA: 612: DMA Engine 'vmklnxpci-0:0:31.2' created using mapper 'DMANull'.
2016-06-20T07:42:08.040Z cpu22:33482)DMA: 612: DMA Engine 'vmklnxpci-0:0:31.2' created using mapper 'DMANull'.
2016-06-20T07:42:08.040Z cpu22:33482)DMA: 657: DMA Engine 'vmklnxpci-0:0:31.2' destroyed.
2016-06-20T07:42:08.047Z cpu16:32852)NetPort: 1589: disabled port 0x6
2016-06-20T07:42:08.047Z cpu16:32852)Uplink: 6530: enabled port 0x6 with mac 6e:ae:8b:3b:9d:19
2016-06-20T07:42:08.047Z cpu16:32852)NetPort: 1589: disabled port 0x3
2016-06-20T07:42:08.047Z cpu16:32852)Uplink: 6530: enabled port 0x3 with mac 6c:ae:8b:3b:9d:1b
2016-06-20T07:42:09.047Z cpu16:32852)NetPort: 1589: disabled port 0x5
2016-06-20T07:42:09.047Z cpu16:32852)Uplink: 6530: enabled port 0x5 with mac 6c:ae:8b:3b:9d:1d
2016-06-20T07:42:09.047Z cpu16:32852)NetPort: 1589: disabled port 0x4
2016-06-20T07:42:09.047Z cpu16:32852)Uplink: 6530: enabled port 0x4 with mac 6c:ae:8b:3b:9d:1c
2016-06-20T07:42:09.047Z cpu16:32852)NetPort: 1589: disabled port 0x2
2016-06-20T07:42:09.048Z cpu16:32852)Uplink: 6530: enabled port 0x2 with mac 6c:ae:8b:3b:9d:1a
2016-06-20T07:42:09.051Z cpu22:33482)<6>ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 1.5 Gbps 0x2 impl SATA mode
2016-06-20T07:42:09.051Z cpu22:33482)<6>ahci 0000:00:1f.2: flags: 64bit ncq sntf led clo pio slum part
2016-06-20T07:42:09.051Z cpu22:33482)IRQ: 540: 0x39 <ahci> sharable, flags 0x10
2016-06-20T07:42:09.051Z cpu22:33482)VMK_VECTOR: 218: Registered handler for shared interrupt 0xff39, flags 0x10
2016-06-20T07:42:09.053Z cpu22:33482)LinPCI: LinuxPCI_DeviceIsPAECapable:602: PAE capable device at 0000:00:1f.2
2016-06-20T07:42:09.053Z cpu22:33482)VMK_PCI: 395: Device 0000:00:1f.2 name: vmhba1
2016-06-20T07:42:09.053Z cpu22:33482)DMA: 612: DMA Engine 'vmhba1' created using mapper 'DMANull'.
2016-06-20T07:42:09.054Z cpu22:33482)LinPCI: LinuxPCI_DeviceIsPAECapable:602: PAE capable device at 0000:00:1f.2
2016-06-20T07:42:09.054Z cpu22:33482)VMK_PCI: 395: Device 0000:00:1f.2 name: vmhba1
2016-06-20T07:42:09.054Z cpu22:33482)DMA: 612: DMA Engine 'vmhba1' created using mapper 'DMANull'.
2016-06-20T07:42:09.054Z cpu22:33482)DMA: 657: DMA Engine 'vmhba1' destroyed.
2016-06-20T07:42:09.054Z cpu22:33482)DMA: 612: DMA Engine 'vmhba32' created using mapper 'DMANull'.
Do these lines mean that I have a problem with the RAID controller?
Best Regards,
Have you already applied the latest firmware version to your server? Another option is to upgrade the MegaRAID SAS driver, as described here: ESXi 5.x host fails with a purple diagnostic screen when using LSI MegaRAID SAS Driver (2052368) | V...
The firmware is up to date, but I'm not sure about the driver. I will check, try it, and come back to you. Thank you very much.
Hi,
I've upgraded the MegaRAID SAS driver, but I received a pink screen again.
Sorry for the delayed feedback; the client is not always available. The odd thing is that this client has two servers with the same specifications, and the second one has no problem.
Best Regards,
I would also patch the ESXi host to the latest version or the latest patch level for 5.5.
Generally, these exception 14s are hardware related, so it would also be worth running hardware diagnostics using the tools available from the vendor.
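Since only one of the two identical servers is affected, it may also help to compare the WARNING lines in the vmkernel log of each host. The following is only a rough sketch in Python, assuming the log lines have the same layout as the excerpt posted above; the function names extract_warnings and count_by_component are made up for illustration and are not part of any VMware tool. Run it against a copy of each host's vmkernel log.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt in this thread:
# 2016-06-20T07:42:07.808Z cpu17:33481)WARNING: ScsiScan: 1408: Failed to add path ...
WARNING_RE = re.compile(
    r"^(?P<ts>\S+Z) cpu\d+:\d+\)WARNING: "
    r"(?P<component>[^:]+): (?:\d+: )?(?P<msg>.*)$"
)


def extract_warnings(lines):
    """Return (timestamp, component, message) for every WARNING line."""
    found = []
    for line in lines:
        match = WARNING_RE.match(line.strip())
        if match:
            found.append(
                (match.group("ts"), match.group("component"), match.group("msg"))
            )
    return found


def count_by_component(warnings):
    """Count warnings per kernel component (e.g. ScsiScan)."""
    return Counter(component for _, component, _ in warnings)


# Example with three lines copied from the log posted above.
sample = [
    "2016-06-20T07:42:07.808Z cpu17:33481)megasas_slave_configure: do not export physical disk devices to upper layer.",
    "2016-06-20T07:42:07.808Z cpu17:33481)WARNING: ScsiScan: 1408: Failed to add path vmhba0:C0:T0:L0 : Not found",
    "2016-06-20T07:42:07.821Z cpu17:33481)ScsiScan: 1503: Add path: vmhba0:C2:T0:L0",
]

for ts, component, msg in extract_warnings(sample):
    print(ts, component, msg)
```

If the healthy host shows the same warnings at boot, they are probably not related to the crash; warnings that appear only on the failing host are the ones worth passing on to the vendor.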
I hope this helps.
Thank you for the response. It actually looks like a hardware problem. Fortunately the server is still under warranty, so we've contacted IBM, and it looks like a RAID card problem. However, IBM still hasn't admitted it; they are still upgrading firmware and so on, and the problem continues to this day.
Thanks again for your help.
Best Regards,