VMware Cloud Community
ericsaito
Contributor
Contributor

ESXi 5.5.0 SCSI Tape Drive/Autoloader crashed the system

Dear all,

I need help.

Problem:

ESXi 5.5.0 Centos VM freezes(becomes unresponsive) or ESXi host crashes (with core dump) error

Task:

I need to build a backup system using PowerVault 124T (SCSI interface) and Linux application running under Centos on EXSi 5.5 platform.

What I have

ESXi Physical host -  PowerEdge R720 rack server.

EXSi 5.5.0 Build 1331820 installed on PowerEdge R720 Server

CentOS release 6.8 (Final), Linux  2.6.32-642.4.2.el6.x86_64  installed and configured on EXSi 5.5.0

PowerVault 124T (SCSI interface) connected to PowerEdge R720  SCSI interface


What has been done.

1. Connected PowerVault 124T via SCSI interface to EXSi 5.5.0 Build 1331820

2. Installed and configured CentOS release 6.8 (Final), Linux  2.6.32-642.4.2.el6.x86_64

3. Configured SCSI  I/O path through through ESXi 5.5.0 to Centos vm

- Connected  PowerVault 124T and PowerEdge R720 using SCSI cable

- Under V Sphere client  located a new device (see below)

1.JPG

- under vm created 2 SCSI devices and assigned tape drive and tape changer to them.


Tape Drive

drive.JPG

Media Changer

changer.JPG


Both devices use the same ID but different channels.


Maybe it could be the reason?

Note: ESXi/ESX supports vendor-supported parallel SCSI tape drives and tape libraries only. If you are using a tape drive library (versus using a stand-alone tape drive), the library must be multi-target, and not multi-LUN. Ensure that the tape drive's virtual target ID in the virtual machine's configuration is set to be the same as the physical target ID. The tape hardware must be SCSI ID addressed and cannot be LUN addressed. https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=10164...https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=10164...


In my case Dell Powervault 124T tape library has the same SCSI ID. I use the same SCSI id and different channels (see pics above)

Could it be the reason or not?


- I located both devices under Centos VM

[1:0:0:0]    cd/dvd  NECVMWar VMware IDE CDR10 1.00  /dev/sr0

[2:0:0:0]    disk    VMware  Virtual disk    1.0  /dev/sda

[2:0:1:0]    disk    VMware  Virtual disk    1.0  /dev/sdb

[2:0:2:0]    disk    VMware  Virtual disk    1.0  /dev/sdc

[3:0:6:0]    tape    IBM      ULTRIUM-TD3      6B20  /dev/st0

[4:0:6:0]    mediumx DELL    PV-124T          0053  /dev/sch0

[1:0:0:0]    cd/dvd  NECVMWar VMware IDE CDR10 1.00  /dev/sr0  /dev/sg0

[2:0:0:0]    disk    VMware  Virtual disk    1.0  /dev/sda  /dev/sg1

[2:0:1:0]    disk    VMware  Virtual disk    1.0  /dev/sdb  /dev/sg2

[2:0:2:0]    disk    VMware  Virtual disk    1.0  /dev/sdc  /dev/sg3

[3:0:6:0]    tape    IBM      ULTRIUM-TD3      6B20  /dev/st0  /dev/sg4

[4:0:6:0]    mediumx DELL    PV-124T          0053  /dev/sch0  /dev/sg5

!!!! What I mentioned here is SCSI adapter channel number is different for both Tape drive and Media Changer under Centos VM and ESXI itself. I'm not sure if it is critical!!!


I use tar command for backup operation.

Issues:

- ESXi 5.5.0 host froze with (core dump error) - Hard reboot helped

-Media changer becomes unresponsive from time to time.

- vm froze once during backup process as it was able to locate SCSI media changer - Hard reboot helped

.

QUESTION:

What could be the reason of unstable operation of SCSI device under ESXi 5.5.0 ?

Please, see logs below:

ESXi 5.5.0 logs


/var/log/hostd.log

2016-10-10T22:15:47.406Z [30440B70 error 'SoapAdapter.HTTPService.HttpConnection'] Failed to read header on stream <io_obj p:0x2eb9e780, h:66, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>: N7Vmacore15Syst

2016-10-10T22:15:47.433Z [30440B70 verbose 'Default' opID=14179AAA-0000095C user=root] AdapterServer: target='vmodl.query.PropertyCollector:ha-property-collector', method='waitForUpdates'

2016-10-10T22:16:00.045Z [312C1B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not initialized. Please see the VMk

2016-10-10T22:16:00.046Z [312C1B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection.  Turn on 'trivia' log for details

2016-10-10T22:16:19.280Z [309C8B70 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root

2016-10-10T22:16:20.060Z [312C1B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not initialized. Please see the VMk

2016-10-10T22:16:20.060Z [312C1B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection.  Turn on 'trivia' log for details

2016-10-10T22:16:40.060Z [304C3B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not initialized. Please see the VMk

2016-10-10T22:16:40.060Z [304C3B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection.  Turn on 'trivia' log for details

Section for VMware ESX, pid=33896, version=5.5.0, build=1331820, option=Release

------ In-memory logs start --------

mem> 2016-10-11T02:00:41.975Z [FFC885B0 verbose 'Default'] No update forwarding configured

mem> 2016-10-11T02:00:41.976Z [FFC885B0 info 'Default'] Supported VMs 256

mem> 2016-10-11T02:00:41.976Z [FFC885B0 info 'Handle checker'] Setting system limit of 2560

mem> 2016-10-11T02:00:41.976Z [FFC885B0 info 'Handle checker'] Set system limit to 2560

mem> 2016-10-11T02:00:41.976Z [FFC885B0 info 'Default'] Setting malloc mmap threshold to 32 k

mem> 2016-10-11T02:00:41.976Z [FFC885B0 info 'Default'] getrlimit(RLIMIT_NPROC): curr=64 max=128, return code = Success

mem> 2016-10-11T02:00:41.976Z [FFC885B0 info 'Default'] setrlimit(RLIMIT_NPROC): curr=128 max=128, return code = Success

------ In-memory logs end  --------

2016-10-11T02:00:41.976Z [FFC885B0 info 'Default'] Logging uses fast path: false

2016-10-11T02:00:41.976Z [FFC885B0 info 'Default'] Handling bora/lib logs with VmaCore facilities


/var/log/vobd.log


2016-10-07T16:03:45.495Z: [scsiCorrelator] 34808415080360us: [esx.clear.scsi.device.io.latency.improved] Device naa.6c81f660e97c93001af69254192c0dcb performance has improved. I/O latency reduced fr

2016-10-08T01:35:02.110Z: [scsiCorrelator] 34842003385001us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba1:C0:T6:L0 changed state from on

2016-10-08T01:35:02.110Z: [APDCorrelator] 34842003385058us: [vob.storage.apd.start] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has entered the All Paths Down sta

2016-10-08T01:35:02.110Z: [APDCorrelator] 34842691695354us: [esx.problem.storage.apd.start] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has entered the All Paths

2016-10-08T01:35:02.110Z: [scsiCorrelator] 34842691695819us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device t10.IBM_____ULTRIUM2DTD3_____1210354899. Path vmhba1:C0:T6:L

2016-10-08T01:35:02.367Z: [scsiCorrelator] 34842003642412us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba1:C0:T6:L1 changed state from on

2016-10-08T01:35:02.367Z: [APDCorrelator] 34842003642467us: [vob.storage.apd.start] Device or filesystem with identifier [naa.500e09efff0df959] has entered the All Paths Down state.

2016-10-08T01:35:02.367Z: [APDCorrelator] 34842691952647us: [esx.problem.storage.apd.start] Device or filesystem with identifier [naa.500e09efff0df959] has entered the All Paths Down state.

2016-10-08T01:35:02.368Z: [scsiCorrelator] 34842691953017us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device naa.500e09efff0df959. Path vmhba1:C0:T6:L1 is down. Affected

2016-10-08T01:37:22.114Z: [APDCorrelator] 34842143387115us: [vob.storage.apd.timeout] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has entered the All Paths Down T

2016-10-08T01:37:22.114Z: [APDCorrelator] 34842831699773us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has entered the All Path

2016-10-08T01:37:22.372Z: [APDCorrelator] 34842143644489us: [vob.storage.apd.timeout] Device or filesystem with identifier [naa.500e09efff0df959] has entered the All Paths Down Timeout state after

2016-10-08T01:37:22.372Z: [APDCorrelator] 34842831957149us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [naa.500e09efff0df959] has entered the All Paths Down Timeout stat

2016-10-08T01:40:01.865Z: [scsiCorrelator] 34842303134518us: [vob.scsi.scsipath.por] Power-on Reset occurred on vmhba1:C0:T6:L0

2016-10-08T01:40:01.865Z: [scsiCorrelator] 34842303135351us: [vob.scsi.scsipath.por] Power-on Reset occurred on vmhba1:C0:T6:L1

2016-10-08T01:40:01.879Z: [scsiCorrelator] 34842303148915us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba1:C0:T6:L0 changed state from dead

2016-10-08T01:40:01.879Z: [APDCorrelator] 34842303148950us: [vob.storage.apd.exit] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has exited the All Paths Down state

2016-10-08T01:40:01.879Z: [APDCorrelator] 34842991464975us: [esx.clear.storage.apd.exit] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has exited the All Paths Down

2016-10-08T01:40:01.880Z: [scsiCorrelator] 34842991465527us: [esx.clear.storage.connectivity.restored] Connectivity to storage device t10.IBM_____ULTRIUM2DTD3_____1210354899 (Datastores: Unknown) r

2016-10-08T01:40:01.881Z: [scsiCorrelator] 34842303151048us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba1:C0:T6:L1 changed state from dead

2016-10-08T01:40:01.881Z: [APDCorrelator] 34842303151083us: [vob.storage.apd.exit] Device or filesystem with identifier [naa.500e09efff0df959] has exited the All Paths Down state.

2016-10-08T01:40:01.881Z: [APDCorrelator] 34842991466938us: [esx.clear.storage.apd.exit] Device or filesystem with identifier [naa.500e09efff0df959] has exited the All Paths Down state.

2016-10-08T01:40:01.882Z: [scsiCorrelator] 34842991467150us: [esx.clear.storage.connectivity.restored] Connectivity to storage device naa.500e09efff0df959 (Datastores: Unknown) restored. Path vmhba

2016-10-08T11:07:53.380Z: [scsiCorrelator] 34876373972883us: [vob.scsi.device.io.latency.high] Device naa.6c81f660e97c93001af69254192c0dcb performance has deteriorated. I/O latency increased from a

2016-10-08T11:07:53.380Z: [scsiCorrelator] 34877062965764us: [e


2016-10-08T01:40:01.881Z: [scsiCorrelator] 34842303151048us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba1:C0:T6:L1 changed state from dead

2016-10-08T01:40:01.881Z: [APDCorrelator] 34842303151083us: [vob.storage.apd.exit] Device or filesystem with identifier [naa.500e09efff0df959] has exited the All Paths Down state.

2016-10-08T01:40:01.881Z: [APDCorrelator] 34842991466938us: [esx.clear.storage.apd.exit] Device or filesystem with identifier [naa.500e09efff0df959] has exited the All Paths Down state.

2016-10-08T01:40:01.882Z: [scsiCorrelator] 34842991467150us: [esx.clear.storage.connectivity.restored] Connectivity to storage device naa.500e09efff0df959 (Datastores: Unknown) restored. Path vmhba

2016-10-08T11:07:53.380Z: [scsiCorrelator] 34876373972883us: [vob.scsi.device.io.latency.high] Device naa.6c81f660e97c93001af69254192c0dcb performance has deteriorated. I/O latency increased from a

2016-10-08T11:07:53.380Z: [scsiCorrelator] 34877062965764us: [esx.problem.scsi.device.io.latency.high] Device naa.6c81f660e97c93001af69254192c0dcb performance has deteriorated. I/O latency increase

2016-10-08T11:08:03.133Z: [scsiCorrelator] 34876383725580us: [vob.scsi.device.io.latency.improved] Device naa.6c81f660e97c93001af69254192c0dcb performance has improved. I/O latency reduced from 972

2016-10-08T11:08:03.133Z: [scsiCorrelator] 34877072718506us: [esx.clear.scsi.device.io.latency.improved] Device naa.6c81f660e97c93001af69254192c0dcb performance has improved. I/O latency reduced fr

2016-10-08T11:08:18.285Z: [scsiCorrelator] 34876398877310us: [vob.scsi.device.io.latency.improved] Device naa.6c81f660e97c93001af69254192c0dcb performance has improved. I/O latency reduced from 192

2016-10-08T11:08:18.285Z: [scsiCorrelator] 34877087870730us: [esx.clear.scsi.device.io.latency.improved] Device naa.6c81f660e97c93001af69254192c0dcb performance has improved. I/O latency reduced fr

2016-10-08T11:09:32.987Z: [scsiCorrelator] 34876473577292us: [vob.scsi.device.io.latency.high] Device naa.6c81f660e97c93001af69254192c0dcb performance has deteriorated. I/O latency increased from a

2016-10-08T11:09:32.987Z: [scsiCorrelator] 34877162572240us: [esx.problem.scsi.device.io.latency.high] Device naa.6c81f660e97c93001af69254192c0dcb performance has deteriorated. I/O latency increase

2016-10-08T11:09:35.394Z: [scsiCorrela


2016-10-10T10:15:06.174Z: [scsiCorrelator] 35046003403819us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba1:C0:T6:L0 changed state from on

2016-10-10T10:15:06.174Z: [APDCorrelator] 35046003403873us: [vob.storage.apd.start] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has entered the All Paths Down sta

2016-10-10T10:15:06.174Z: [APDCorrelator] 35046695759512us: [esx.problem.storage.apd.start] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has entered the All Paths

2016-10-10T10:15:06.175Z: [scsiCorrelator] 35046695759982us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device t10.IBM_____ULTRIUM2DTD3_____1210354899. Path vmhba1:C0:T6:L

2016-10-10T10:15:06.431Z: [scsiCorrelator] 35046003660761us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba1:C0:T6:L1 changed state from on

2016-10-10T10:15:06.431Z: [APDCorrelator] 35046003660818us: [vob.storage.apd.start] Device or filesystem with identifier [naa.500e09efff0df959] has entered the All Paths Down state.

2016-10-10T10:15:06.431Z: [APDCorrelator] 35046696016388us: [esx.problem.storage.apd.start] Device or filesystem with identifier [naa.500e09efff0df959] has entered the All Paths Down state.

2016-10-10T10:15:06.431Z: [scsiCorrelator] 35046696016778us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device naa.500e09efff0df959. Path vmhba1:C0:T6:L1 is down. Affected

2016-10-10T10:15:10.924Z: [scsiCorrelator] 35046008154422us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba1:C0:T6:L1 changed state from dead

2016-10-10T10:15:10.924Z: [APDCorrelator] 35046008154453us: [vob.storage.apd.exit] Device or filesystem with identifier [naa.500e09efff0df959] has exited the All Paths Down state.

2016-10-10T10:15:10.924Z: [APDCorrelator] 35046700509953us: [esx.clear.storage.apd.exit] Device or filesystem with identifier [naa.500e09efff0df959] has exited the All Paths Down state.

2016-10-10T10:15:10.925Z: [scsiCorrelator] 35046700510152us: [esx.clear.storage.connectivity.restored] Connectivity to storage device naa.500e09efff0df959 (Datastores: Unknown) restored. Path vmhba

2016-10-10T10:17:26.177Z: [APDCorrelator] 35046143404009us: [vob.storage.apd.timeout] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has entered the All Paths Down T

2016-10-10T10:17:26.177Z: [APDCorrelator] 35046835762367us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has entered the All Path

2016-10-10T10:20:05.929Z: [scsiCorrelator] 35046303153759us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba1:C0:T6:L0 changed state from dead

2016-10-10T10:20:05.930Z: [APDCorrelator] 35046303153784us: [vob.storage.apd.exit] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has exited the All Paths Down state

2016-10-10T10:20:05.930Z: [APDCorrelator] 35046995515102us: [esx.clear.storage.apd.exit] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has exited the All Paths Down

2016-10-10T10:20:05.930Z: [scsiCorrelator] 35046995515305us: [esx.clear.storage.connectivity.restored] Connectivity to storage device t10.IBM_____ULTRIUM2DTD3_____1210354899 (Datastores: Unknown) r

2016-10-10T13:38:44.421Z: [APDCorrelator] 35058221406489us: [vob.storage.apd.start] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has entered the All Paths Down sta

2016-10-10T13:38:44.421Z: [scsiCorrelator] 35058221406452us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba1:C0:T6:L0 changed state from on

2016-10-10T13:38:44.421Z: [APDCorrelator] 35058914006161us: [esx.problem.storage.apd.start] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has entered the All Paths

2016-10-10T13:38:44.421Z: [scsiCorrelator] 35058914006467us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device t10.IBM_____ULTRIUM2DTD3_____1210354899. Path vmhba1:C0:T6:L

2016-10-10T13:38:45.677Z: [scsiCorrelator] 35058222663340us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba1:C0:T6:L1 changed state from on

2016-10-10T13:38:45.677Z: [APDCorrelator] 35058222663375us: [vob.storage.apd.start] Device or filesystem with identifier [naa.500e09efff0df959] has entered the All Paths Down state.

2016-10-10T13:38:45.678Z: [APDCorrelator] 35058915263037us: [esx.problem.storage.apd.start] Device or filesystem with identifier [naa.500e09efff0df959] has entered the All Paths Down state.

2016-10-10T13:38:45.678Z: [scsiCorrelator] 35058915263225us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device naa.500e09efff0df959. Path vmhba1:C0:T6:L1 is down. Affected

2016-10-10T13:38:49.438Z: [scsiCorrelator] 35058226424298us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba1:C0:T6:L1 changed state from dead

2016-10-10T13:38:49.438Z: [APDCorrelator] 35058226424328us: [vob.storage.apd.exit] Device or filesystem with identifier [naa.500e09efff0df959] has exited the All Paths Down state.

2016-10-10T13:38:49.439Z: [APDCorrelator] 35058919024038us: [esx.clear.storage.apd.exit] Device or filesystem with identifier [naa.500e09efff0df959] has exited the All Paths Down state.

2016-10-10T13:38:49.439Z: [scsiCorrelator] 35058919024218us: [esx.clear.storage.connectivity.restored] Connectivity to storage device naa.500e09efff0df959 (Datastores: Unknown) restored. Path vmhba

2016-10-10T13:38:50.172Z: [APDCorrelator] 35058227157695us: [vob.storage.apd.exit] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has exited the All Paths Down state

2016-10-10T13:38:50.172Z: [APDCorrelator] 35058919757461us: [esx.clear.storage.apd.exit] Device or filesystem with identifier [t10.IBM_____ULTRIUM2DTD3_____1210354899] has exited the All Paths Down

2016-10-10T13:38:50.172Z: [scsiCorrelator] 35058227157664us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba1:C0:T6:L0 changed state from dead

2016-10-10T13:38:50.172Z: [scsiCorrelator] 35058919757828us: [esx.clear.storage.connectivity.restored] Connectivity to storage device t10.IBM_____ULTRIUM2DTD3_____1210354899 (Datastores: Unknown) r

2016-10-11T02:00:03.854Z: [netCorrelator] 21573056us: [vob.net.vmnic.linkstate.up] vmnic vmnic2 linkstate up


/var/log/vmkernel.log

<6>0 :: Attempting to queue an ABORT message:CDB: 0x0 0x0 0x0 0x0 0x0 0x0

2016-10-10T22:15:38.351Z cpu13:92288823)scsi0: Command abort returning 0x2003

2016-10-10T22:15:38.351Z cpu13:92288823)WARNING: LinScsi: SCSILinuxAbortCommands:1837: Failed, Driver Adaptec AIC7902 Ultra320 SCSI adapter, for vmhba1

2016-10-10T22:15:38.351Z cpu13:92288823)WARNING: ScsiPath: 6273: Set retry timeout for failed TaskMgmt abort for CmdSN  0x0, status Failure, path vmhba1:C0:T6:L0

<6>0 :: Attempting to queue an ABORT message:CDB: 0x0 0x0 0x0 0x0 0x0 0x0

2016-10-10T22:15:40.352Z cpu14:92288823)scsi0: Command abort returning 0x2003

2016-10-10T22:15:40.352Z cpu14:92288823)WARNING: LinScsi: SCSILinuxAbortCommands:1837: Failed, Driver Adaptec AIC7902 Ultra320 SCSI adapter, for vmhba1

2016-10-10T22:15:40.352Z cpu14:92288823)WARNING: ScsiPath: 6273: Set retry timeout for failed TaskMgmt abort for CmdSN  0x0, status Failure, path vmhba1:C0:T6:L0

<6>0 :: Attempting to queue an ABORT message:CDB: 0x0 0x0 0x0 0x0 0x0 0x0

2016-10-10T22:15:42.352Z cpu14:92288823)scsi0: Command abort returning 0x2003

2016-10-10T22:15:42.352Z cpu14:92288823)WARNING: LinScsi: SCSILinuxAbortCommands:1837: Failed, Driver Adaptec AIC7902 Ultra320 SCSI adapter, for vmhba1

2016-10-10T22:15:42.352Z cpu14:92288823)WARNING: ScsiPath: 6273: Set retry timeout for failed TaskMgmt abort for CmdSN  0x0, status Failure, path vmhba1:C0:T6:L0

<6>0 :: Attempting to queue an ABORT message:CDB: 0x0 0x0 0x0 0x0 0x0 0x0

2016-10-10T22:15:44.353Z cpu9:92288823)scsi0: Command abort returning 0x2003

2016-10-10T22:15:44.353Z cpu9:92288823)WARNING: LinScsi: SCSILinuxAbortCommands:1837: Failed, Driver Adaptec AIC7902 Ultra320 SCSI adapter, for vmhba1

2016-10-10T22:15:44.353Z cpu9:92288823)WARNING: ScsiPath: 6273: Set retry timeout for failed TaskMgmt abort for CmdSN  0x0, status Failure, path vmhba1:C0:T6:L0

<6>0 :: Attempting to queue an ABORT message:CDB: 0x0 0x0 0x0 0x0 0x0 0x0

2016-10-10T22:15:46.355Z cpu9:92288823)scsi0: Command abort returning 0x2003

2016-10-10T22:15:46.355Z cpu9:92288823)WARNING: LinScsi: SCSILinuxAbortCommands:1837: Failed, Driver Adaptec AIC7902 Ultra320 SCSI adapter, for vmhba1

2016-10-10T22:15:46.355Z cpu9:92288823)WARNING: ScsiPath: 6273: Set retry timeout for failed TaskMgmt abort for CmdSN  0x0, status Failure, path vmhba1:C0:T6:L0

<6>0 :: Attempting to queue an ABORT message:CDB: 0x0 0x0 0x0 0x0 0x0 0x0







2016-10-10T22:16:12.790Z cpu14:90826950)>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<

scsi0: Dumping Card State at program address 0x12b Mode 0x33

2016-10-10T22:16:12.790Z cpu14:90826950)Card was paused

INTSTAT[0x0] SELOID[0x6] SELID[0x60] HS_MAILBOX[0x0]

INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]

SCSISIGI[0xa4] SCSIPHASE[0x0] SCSIBUS[0xc] LASTPHASE[0xa0]

SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x0] SEQINTCTL[0x0]

SEQ_FLAGS[0x40] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x1965]

KERNEL_QFREEZE_COUNT[0x1965] MK_MESSAGE_SCB[0x3] MK_MESSAGE_SCSIID[0x67]

SSTAT0[0x0] SSTAT1[0x8] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0xc0]

SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]

LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]

2016-10-10T22:16:12.790Z cpu14:90826950)

SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0xffff CURRSCB 0x3 NEXTSCB 0x0

qinstart = 27531 qinfifonext = 27532

QINFIFO: 0x22016-10-10T22:16:12.790Z cpu14:90826950)

WAITING_TID_QUEUES:

Pending list:

  2 FIFO_USE[0x0] SCB_CONTROL[0x48] SCB_SCSIID[0x67]

  3 FIFO_USE[0x0] SCB_CONTROL[0x0] SCB_SCSIID[0x67] 2016-10-10T22:16:12.790Z cpu14:90826950)

Total 2

Kernel Free SCB list: 1 0

Sequencer Complete DMA-inprog list:

Sequencer Complete list:

Sequencer DMA-Up and Complete list:

Sequencer On QFreeze and Complete list:

2016-10-10T22:16:12.790Z cpu14:90826950)

scsi0: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0

SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]

SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]

SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0

HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10] 2016-10-10T22:16:12.790Z cpu14:90826950)

scsi0: FIFO1 Free, LONGJMP == 0x81f8, SCB 0x3

SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]

SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]

SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0

HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]

LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0

2016-10-10T22:16:12.790Z cpu14:90826950)scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52

2016-10-10T22:16:12.790Z cpu14:90826950)scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0

2016-10-10T22:16:12.790Z cpu14:90826950)scsi0: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0

SIMODE0[0xc]

CCSCBCTL[0x4]

2016-10-10T22:16:12.790Z cpu14:90826950)scsi0: REG0 == 0x3, SINDEX = 0x107, DINDEX = 0x10e

2016-10-10T22:16:12.790Z cpu14:90826950)scsi0: SCBPTR == 0x3, SCB_NEXT == 0xff80, SCB_NEXT2 == 0xff74

2016-10-10T22:16:12.790Z cpu14:90826950)CDB 0 0 0 0 0 0

STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x02016-10-10T22:16:12.790Z cpu14:90826950)

<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>



/var/log/vmkwarning.log


2016-10-10T22:15:40.352Z cpu14:92288823)WARNING: LinScsi: SCSILinuxAbortCommands:1837: Failed, Driver Adaptec AIC7902 Ultra320 SCSI adapter, for vmhba1

2016-10-10T22:15:40.352Z cpu14:92288823)WARNING: ScsiPath: 6273: Set retry timeout for failed TaskMgmt abort for CmdSN  0x0, status Failure, path vmhba1:C0:T6:L0

2016-10-10T22:15:42.352Z cpu14:92288823)WARNING: LinScsi: SCSILinuxAbortCommands:1837: Failed, Driver Adaptec AIC7902 Ultra320 SCSI adapter, for vmhba1

2016-10-10T22:15:42.352Z cpu14:92288823)WARNING: ScsiPath: 6273: Set retry timeout for failed TaskMgmt abort for CmdSN  0x0, status Failure, path vmhba1:C0:T6:L0

2016-10-10T22:15:44.353Z cpu9:92288823)WARNING: LinScsi: SCSILinuxAbortCommands:1837: Failed, Driver Adaptec AIC7902 Ultra320 SCSI adapter, for vmhba1

2016-10-10T22:15:44.353Z cpu9:92288823)WARNING: ScsiPath: 6273: Set retry timeout for failed TaskMgmt abort for CmdSN  0x0, status Failure, path vmhba1:C0:T6:L0

2016-10-10T22:15:46.355Z cpu9:92288823)WARNING: LinScsi: SCSILinuxAbortCommands:1837: Failed, Driver Adaptec AIC7902 Ultra320 SCSI adapter, for vmhba1

2016-10-10T22:15:46.355Z cpu9:92288823)WARNING: ScsiPath: 6273: Set retry timeout for failed TaskMgmt abort for CmdSN  0x0, status Failure, path vmhba1:C0:T6:L0

2016-10-10T22:15:57.752Z cpu14:32939)WARNING: NMP: nmpDeviceTaskMgmt:2266: Attempt to issue lun reset on device t10.IBM_____ULTRIUM2DTD3_____1210354899. This will clear any SCSI-2 reservations on t

2016-10-10T22:16:00.837Z cpu0:90826950)WARNING: VSCSI: 3485: handle 8325(vscsi2:6):WaitForCIF: Issuing reset;  number of CIF:1

2016-10-10T22:16:00.837Z cpu0:90826950)WARNING: VSCSI: 2407: handle 8325(vscsi2:6):Ignoring double reset

2016-10-10T22:16:12.791Z cpu12:73923820)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "t10.IBM_____ULTRIUM2DTD3_____1210354899" state in doubt; requested fast path state update...

2016-10-10T22:16:12.819Z cpu14:32939)WARNING: NMP: nmpDeviceTaskMgmt:2266: Attempt to issue lun reset on device t10.IBM_____ULTRIUM2DTD3_____1210354899. This will clear any SCSI-2 reservations on t

2016-10-10T22:16:15.834Z cpu14:90826950)WARNING: VSCSI: 3485: handle 8325(vscsi2:6):WaitForCIF: Issuing reset;  number of CIF:1

2016-10-10T22:16:15.834Z cpu14:90826950)WARNING: VSCSI: 2407: handle 8325(vscsi2:6):Ignoring double reset

2016-10-10T22:16:37.848Z cpu10:90826950)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "t10.IBM_____ULTRIUM2DTD3_____1210354899" state in doubt; requested fast path state update...

2016-10-10T22:16:37.851Z cpu14:32939)WARNING: NMP: nmpDeviceTaskMgmt:2266: Attempt to issue lun reset on device t10.IBM_____ULTRIUM2DTD3_____1210354899. This will clear any SCSI-2 reservations on t

TSC: 734420 cpu0:1)WARNING: ACPI: 1391: SPCR: Detected unsupported reg bit width (0); will assume 8 bits.

TSC: 739745 cpu0:1)WARNING: ACPI: 1440: SPCR: Detected invalid baud rate (0); will assume 115200

0:00:00:00.000 cpu0:1)WARNING: Serial: 641: Invalid serial port config: mem-mapped to addr 0x0.

0:00:00:05.648 cpu0:32768)WARNING: Cpu: 2145: Cache latency measurement may be inaccurate min= 145 max= 753 avg= 197

2016-10-11T01:59:58.970Z cpu1:33373)WARNING: LinuxSignal: 538: ignored unexpected signal flags 0x2 (sig 17)


Centos VM - the incident happened at 11:49


Oct 10 11:49:48 bup102 kernel: INFO: task tar:11851 blocked for more than 120 seconds.

Oct 10 11:49:48 bup102 kernel:      Not tainted 2.6.32-642.4.2.el6.x86_64 #1

Oct 10 11:49:48 bup102 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Oct 10 11:49:48 bup102 kernel: tar          D 0000000000000002    0 11851  11849 0x00000080

Oct 10 11:49:48 bup102 kernel: ffff880002207c18 0000000000000082 0000000000000000 ffff880235502e06

Oct 10 11:49:48 bup102 kernel: ffff880235cd2000 0000000000040000 0001d2340ea2d357 ffff880235cd2000

Oct 10 11:49:48 bup102 kernel: ffff880002207c38 000000011e892b77 ffff88023739c5f8 ffff880002207fd8

Oct 10 11:49:48 bup102 kernel: Call Trace:

Oct 10 11:49:48 bup102 kernel: [<ffffffff81549895>] schedule_timeout+0x215/0x2e0

Oct 10 11:49:48 bup102 kernel: [<ffffffff8129ca57>] ? kobject_put+0x27/0x60

Oct 10 11:49:48 bup102 kernel: [<ffffffff815494f3>] wait_for_common+0x123/0x180

Oct 10 11:49:48 bup102 kernel: [<ffffffff8106c500>] ? default_wake_function+0x0/0x20

Oct 10 11:49:48 bup102 kernel: [<ffffffff8154962d>] wait_for_completion+0x1d/0x20

Oct 10 11:49:48 bup102 kernel: [<ffffffffa02b0509>] st_do_scsi.clone.0+0x2c9/0x420 [st]

Oct 10 11:49:48 bup102 kernel: [<ffffffffa02b5e73>] st_write+0x5e3/0xd80 [st]

Oct 10 11:49:48 bup102 kernel: [<ffffffff81199c28>] vfs_write+0xb8/0x1a0

Oct 10 11:49:48 bup102 kernel: [<ffffffff8119b116>] ? fget_light_pos+0x16/0x50

Oct 10 11:49:48 bup102 kernel: [<ffffffff8119a761>] sys_write+0x51/0xb0

Oct 10 11:49:48 bup102 kernel: [<ffffffff810ee25e>] ? __audit_syscall_exit+0x25e/0x290

Oct 10 11:49:48 bup102 kernel: [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b

Oct 10 11:53:48 bup102 kernel: INFO: task tar:11851 blocked for more than 120 seconds.

Oct 10 11:53:48 bup102 kernel:      Not tainted 2.6.32-642.4.2.el6.x86_64 #1

.

.

.

Oct 10 12:15:47 bup102 kernel: mptscsih: ioc1: attempting task abort! (sc=ffff880209568c80)

Oct 10 12:15:47 bup102 kernel: st 3:0:6:0: CDB: Write(6): 0a 00 04 00 00 00

Oct 10 12:15:47 bup102 kernel: mptscsih: ioc1: task abort: SUCCESS (rv=2002) (sc=ffff880209568c80)

Oct 10 12:15:57 bup102 kernel: mptscsih: ioc1: attempting task abort! (sc=ffff880209568c80)

Oct 10 12:15:57 bup102 kernel: st 3:0:6:0: CDB: Test Unit Ready: 00 00 00 00 00 00

Oct 10 12:15:57 bup102 kernel: mptscsih: ioc1: task abort: SUCCESS (rv=2002) (sc=ffff880209568c80)

Oct 10 12:15:57 bup102 kernel: mptscsih: ioc1: attempting target reset! (sc=ffff880209568c80)

Oct 10 12:15:57 bup102 kernel: st 3:0:6:0: CDB: Write(6): 0a 00 04 00 00 00

Oct 10 12:16:02 bup102 kernel: mptscsih: ioc1: target reset: SUCCESS (sc=ffff880209568c80)

Oct 10 12:16:12 bup102 kernel: mptscsih: ioc1: attempting task abort! (sc=ffff880209568c80)

Oct 10 12:16:12 bup102 kernel: st 3:0:6:0: CDB: Test Unit Ready: 00 00 00 00 00 00

Oct 10 12:16:12 bup102 kernel: mptscsih: ioc1: task abort: SUCCESS (rv=2002) (sc=ffff880209568c80)

Oct 10 12:16:12 bup102 kernel: mptscsih: ioc1: attempting bus reset! (sc=ffff880209568c80)

Oct 10 12:16:12 bup102 kernel: st 3:0:6:0: CDB: Write(6): 0a 00 04 00 00 00

Oct 10 12:16:17 bup102 kernel: mptscsih: ioc1: bus reset: SUCCESS (sc=ffff880209568c80)

Oct 10 12:16:37 bup102 kernel: mptscsih: ioc1: attempting task abort! (sc=ffff880209568c80)

Oct 10 12:16:37 bup102 kernel: st 3:0:6:0: CDB: Test Unit Ready: 00 00 00 00 00 00

Oct 10 12:16:42 bup102 kernel: mptscsih: ioc1: task abort: SUCCESS (rv=2002) (sc=ffff880209568c80)

Oct 10 12:16:42 bup102 kernel: mptscsih: ioc1: attempting host reset! (sc=ffff880209568c80)

Oct 10 12:16:42 bup102 kernel: mptscsih: ioc1: host reset: SUCCESS (sc=ffff880209568c80)

Oct 10 16:09:27 bup102 kernel: imklog 5.8.10, log source = /proc/kmsg started.


0 Kudos
0 Replies