Hello,
I have a CentOS VM running on ESX 3.5. Every night the VM crashes with the following error:
sd 0:0:0:0: SCSI error: return code = 0x08000002
Jun 28 02:48:28 vmon kernel: sda: Current: sense key: Aborted Command
Jun 28 02:48:28 vmon kernel: Add. Sense: Some commands cleared by iSCSI Protocol event
Jun 28 02:48:28 vmon kernel:
Jun 28 02:48:28 vmon kernel: Info fld=0x0
Jun 28 02:48:28 vmon kernel: end_request: I/O error, dev sda, sector 23369653
Jun 28 02:48:28 vmon kernel: Buffer I/O error on device dm-0, logical block 2895053
Jun 28 02:48:28 vmon kernel: lost page write due to I/O error on dm-0
Jun 28 02:48:28 vmon kernel: sd 0:0:0:0: SCSI error: return code = 0x08000002
Jun 28 02:48:28 vmon kernel: sda: Current: sense key: Aborted Command
Jun 28 02:48:28 vmon kernel: Add. Sense: Some commands cleared by iSCSI Protocol event
After this event, either the / partition or the /var partition becomes read-only.
I have to reboot to get the host back online.
Any suggestions on how to fix this issue?
My fstab is as follows:
/dev/VolGroup00/LogVol00 / ext3 defaults 1 1
/dev/VolGroup00/LogVol02 /var ext3 defaults 1 2
LABEL=/boot /boot ext3 defaults 1 2
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/VolGroup00/LogVol01 swap swap defaults 0 0
# Beginning of the block added by the VMware software
.host:/ /mnt/hgfs vmhgfs defaults,ttl=5 0 0
# End of the block added by the VMware software
I am running:
Linux vmon. 2.6.18-53.el5 #1 SMP Mon Nov 12 02:22:48 EST 2007 i686 i686 i386 GNU/Linux
PV /dev/sda2 VG VolGroup00 lvm2 [19.88 GB / 0 free]
Total: 1 [19.88 GB] / in use: 1 [19.88 GB] / in no VG: 0
ACTIVE '/dev/VolGroup00/LogVol00' [15.00 GB] inherit
ACTIVE '/dev/VolGroup00/LogVol02' [4.12 GB] inherit
ACTIVE '/dev/VolGroup00/LogVol01' [768.00 MB] inherit
Reading all physical volumes. This may take a while...
Found volume group "VolGroup00" using metadata type lvm2
Thank you.
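For anyone trying to read the return code in the log above, here is a minimal Python sketch that splits it into its four bytes. It assumes the 2.6-era SCSI mid-layer layout (status, message, host, and driver bytes, lowest byte first); the function names and the partial name tables are illustrative, not taken from the thread.

```python
# Sketch: split a Linux SCSI mid-layer result word into its four bytes.
# Assumed layout (2.6-era include/scsi/scsi.h): bits 0-7 status,
# 8-15 message, 16-23 host, 24-31 driver. Name tables are partial.

STATUS_NAMES = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY"}
HOST_NAMES = {0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
              0x03: "DID_TIME_OUT", 0x05: "DID_ABORT", 0x07: "DID_ERROR",
              0x08: "DID_RESET"}
DRIVER_NAMES = {0x00: "DRIVER_OK", 0x06: "DRIVER_TIMEOUT", 0x08: "DRIVER_SENSE"}


def decode_scsi_result(result):
    """Return the four component bytes of a SCSI result word."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


def describe(result):
    """Render the component bytes with symbolic names where known."""
    b = decode_scsi_result(result)
    return "driver=%s host=%s msg=0x%02x status=%s" % (
        DRIVER_NAMES.get(b["driver"], hex(b["driver"])),
        HOST_NAMES.get(b["host"], hex(b["host"])),
        b["msg"],
        STATUS_NAMES.get(b["status"], hex(b["status"])),
    )


print(describe(0x08000002))
```

Read this way, 0x08000002 is a CHECK CONDITION status with sense data available, which matches the "Aborted Command" sense key logged right after it. It does not by itself say why the command was aborted.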
Hello,
What is happening on that ESX Server every time the system shows this error? You should also review your /var/log/vmkernel logfile and correlate times and actions. Are you running backups at this time, or anything else that could cause the iSCSI server to go away?
Best regards,
Edward L. Haletky
VMware Communities User Moderator
====
Author of the book 'VMware ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.
CIO Virtualization Blog: http://www.cio.com/blog/index/topic/168354
As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization
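One way to do the time correlation suggested above is to pull the error timestamps out of the guest's syslog and then look for the same minutes in /var/log/vmkernel on the ESX host. The Python sketch below is only an illustration: the regular expression assumes the classic syslog line format seen in the original post, and the function name and error markers are made up for this example.

```python
import re
from collections import Counter

# Matches classic syslog kernel lines such as
# "Jun 28 02:48:28 vmon kernel: end_request: I/O error, dev sda, sector 23369653"
LINE_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) kernel: (.*)$")

# Substrings treated as storage errors (an assumption; extend as needed).
ERROR_MARKERS = ("SCSI error", "I/O error")


def error_minutes(lines):
    """Count storage error lines per 'Mon DD HH:MM' bucket."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and any(marker in m.group(3) for marker in ERROR_MARKERS):
            counts[m.group(1)[:-3]] += 1  # drop the ':SS' seconds part
    return counts


SAMPLE = [
    "Jun 28 02:48:28 vmon kernel: Info fld=0x0",
    "Jun 28 02:48:28 vmon kernel: end_request: I/O error, dev sda, sector 23369653",
    "Jun 28 02:48:28 vmon kernel: Buffer I/O error on device dm-0, logical block 2895053",
    "Jun 28 02:48:28 vmon kernel: lost page write due to I/O error on dm-0",
    "Jun 28 02:48:28 vmon kernel: sd 0:0:0:0: SCSI error: return code = 0x08000002",
]

for minute, count in sorted(error_minutes(SAMPLE).items()):
    print(minute, count)
```

On a real guest the lines would come from its syslog file (on CentOS that is usually /var/log/messages, which is an assumption here since the thread does not name the file). The minutes it reports are the ones to search for in the host's vmkernel log and in any backup job schedule.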
Take a look at this VMware KB. It sounds like an old bug I saw on some RHEL P2Vs. This looks surprisingly similar, though last time the symptom was blatant time-outs, and those time-outs would set my / partition to read-only.
I ran into this problem about a year or so ago with RHEL kernels < 2.6.9-55.EL and the mpt SCSI driver. I believe I found a kernel >= 2.6.9-55.EL that didn't have this driver version and its associated problem. I see from your uname that you are at 2.6.18. If you refer to the linked KB, they updated it recently to indicate kernel 2.6.22 as a fix. It was probably fixed, regressed, and fixed again. A simple kernel upgrade to 2.6.22 or later might resolve the problem.
This link might provide some more explanation into the problem as well.
You should be able to get your mpt driver version from 'cat /proc/mpt/version'.
If you want to isolate whether it's an mpt driver issue (since it may not be the same issue), install the buslogic driver into the VM (it probably isn't there by default; it wasn't for RHEL) and then switch the disk controller type on the VM to buslogic.
Hope that gets you started on a fix.
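The version check described above can be sketched in Python as follows. The function names are hypothetical, and the 2.6.22 threshold is simply the version quoted in this reply. Distribution kernels carry backported patches (the reply itself mentions a fix appearing in a 2.6.9 build), so treat the comparison as a rough first check, not as proof either way.

```python
import re

# Version quoted in the reply above as containing the fix.
FIXED_IN = (2, 6, 22)


def upstream_version(release):
    """Return the leading numeric part of a kernel release string.

    Example: '2.6.18-53.el5' -> (2, 6, 18)
    """
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if not m:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in m.groups())


def predates_fix(release):
    """True if the upstream version number is lower than FIXED_IN."""
    return upstream_version(release) < FIXED_IN


for rel in ("2.6.9-55.EL", "2.6.18-53.el5", "2.6.22"):
    print(rel, predates_fix(rel))
```

To check the running system, pass `platform.release()` from the standard library, which returns the same string as `uname -r`.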
Why are you using an iSCSI initiator inside the VM?
A best practice would be to configure ESX as the iSCSI initiator for those LUNs and present them to the VM as RDMs.
That would improve your performance accessing those LUNs and might solve your problem.
Regards,
Guilherme Schäffer
Infiniit | www.infiniit.com.br
I am not running iSCSI on the VM.
iSCSI is being used to share three LUNs to the ESX server.
There is no need for the VM to use iSCSI or RDM.
I am not sure why iSCSI is showing up in the VM logs.
I am going to try upgrading the kernel; perhaps that will help.
I am also running Vizioncore vRanger Pro. I am going to disable that and see if it makes a difference.
What does lvdisplay show?
Could you post the lvdisplay output here?
Guilherme Schäffer
Infiniit | www.infiniit.com.br
According to the KB article, I should upgrade from RHEL 5 to 5.1.
I am already running CentOS 5.1, which is the "same" as RHEL 5.1.
I upgraded to CentOS 5.2 to see if that fixes the issue.
I also upgraded the kernel from 2.6.18-53.el5 to 2.6.18-92.1.6.el5.
I will watch it tonight and see what happens.
--- Logical volume ---
LV Name /dev/VolGroup00/LogVol00
VG Name VolGroup00
LV UUID o2gHwk-yHCQ-xymd-Fntc-G3Ip-0rLZ-ucT9FW
LV Write Access read/write
LV Status available
# open 1
LV Size 15.00 GB
Current LE 480
Segments 1
Allocation inherit
Read ahead sectors 0
Block device 253:0
--- Logical volume ---
LV Name /dev/VolGroup00/LogVol02
VG Name VolGroup00
LV UUID p3i7SS-gely-M0y4-XANq-Aiff-y8Zz-rFhlpg
LV Write Access read/write
LV Status available
# open 1
LV Size 4.12 GB
Current LE 132
Segments 1
Allocation inherit
Read ahead sectors 0
Block device 253:1
--- Logical volume ---
LV Name /dev/VolGroup00/LogVol01
VG Name VolGroup00
LV UUID nruE1i-iZIB-pxg9-0gVR-Ykiu-TYOa-uulBcz
LV Write Access read/write
LV Status available
# open 1
LV Size 768.00 MB
Current LE 24
Segments 1
Allocation inherit
Read ahead sectors 0
Block device 253:2
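To watch overnight for the read-only remount described at the start of the thread, something along these lines could be run from cron inside the guest. It is a sketch only: the function name is hypothetical, and the sample device paths are illustrative rather than copied from this system. It parses text in /proc/mounts format and reports which of the watched mount points carry the `ro` option.

```python
def read_only_mounts(mounts_text, watch=("/", "/var")):
    """Return watched mount points whose option list contains 'ro'."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mounts entry
        mount_point, options = fields[1], fields[3].split(",")
        if mount_point in watch and "ro" in options:
            hits.append(mount_point)
    return hits


# Illustrative input in /proc/mounts format (device names are examples).
SAMPLE = (
    "/dev/mapper/VolGroup00-LogVol00 / ext3 ro,data=ordered 0 0\n"
    "/dev/mapper/VolGroup00-LogVol02 /var ext3 rw,data=ordered 0 0\n"
    "proc /proc proc rw 0 0\n"
)

print(read_only_mounts(SAMPLE))
```

On the live system the input would be `open("/proc/mounts").read()`. Logging the time of the first non-empty result gives another timestamp to compare against the ESX host's logs.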