VMware Cloud Community
pauliew1978
Enthusiast

Serious issues in cluster

Dear all,

I am having some really bad problems in my vSphere infrastructure. On Friday I removed a LUN from the SAN before removing it from the ESX servers. Since then I have had major issues with virtual machines dropping off the network every 30 to 60 minutes. I had this problem a good few months back, and it turned out to be caused by a bug in VMware. I am on vSphere 4 (not Update 1). I know this issue has been resolved in Update 1, and that there is a command that stops the "all paths dead" issue. When the problem happens, the ESX server's CPU usage drops to nothing for a few minutes and then comes back. Often the ESX server drops out of the cluster and its VMs show as disconnected, but it does come back.

I rebooted the whole infrastructure on Friday night, and the problematic datastore was no longer showing on any of the ESX servers (which is what I wanted). I booted all the VMs back up and thought this had solved the problem.

This morning I have had issues with one particular ESX server, which has completely dropped out of the cluster, though the VMs on it are still running (which is just as well, as they include our main production SQL box, among others). I have issued the vpxa restart command and the host restart command, and restarted VirtualCenter, but the host is still not coming back. I think the only thing I can do is wait until the end of the day and reboot it again.

The other issue I am having is that one of my datastores has not come back up. It is showing in the datastore list, but in the vmkwarning logs I am getting SCSI reservation conflict errors and I/O timeout errors.

I think I can issue a LUN reset command, but I am really worried it will kill the entire ESX box. Does anyone have any ideas what I can do, or has anyone experienced anything like this? I am concerned that metadata for the old LUN I removed is still hanging around on all the ESX boxes.

Thanks,

Paul

marcelo_soares
Champion

I think this is too delicate to be handled here. My first advice is to update to U1 as soon as possible, because it corrected not only the APD (All Paths Down) state problem but other storage access issues as well.

Also, when removing LUNs from ESX, follow the instructions here: http://kb.vmware.com/kb/1015084

Additionally, you may need help from your SAN vendor and from VMware Support.

Good luck,

Marcelo Soares

VMware Certified Professional 310/410

Technical Support Engineer

Globant Argentina
