thorntonm3's Posts

I have an ESX 4.0.0 Build 164009 box that disconnected from vCenter (4.0.0 Build 258672) and can no longer be reached or managed by any means other than the console. The disconnect occurred about 3-4 hours after I added vCenter, upgraded the vSphere Client, and upgraded this host's license (via vCenter) from ESX Standard to Enterprise Plus. Before that, it was managed directly with the vSphere Client as a standalone host. I also added a second ESX host (build 261974) to the datacenter in vSphere, but turned on no features and performed no updates on either box. Update Manager shows that the problem box needs quite a few updates, but those will have to wait until later, or until I can get the running machines off the box. Once I saw how many updates were needed, I disabled the Update Manager plug-in.

Here's a quick synopsis of what works and what doesn't.

Works:
1) Console access
2) All machines on the box are running with no issues

Not working:
1) Cannot connect via SSH. Telnet to port 22 shows nothing.
2) Unable to ping the box.
3) From the console, any attempt to ping out to the gateway or any other box returns "ping: sendmsg: Operation not permitted".

What I've tried so far:
1) Restarting the management agents based on http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003490
2) When I attempt to restart the management agents, the ESX Server Host Agent hangs on stopping. I then manually killed the hostd PID. After that I am able to start the host agent, but I still cannot connect to the server via SSH, vCenter, etc., and have the same issues as above.
3) I have verified that all NICs are up and that the switch they connect to shows UP on the appropriate ports.
4) I verified that disk space is not an issue (<40% used on / and only 4% used on /log).

Extra info:
1) I have running production machines that have not been impacted, other than the difficulty in managing them.
2) I will be unavailable to work on this for a few days starting tomorrow (Murphy strikes), and am hoping there is an easy, safe fix that lets me keep the host up without a restart. I am weighing leaving it as is vs. restarting and having the entire system refuse to come up.
3) All storage is local.
4) esxcfg-route shows the VMkernel default gateway as 0.0.0.0. Not sure this is an issue, as the other hosts that are working show the same.
5) I am seeing 6 instances of vmware-watchdog running and 4 instances of vmware-vimsh running.
6) In esxtop, the load averages are .15, .25 and .21.
7) In top, the load averages are 1.2, 1.1 and 1.1.

The box is a Dell 2950 with two dual-core processors and 16GB of RAM. Any thoughts and suggestions are most welcome.
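The disk-space check in item 4 of "What I've tried" is easy to repeat each time the box is looked at again. Below is a minimal Python sketch, assuming you paste the output of `df -P` (POSIX output format) captured on the service console; the helper names and the 80% threshold are illustrative assumptions, not anything ESX provides.

```python
def usage_by_mount(df_output: str) -> dict[str, int]:
    """Map each mount point to its percent-used figure from `df -P` output."""
    usage: dict[str, int] = {}
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue  # ignore anything that isn't a complete df -P data row
        mount = " ".join(fields[5:])  # mount points may contain spaces
        usage[mount] = int(fields[4].rstrip("%"))
    return usage


def over_threshold(usage: dict[str, int], limit: int = 80) -> list[str]:
    """Return the mount points whose usage is at or above `limit` percent."""
    return sorted(mount for mount, pct in usage.items() if pct >= limit)
```

Feed `usage_by_mount` the pasted text and pass the result to `over_threshold`; an empty list means no filesystem is near full, which matches the figures reported above.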
Not sure what you mean. I have the resource set to High (4000 shares) on this box. See attached file. The machine will just spike to 100% utilization for no apparent reason.
Both vCPUs hit 100% inside the VM. On the host/physical side, about 6GHz is used (2 cores x 3GHz, out of 8 cores), so it's really only causing trouble inside the VM. A hard reset always brings it right back, but that is ugly. I'm really worried about the iSCSI messages. Thanks.
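The host-side figure can be checked with simple arithmetic: two vCPUs pinned at 100% on 3GHz cores consume about 6GHz, while an 8-core host at 3GHz per core offers about 24GHz in total. A small Python sketch of that calculation, assuming every core runs at the same 3GHz clock:

```python
def host_share(busy_vcpus: int, core_ghz: float, host_cores: int) -> float:
    """Fraction of total host CPU capacity consumed by fully busy vCPUs."""
    used_ghz = busy_vcpus * core_ghz    # e.g. 2 x 3GHz = 6GHz
    total_ghz = host_cores * core_ghz   # e.g. 8 x 3GHz = 24GHz
    return used_ghz / total_ghz


print(f"{host_share(2, 3.0, 8):.0%}")  # 25%
```

So the runaway VM uses roughly a quarter of the host's CPU capacity, which is consistent with the trouble staying inside the VM.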
Hi, we are running ESX 3.5 (the original release, no updates; another client of ours sees the same thing with Update 4). We have a VM that locks up at seemingly random times. The machine runs RHEL 5 and has 2 vCPUs, 3GB of RAM, 4 disks (two of which have been combined into a single large 500GB volume within the VM using LVM) and two NICs. It runs Apache and acts as an app server that connects to another VM running Postgres. It also has a fairly large file store. In general, this machine is very lightly loaded at night and only moderately loaded during the day.

What we see is the CPU(s) on the machine shoot to 100%. At that point, the VM needs to be powered off or reset; the option to Shut Down Guest OS is not available. The Summary tab also stops showing the machine's IP address and the status of VMware Tools. The timing of these lockups appears to be random: they may happen under load, or in the middle of the night with nothing going on.

The vmdk is on an iSCSI SAN. When I look at /var/log/vmkernel, I see quite a few of the following errors. They seem to come in batches every 20-40 minutes and last for the time period shown below. They do NOT always correlate with a problem on the machine in question, but I am guessing they may be playing a part.
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu1:1066)LinSCSI: 3201: Abort failed for cmd with serial=13802204, status=bad0001, retval=bad0001
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu2:1097)iSCSI: session 0x352180c0 sending mgmt 511874685 abort for itt 511874679 task 0x352025e0 cmnd 0x5a2fa80 cdb 0x2a to (1 0 1 0) at 4232667423
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu1:1066)LinSCSI: 3201: Abort failed for cmd with serial=6445015, status=bad0001, retval=bad0001
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu1:1066)LinSCSI: 3201: Abort failed for cmd with serial=12844297, status=bad0001, retval=bad0001
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu6:1075)iSCSI: session 0x35203f90 sending mgmt 214328217 abort for itt 214328211 task 0x35202180 cmnd 0x5a2d200 cdb 0x2a to (1 0 0 0) at 4232667423
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu2:1201)iSCSI: session 0x35240320 sending mgmt 280194341 abort for itt 280194330 task 0x35202ce0 cmnd 0x5a3af80 cdb 0x2a to (1 0 3 0) at 4232667423
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu0:1098)iSCSI: session 0x352180c0 abort success for mgmt 511874685, itt 511874679, task 0x352025e0, cmnd 0x5a2fa80, cdb 0x2a
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu2:1097)iSCSI: session 0x352180c0 sending mgmt 511874686 abort for itt 511874680 task 0x35202490 cmnd 0x5a31d80 cdb 0x2a to (1 0 1 0) at 4232667423
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu0:1076)iSCSI: session 0x35203f90 abort success for mgmt 214328217, itt 214328211, task 0x35202180, cmnd 0x5a2d200, cdb 0x2a
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu6:1075)iSCSI: session 0x35203f90 sending mgmt 214328218 abort for itt 214328216 task 0x35201fc0 cmnd 0x5a2c080 cdb 0x2a to (1 0 0 0) at 4232667423
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu4:1202)iSCSI: session 0x35240320 abort success for mgmt 280194341, itt 280194330, task 0x35202ce0, cmnd 0x5a3af80, cdb 0x2a
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu2:1201)iSCSI: session 0x35240320 sending mgmt 280194342 abort for itt 280194335 task 0x35202260 cmnd 0x5a3cd80 cdb 0x2a to (1 0 3 0) at 4232667423
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu0:1098)iSCSI: session 0x352180c0 abort success for mgmt 511874686, itt 511874680, task 0x35202490, cmnd 0x5a31d80, cdb 0x2a
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu2:1097)iSCSI: session 0x352180c0 sending mgmt 511874687 abort for itt 511874683 task 0x35202c00 cmnd 0x5a33900 cdb 0x2a to (1 0 1 0) at 4232667423
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu0:1076)iSCSI: session 0x35203f90 abort success for mgmt 214328218, itt 214328216, task 0x35201fc0, cmnd 0x5a2c080, cdb 0x2a
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu6:1075)iSCSI: session 0x35203f90 (1 0 0 0) finished error recovery at 4232667423
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu4:1202)iSCSI: session 0x35240320 abort success for mgmt 280194342, itt 280194335, task 0x35202260, cmnd 0x5a3cd80, cdb 0x2a
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.412 cpu2:1201)iSCSI: session 0x35240320 sending mgmt 280194343 abort for itt 280194336 task 0x35201070 cmnd 0x5a3d000 cdb 0x2a to (1 0 3 0) at 4232667423
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.413 cpu0:1098)iSCSI: session 0x352180c0 abort success for mgmt 511874687, itt 511874683, task 0x35202c00, cmnd 0x5a33900, cdb 0x2a
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.413 cpu2:1097)iSCSI: session 0x352180c0 (1 0 1 0) finished error recovery at 4232667423
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.413 cpu4:1202)iSCSI: session 0x35240320 abort success for mgmt 280194343, itt 280194336, task 0x35201070, cmnd 0x5a3d000, cdb 0x2a
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.413 cpu2:1201)iSCSI: session 0x35240320 sending mgmt 280194344 abort for itt 280194337 task 0x35202b90 cmnd 0x5a3d780 cdb 0x28 to (1 0 3 0) at 4232667423
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.413 cpu4:1202)iSCSI: session 0x35240320 abort success for mgmt 280194344, itt 280194337, task 0x35202b90, cmnd 0x5a3d780, cdb 0x28
Jun 9 09:00:57 virthost04 vmkernel: 489:21:24:33.413 cpu2:1201)iSCSI: session 0x35240320 (1 0 3 0) finished error recovery at 4232667423

I spent an hour on the phone with HP yesterday; they ran diagnostics, checked the logs on the SAN, etc., and said the SAN was fine. They agreed that it appears to be a timeout issue, but thought it was coming from ESX. Is there a setting to increase the timeout limit? Has anyone else seen this? Searching for the above errors and for a scenario like this, I found a few posts, but in most of them the problem either cleared up on its own or no solution was ever found. Any help is much appreciated.
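To test whether these bursts line up with the lockups, it can help to tally each burst per iSCSI session. Below is a minimal Python sketch that assumes the vmkernel entries look exactly like the excerpt above, one entry per line; the regular expressions and the `summarize_aborts` name are illustrative, not a VMware tool, and other ESX builds may word these messages differently.

```python
import re
from collections import Counter

# Patterns written against the vmkernel lines quoted above only.
SENT = re.compile(
    r"iSCSI: session (0x[0-9a-f]+) sending mgmt \d+ abort .*?cdb (0x[0-9a-f]+) to \(([\d ]+)\)"
)
SUCCESS = re.compile(r"iSCSI: session (0x[0-9a-f]+) abort success")
LINSCSI_FAILED = re.compile(r"LinSCSI: \d+: Abort failed")


def summarize_aborts(log_text: str):
    """Tally iSCSI abort activity per session from vmkernel log lines.

    Returns (sessions, cdb_counts, linscsi_failures), where sessions maps a
    session id to its target tuple and sent/succeeded abort counts.
    """
    sessions: dict[str, dict] = {}
    cdb_counts: Counter = Counter()
    linscsi_failures = 0
    for line in log_text.splitlines():
        if m := SENT.search(line):
            session, cdb, target = m.groups()
            entry = sessions.setdefault(session, {"target": target, "sent": 0, "succeeded": 0})
            entry["sent"] += 1
            cdb_counts[cdb] += 1  # SCSI opcode of the command being aborted
        elif m := SUCCESS.search(line):
            entry = sessions.setdefault(m.group(1), {"target": None, "sent": 0, "succeeded": 0})
            entry["succeeded"] += 1
        elif LINSCSI_FAILED.search(line):
            linscsi_failures += 1
    return sessions, cdb_counts, linscsi_failures
```

Applied to the excerpt above, every abort sent on each of the three sessions is later reported as successful, the LinSCSI layer logs three failed aborts, and all but one of the aborted commands carry opcode 0x2a (WRITE(10)); the remaining one is 0x28 (READ(10)). Comparing the timestamps of such bursts with the times the VM hangs is one way to check the suspected correlation.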
All, thanks for the help with this issue. I ended up using bits and pieces of most of the suggestions, and would probably have tried KjB's last suggestion if I hadn't been so far along with the vmkfstools clone work. So, for others who have vmdk files with snapshots but no vmx or vmsd file, here's what worked for me:

1) First, do a full backup. I used mine a couple of times. Without it, I would have been out of luck as I tried various solutions.
2) Don't panic (probably should have been #1).
3) As long as you are able to commit one of the snapshots, you should be in good shape. All I needed was to end up with the last snapshot as my starting point.
4) I ended up using vmkfstools to clone the disks, with the desired snapshot vmdk as the source for the clone operation. I just used PuTTY to get onto the service console of one of my ESX hosts. The command is essentially vmkfstools /vmfs/volumes/.vmdk. If you have more than one disk, as I did, you can log in to another ESX host and use its service console to issue the same command on the next vmdk file. Just be sure to change the destination file name.
5) One nice thing about #4 is that you get immediate confirmation of errors or of a successful clone start. The % complete shown is also accurate and doesn't jump all over the place. No time remaining is shown, but it's easy to extrapolate from how long each percentage point takes.
6) Once done, create a new virtual machine in VC. Choose the Custom configuration in the Add New Virtual Machine wizard, then choose Add Existing Disk when you get to that option. Be sure you choose the correct adapter (mine was BusLogic and ESX defaulted to LSI Logic).
7) Make any other changes (network, etc.) and boot. Mine came right up and needed no other configuration. It just worked. I felt very lucky.

So, thanks again. I can now go on vacation!
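The extrapolation in step 5 is simple linear arithmetic: divide the elapsed time by the percent complete to get the time per percentage point, then multiply by the points remaining. A small Python sketch, assuming the clone progresses at a roughly steady rate; the example numbers are hypothetical.

```python
from datetime import timedelta


def estimate_remaining(elapsed: timedelta, percent_done: float) -> timedelta:
    """Linearly extrapolate time left from elapsed time and percent complete."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be greater than 0 and at most 100")
    per_point = elapsed / percent_done       # time spent per percentage point so far
    return per_point * (100 - percent_done)  # assume the remaining points go at the same pace


# Hypothetical example: 2 hours in, vmkfstools reports 40% done.
print(estimate_remaining(timedelta(hours=2), 40))  # 3:00:00
```

Because the % complete reported by vmkfstools is steady, a straight-line estimate like this should be reasonably close; it will drift if the storage gets busier partway through.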
See the comment below, a follow-up posted after I marked the question answered.
KjB, that is a great suggestion. I didn't realize that I could edit the newly created .vmx file and just point it at the lifeapp001-000004.vmdk file. I should have thought that through. I am about 40% done with the vmkfstools clone/copy (-i). I used the latest snapshot as the source (yes, they are all just parent-child with no branches) and set a new directory as the destination. I think this will result in a single vmdk for each of my two disks, containing the latest snapshot data, that I can add to a newly created virtual machine. Please let me know if that's not the case. Thank you, -TTM
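KjB's suggestion amounts to changing one line in the new VM's .vmx so that the disk entry names the newest snapshot descriptor instead of the base disk. Below is a minimal Python sketch of that text edit, assuming the disk sits at scsi0:0 (a guess; check the actual .vmx), the VM is powered off, and a copy of the original .vmx is kept. The snapshot's parent files must stay alongside it, since each delta is read on top of its parent. The sample .vmx text is hypothetical.

```python
import re


def point_disk_at(vmx_text: str, device: str, vmdk_name: str) -> str:
    """Return vmx_text with the `<device>.fileName` entry set to vmdk_name."""
    key = re.escape(f"{device}.fileName")
    # Match e.g.  scsi0:0.fileName = "lifeapp001.vmdk"  (case-insensitive, as a precaution)
    pattern = re.compile(rf'^(\s*{key}\s*=\s*)"[^"]*"', re.MULTILINE | re.IGNORECASE)
    new_text, count = pattern.subn(lambda m: f'{m.group(1)}"{vmdk_name}"', vmx_text)
    if count != 1:
        raise ValueError(f"expected exactly one {device}.fileName entry, found {count}")
    return new_text


# Hypothetical .vmx fragment for a VM whose disk is on a BusLogic adapter.
vmx = 'scsi0.virtualDev = "buslogic"\nscsi0:0.fileName = "lifeapp001.vmdk"\n'
print(point_disk_at(vmx, "scsi0:0", "lifeapp001-000004.vmdk"))
```

The function refuses to write anything unless exactly one matching entry exists, so a wrong device key fails loudly instead of silently leaving the base disk in place.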
I chose the base .vmdk, but when I booted I was back at a point in time before the snapshots. The snapshot window didn't show any snapshots for me to jump to. Is that because I am missing the vmsd file? Is there a way to recreate that file? If not, would the highest-numbered file be the latest snapshot? As for the snapshots, I do know that they were created just before a number of server upgrades (Exchange Standard to Exchange Enterprise, etc.). The upgrades were applied to the server and we never jumped back in time; we just kept running on the latest. So the only one I really care about is the latest snapshot. After a bunch of reading, I was going to run a vmkfstools -e lifeapp001-000004.vmdk <new folder>/lifeapp001new.vmdk command. I read here http://www.vmworld.com/vmworld/message/1967 that this should create a single vmdk file with the latest info in it. I would do it for each of the two disks with their latest snapshots, then create a new VM and use these two existing disks. Do you think this might work?
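On whether the highest-numbered file is the latest snapshot: for a straight parent-child chain it usually is, but the descriptor files record the chain explicitly. Each snapshot's small descriptor .vmdk (not the large -delta.vmdk extent) carries a parentFileNameHint line naming its parent, so the newest snapshot is the one descriptor that no other descriptor names as its parent. A minimal Python sketch, assuming you have read the descriptor text for one disk's files into a dict; the trimmed descriptor contents below are hypothetical stand-ins.

```python
import re

PARENT_HINT = re.compile(r'^\s*parentFileNameHint\s*=\s*"([^"]+)"', re.MULTILINE)


def latest_snapshot(descriptors: dict[str, str]) -> str:
    """Return the leaf of one disk's linear snapshot chain.

    `descriptors` maps each descriptor file name to its text.
    """
    parents: dict[str, str] = {}
    for name, text in descriptors.items():
        match = PARENT_HINT.search(text)
        if match:
            # The hint may be a full /vmfs/volumes/... path; compare on the bare file name.
            parents[name] = match.group(1).rsplit("/", 1)[-1]
    referenced = set(parents.values())
    leaves = [name for name in descriptors if name not in referenced]
    if len(leaves) != 1:
        raise ValueError(f"expected a single linear chain, found leaves: {leaves}")
    return leaves[0]


# Hypothetical, trimmed descriptors: base disk plus four snapshots, each naming its parent.
names = ["lifeapp001.vmdk"] + [f"lifeapp001-00000{i}.vmdk" for i in range(1, 5)]
chain = {names[0]: 'createType="vmfs"\n'}
for parent, child in zip(names, names[1:]):
    chain[child] = f'parentFileNameHint="{parent}"\n'
print(latest_snapshot(chain))  # lifeapp001-000004.vmdk
```

Run it separately for each disk; mixing two disks' descriptors in one dict produces two leaves, which the function reports as an error rather than guessing.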
So, I made a backup of the old directory containing my vmdk files on my local machine, then copied it to a new directory on another VMFS volume. I then created a new virtual machine and added the existing disks on that volume to the machine. My problem was that it didn't let me choose the highest-numbered file, only the base name of the file. When I booted, I was back several days. Granted, I now have a worst-case fallback: I can be a few days back, but that's obviously not optimal. I've attached a screen capture of the directory of files as they exist in my local backup. You can see that I'm missing the vmx, the vmsd, and maybe others. Just FYI, I don't need the snapshots anymore; I would just like to get back to the most recent state and start it up. Do I need to merge the snapshots back into the base disk with vmkfstools, or is there a way to just pick the correct version of the disk to run from? Any other thoughts? I am now copying the original files back out to a new directory on my VMFS volume, since I changed the parent vmdk when I booted earlier. Once this is done (a few more hours), I think I'll be ready to try any suggestions. Thank you.
Thanks to both of the above for the good info. I'm still copying (and will be for a few more hours) to get a good backup before trying anything. I'll be sure to give credit once I am able to try the suggestions. Two clarifying questions:
1) When I create the new virtual machine and add these existing VMDK files to it, which vmdk should I choose: the snapshot one, the base file, or something else? I'm not clear on which one to pick.
2) If I don't care about keeping the snapshots, but don't want to lose any data written after they were taken, does that change any of the recommendations above?
Thank you.
I have a virtual machine that I can no longer see in my VC 2.5. The good news is that the vmdk files (along with 4 snapshots) are sitting on a VMFS partition on my SAN. I feel confident that no data has been lost and that I just need to perform the right steps to get the machine going again. The bad news is that the machine is missing its .vmx file. Not sure how that happened, but it's gone. I do have the vmware.log file. I saw one post that mentioned recreating the .vmx file from the .log file, but it looked dicey. I need to get this machine (an Exchange server) up and running again as quickly as possible. I have two questions:
1) Is it OK to create a new VM and simply add my existing vmdk files to it? The New Machine wizard allows this, but I want to make sure the data in the snapshots is taken into account. Will the new machine know that snapshots exist?
2) I want to make sure I have a backup of the machine's folder on the VMFS volume. Is there a way to copy between VMFS volumes without bringing the files out to local disk and back again? It's about 200GB and takes forever to copy that way. I have another VMFS volume I could copy to.
Thank you for any ideas!