Checking integrity of an ESXi installation

gbras · 09-15-2009

Hi,

Today one of our servers (Dell PE 2950) underwent routine maintenance: a Dell technician replaced the RAID controller battery.

On that server we have an ESXi installation (the free version, Update 4, v3.5.0 153875, standalone, without VirtualCenter). After the RAID controller battery was replaced, ESXi booted, but it:

- lost the IP address of the management interface/console (the IP address was 0.0.0.0)

- lost the whole network setup (vnics/vswitches and iSCSI)

- went automatically into lockdown mode

After a couple of reboots, working at the console, I managed to turn lockdown mode off (at first it refused to disable lockdown).

Then I had to rebuild the whole network setup (first at the console, then in the VI Client: I rebuilt the vswitches and assigned the vnics) and reconfigure iSCSI.

I tested the host with several reboots, but:

- one boot in three comes up in lockdown mode, even though I had disabled it

- at every boot the automatic startup/shutdown setup is lost (all VMs are in the manual startup group, even those I had placed in automatic startup in a given sequence), so the VMs do not start automatically.

So is there a way to run an "integrity check" of the ESXi installation/local HD to make sure it is OK?

By the way, I have enabled SSH login.

Thank you

Guido

bulletprooffool · 09-15-2009

I had a similar problem before. The problem is that your config should automatically get backed up at 1 minute past the hour, every hour. You most likely have corrupt disk space, which means the backup cannot work properly, so your changes become session-only changes.

Follow the link below for info on how to resolve the disk issue:

http://www.vm-help.com/esx/esx3i/check_system_partitions.php

Then wait until about five minutes past the hour and review the logs for any failed backup errors.
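To make the "review the logs" step concrete, here is a minimal Python sketch meant to run on a workstation against a copy of the log. It reads syslog-style lines (the format that appears in /var/log/messages on this host, e.g. "Sep 15 14:04:43 Hostd: ...") and keeps only those stamped one to five minutes past an hour whose text contains a failure keyword. The function name is hypothetical, and both the keyword list and the 1-to-5-minute window are assumptions based on the advice above, not on documented ESXi backup log messages.

```python
import re

# Classic syslog prefix, as in "Sep 15 14:04:43 Hostd: Failed to send ...".
# Day-of-month may be space-padded ("Sep  5"), hence \s+ after the month.
SYSLOG_LINE = re.compile(r"^(\w{3})\s+(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) (.*)$")

# Assumption: a failed backup would be logged with one of these words.
# This list is a guess, not taken from ESXi documentation.
FAILURE_WORDS = ("fail", "error")


def backup_window_failures(lines, first_minute=1, last_minute=5):
    """Return lines logged first_minute..last_minute past any hour whose
    message contains one of FAILURE_WORDS (case-insensitive)."""
    hits = []
    for line in lines:
        text = line.rstrip("\n")
        m = SYSLOG_LINE.match(text)
        if not m:
            # e.g. "last message repeated 3 times" carries no timestamp
            continue
        minute = int(m.group(4))
        message = m.group(6).lower()
        if first_minute <= minute <= last_minute and any(
            word in message for word in FAILURE_WORDS
        ):
            hits.append(text)
    return hits
```

Usage, after copying /var/log/messages off the host (for example over the SSH login): `with open("messages") as log: print("\n".join(backup_window_failures(log)))`. Keeping the window and keywords as parameters makes it easy to widen the search if nothing turns up.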

One day I will virtualise myself . . .

gbras · 09-15-2009

Many thanks for the reply and for the useful link to the docs on checking the ESXi system partitions.

These are my ESXi partitions:

Device                   Boot  Start    End    Blocks  Id  System
/dev/disks/vmhba1:0:0:1            5    750    763904   5  Extended
/dev/disks/vmhba1:0:0:2          751   4845   4193280   6  FAT16
/dev/disks/vmhba1:0:0:3         4846  69376  66079744  fb  VMFS
/dev/disks/vmhba1:0:0:4  *         1      4      4080   4  FAT16 <32M
/dev/disks/vmhba1:0:0:5            5     52     49136   6  FAT16
/dev/disks/vmhba1:0:0:6           53    100     49136   6  FAT16
/dev/disks/vmhba1:0:0:7          101    210    112624  fc  VMKcore
/dev/disks/vmhba1:0:0:8          211    750    552944   6  FAT16
Partition table entries are not in disk order
~ # esxcfg-vmhbadevs -f
vmhba1:0:0:8 /vmfs/devices/disks/vmhba1:0:0:8 a79407ec-71c546c0-1368-0fc9b0ac7595
vmhba1:0:0:6 /vmfs/devices/disks/vmhba1:0:0:6 5bc62b73-7c202adb-f01f-97b43777d751
vmhba1:0:0:5 /vmfs/devices/disks/vmhba1:0:0:5 3a02cc70-99a3b655-3dd7-64ab30093543
vmhba1:0:0:2 /vmfs/devices/disks/vmhba1:0:0:2 49e5e6f3-43b0a7f7-c3f6-002219a79228
~ # ls -l | grep vmfs
l-- 0 root root 1984 Jan 1 1970 altbootbank -> /vmfs/volumes/5bc62b73-7c202adb-f01f-97b43777d751
l 0 root root 1984 Jan 1 1970 bootbank -> /vmfs/volumes/3a02cc70-99a3b655-3dd7-64ab30093543
l 0 root root 1984 Jan 1 1970 scratch -> /vmfs/volumes/49e5e6f3-43b0a7f7-c3f6-002219a79228
l-- 0 root root 1984 Jan 1 1970 store -> /vmfs/volumes/a79407ec-71c546c0-1368-0fc9b0ac7595
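The volume UUIDs in the esxcfg-vmhbadevs and ls listings can be matched up to see which vmhba partition sits behind each symlink. A small Python sketch, assuming the command output has been pasted in as text; the function names are hypothetical:

```python
import re


def uuid_to_device(vmhbadevs_output):
    """Parse `esxcfg-vmhbadevs -f` lines: <device> <device path> <volume UUID>."""
    mapping = {}
    for line in vmhbadevs_output.splitlines():
        fields = line.split()
        # Skip prompts and anything that is not a three-column device line.
        if len(fields) == 3 and fields[1].startswith("/vmfs/devices/"):
            device, _path, uuid = fields
            mapping[uuid] = device
    return mapping


# "<name> -> /vmfs/volumes/<uuid>" at the end of an `ls -l` line
LINK = re.compile(r"(\S+) -> /vmfs/volumes/(\S+)$")


def link_to_device(ls_output, uuid_map):
    """Map each symlink name (bootbank, store, ...) to its vmhba partition."""
    result = {}
    for line in ls_output.splitlines():
        m = LINK.search(line.strip())
        if m:
            name, uuid = m.groups()
            result[name] = uuid_map.get(uuid, "unknown")
    return result
```

With the output shown above, this maps bootbank to vmhba1:0:0:5, altbootbank to vmhba1:0:0:6, scratch to vmhba1:0:0:2 and store to vmhba1:0:0:8, i.e. the same four FAT16 partitions that the dosfsck runs below cover.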

I ran a dosfsck pass on each of the four partitions that look meaningful to me: one (/scratch) ended with "Success", the others without errors.

May I consider them OK?

# dosfsck -t -r /dev/disks/vmhba1:0:0:2
dosfsck 2.11, 12 Mar 2005, FAT32, LFN
Seek to 2147491840:Success
~ # dosfsck -t -r /dev/disks/vmhba1:0:0:5
dosfsck 2.11, 12 Mar 2005, FAT32, LFN
/dev/disks/vmhba1:0:0:5: 10 files, 39607/48927 clusters
~ # dosfsck -t -r /dev/disks/vmhba1:0:0:6
dosfsck 2.11, 12 Mar 2005, FAT32, LFN
/dev/disks/vmhba1:0:0:6: 2 files, 1/48927 clusters
~ # dosfsck -t -r /dev/disks/vmhba1:0:0:8
dosfsck 2.11, 12 Mar 2005, FAT32, LFN
/dev/disks/vmhba1:0:0:8: 34 files, 11546/34549 clusters
~
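The /scratch line ("Seek to 2147491840:Success") may not mean the check succeeded. One possible reading, stated as an assumption rather than something confirmed in this thread: dosfsck reports a failed seek by appending the system error text to its message, and "Success" would then mean errno was 0, which fits a dosfsck build limited to 32-bit file offsets silently mis-seeking past 2 GiB. The quick arithmetic below only checks whether that explanation is plausible, using the fdisk size of vmhba1:0:0:2 (assumed to be in 1 KiB blocks); the helper name is hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest offset a signed 32-bit off_t can hold


def seek_check(partition_blocks, seek_offset, block_size=1024):
    """Compare a dosfsck seek offset with the partition size and the signed
    32-bit offset limit. block_size=1024 assumes fdisk reports 1 KiB blocks."""
    partition_bytes = partition_blocks * block_size
    return {
        "partition_bytes": partition_bytes,
        "inside_partition": seek_offset < partition_bytes,
        "beyond_int32": seek_offset > INT32_MAX,
        "bytes_past_2_31": seek_offset - 2**31,
    }


# Values from this thread: fdisk "Blocks" for vmhba1:0:0:2 and the offset
# printed in "Seek to 2147491840:Success".
result = seek_check(4193280, 2147491840)
for key, value in result.items():
    print(f"{key}: {value}")
```

The offset lies inside the roughly 4 GiB partition but 8192 bytes past 2^31. If the 32-bit reading is right, the check of vmhba1:0:0:2 stopped about halfway through the partition rather than completing.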

Which log file should I check for errors at five minutes past the hour?

In /var/log/messages I can't find anything meaningful:

Sep 15 14:04:19 sfcb[57902]: storelib Physical Device Device ID : 0x2
last message repeated 1 times
Sep 15 14:04:43 vmkernel: 0:04:07:52.202 cpu4:1977)WARNING: UserSocketInet: 588: waiters list not empty!
Sep 15 14:04:43 Hostd: Activation : Invoke done on
Sep 15 14:04:43 Hostd: Throw vmodl.fault.RequestCanceled
Sep 15 14:04:43 Hostd: Result:
Sep 15 14:04:43 Hostd: (vmodl.fault.RequestCanceled) { dynamicType = <unset>, msg = "" }
Sep 15 14:04:43 Hostd:
Sep 15 14:04:43 Hostd: Failed to send response to the client: Broken pipe
Sep 15 14:05:21 sfcb[2654]: storelib Physical Device Device ID : 0x2
last message repeated 11 times
Sep 15 14:05:51 sfcb[58229]: storelib Physical Device Device ID : 0x2
last message repeated 3 times
Sep 15 14:05:51 sfcb[58234]: storelib Physical Device Device ID : 0x2
