DyJohnnY
Enthusiast

Power outages, now ESX 4.0 host is slow, VMs malfunction, I/O error on console

Hello,

I seem to have an issue with a test machine running ESX 4.0 and vSphere 4.0.

We had some power outages lately, and the machine has no redundancy built in: no redundant power and no disk RAID. As a result, our host was forcibly shut down. When we restarted it, I noticed slow performance and a lot of missing information in the vSphere Client GUI.

When the outages stopped, we noticed the following:

1. The vSphere Client shows no memory or CPU usage.

2. The Summary and Resource Allocation tabs show the host as if it were using no resources.

3. The Performance tab shows interrupted graphs for CPU and no graph for memory.

4. All the other tabs take a long time to display.

I've attached a screenshot of the GUI issues.

On the console, when I hit Alt+F1, I get a lot of these messages:

[uptime.timestamp] end_request: I/O error, dev sdc, sector ...

The VMs seem to work, mostly the Linux ones. I can ping them and use SSH.

The Windows VMs also sort of work: I can open an RDP session, but then I get a "Workstation service not running" error and I can't log in.

I tried using the VM console, but the screen stays dark and I can't see anything; the console screen is probably not loading.

I'm guessing it's a disk issue, at least judging by the error message, but I'm not sure how to troubleshoot further or how to check the disk from the ESX host console.

Thanks for reading. I hope someone can point me in the right direction.

ionut

IonutN
11 Replies
jb12345
Enthusiast

What brand of server is this? Some manufacturers have ways to look at the boot process so that you can see if there were errors. For instance on Dell if you have a DRAC you can replay the boot sequence.

DyJohnnY
Enthusiast

It's a test machine, as I said, and we have it running on a generic system: a Gigabyte board with an ICH10R-based chipset and Intel 1000 MT NICs added.

The machine has been running like a charm for 4-5 months now. The chipset is supported, but that's all. The storage is made up of locally attached disks, no RAID.

My thought was to get out my HDD Regenerator disc and try that, but I was hoping for more info before I actually go ahead with it.

IonutN
jb12345
Enthusiast

Can you SSH in to the host and look at the logs? I don't suppose you have any remote access to see what happens on a reboot (something similar to Dell DRAC or HP iLO)? Have you looked at the BIOS to see if it is seeing all your drives?

DyJohnnY
Enthusiast

Hi,

Hmm, maybe my wording was off. I'll try to rephrase.

Yes, SSH works on the host. It's slow, but it works. I can look at the logs, but I don't know which ones. Where should I look?

I can see the boot process on the monitor (I have a monitor and keyboard connected), and there are no errors during boot as far as I can tell. The problems start when the VMs begin powering up and, presumably, actually use the disk.

The BIOS is seeing all my drives. Let me explain again: the hypervisor starts up, the VMs start up automatically, and everything seems to be running. The datastore has no data loss; otherwise I'd expect ESX to throw an error regarding a VM disk or config file.

About the VMs: I can ping them, I can SSH to the Linux ones, and I can get to the logon screen over RDP on Windows. I can NOT open the VM consoles.

IonutN
jb12345
Enthusiast

Check

/var/log/vmkernel, /var/log/vmkwarning & /var/log/messages

When ESX started, did you get a message about needing to check file integrity (or something like that)? It's been a while since I saw that message.

What does the VM Performance tab look like for them?
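
Editor's sketch: one rough way to go through the logs listed above is to count the I/O error lines per device. This is a minimal example, not an official VMware tool. It assumes the messages follow the shape quoted in the first post (`end_request: I/O error, dev sdc, sector ...`), and that the log files have been copied to a machine with Python 3, since the Python on the ESX 4.0 service console may be too old for this syntax. The function name `count_io_errors` is made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumed message shape, based on the console output quoted in the first post:
#   end_request: I/O error, dev sdc, sector <number>
# The pattern also tolerates "end request" and a missing comma.
IO_ERROR = re.compile(r"end[_ ]request: I/O error,? dev (\w+), sector (\d+)")


def count_io_errors(lines):
    """Return a Counter mapping device name -> number of I/O error lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def main(paths):
    for path in paths:
        # errors="replace" keeps going if the log contains undecodable bytes
        with open(path, errors="replace") as handle:
            counts = count_io_errors(handle)
        for device, hits in sorted(counts.items()):
            print(f"{path}: {device}: {hits} I/O error line(s)")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a name of your choice, it could be run as `python3 count_io_errors.py vmkernel vmkwarning messages` against the copied files. A device that dominates the counts is the first candidate for a failing disk.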

DyJohnnY
Enthusiast

Hi j,

I just booted with Hiren's Boot CD, which diagnosed the ESX disk as "not error free". It's also making strange noises for a hard drive. I guess the question is answered. Now I'd like to know whether there is any way to move ALL the data from the damaged disk to another disk I have standing by.

I'm interested primarily in the datastore files, as I don't assume I can just copy over the ESX partitions and have everything magically work on the new drive.

Thanks for the help so far.

IonutN
jb12345
Enthusiast

There may be a couple of options.

Do you have another ESX host?

You can use SCP to copy the VM files off to another host.

Converter? P2V the VM(s) and put them on another host or hold them on a PC for restoration.

DyJohnnY
Enthusiast

Hi,

Thanks for input, gave me a couple of ideas.

I've done the following:

- Added the new local disk to the datastores of the ESX host. Will the datastore on this new disk still be visible after I replace the damaged disk, rebuild ESX with a fresh install, and add the new disk as a datastore? I can only assume the file system will stay the same and ESX won't attempt to format or wipe the datastore.

- I started copying the files to the new datastore. It's going slowly, but it's going. I'm doing this via the GUI at the moment. Would it be faster/safer to use cp from the console?

Thanks,

ionut

IonutN
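
Editor's sketch: on the "safer" half of the cp question, whichever tool does the copy, it helps to confirm that what landed on the new datastore matches what was read from the old one. The example below is one possible approach, not a VMware-recommended procedure. It assumes Python 3 is available wherever both folders are reachable, that the VMs are powered off, and that VM files can be treated as plain files. The names `copy_with_hash` and `copy_and_verify` are invented for illustration. Each source file is read only once, so the failing disk is not asked to deliver the same data twice.

```python
import hashlib
import shutil  # not required for the copy itself; kept out of the hot path
from pathlib import Path

CHUNK = 1024 * 1024  # read and write in 1 MiB pieces


def copy_with_hash(src, dst):
    """Copy src to dst in one pass and return the SHA-256 of the bytes read."""
    digest = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for chunk in iter(lambda: fin.read(CHUNK), b""):
            fout.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path):
    """Return the SHA-256 of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src_dir, dst_dir):
    """Copy every file under src_dir into dst_dir.

    Returns the source paths that could not be read or whose copy
    does not match the bytes that were read.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    failed = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        try:
            if copy_with_hash(src, dst) != sha256_of(dst):
                failed.append(src)
        except OSError:
            # Typically an I/O error from the damaged disk
            failed.append(src)
    return failed
```

A call might look like `copy_and_verify("/vmfs/volumes/old_datastore/vm1", "/vmfs/volumes/new_datastore/vm1")`, where both datastore names are placeholders. Any path in the returned list deserves a second attempt or a restore from another source.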
jb12345
Enthusiast

Isn't that datastore on the disk that is bad? If you remove it then you won't have access to the datastore on it. Or is the datastore on a separate disk? If it's on a separate disk it may be available when you rebuild ESX host. I'd make backups of the VMs just as insurance.

jb12345
Enthusiast

I'll be offline for 4 days so can't offer any more assistance. Good luck.

DyJohnnY
Enthusiast

Thanks for all the input.

The disk is bad, but it's running; I can still get data from it, slowly.

I do have another disk standing by that I've copied some VMs to. That disk is simply an empty disk that I've formatted and added to the ESX host's datastore inventory.

I'd still be interested in a method of cloning the ESX install from one disk onto another, if anyone has any input on this.

Thanks,

IonutN
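
Editor's sketch: the thread does not settle the cloning question, so the following only illustrates the general idea of a block-level copy that keeps going past unreadable areas, similar in spirit to `dd conv=noerror,sync`. It is not an answer from the thread and makes no promise that a cloned ESX install will boot. It assumes a Linux live environment with Python 3, with neither disk in use. The function name `clone_blocks` and its `read_block` parameter are invented for illustration. Dedicated tools such as GNU ddrescue handle retries and logging far better.

```python
import os


def clone_blocks(src_path, dst_path, block_size=64 * 1024, read_block=os.pread):
    """Copy src to dst block by block, zero-filling blocks that fail to read.

    Returns the list of byte offsets whose block could not be read.
    read_block(fd, length, offset) is injectable so the error path can be
    exercised without a damaged disk.
    """
    bad_offsets = []
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size of a file or block device
        size = os.lseek(src_fd, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = read_block(src_fd, want, offset)
                except OSError:
                    data = b""
                    bad_offsets.append(offset)
                if not data:
                    # Keep offsets aligned by writing zeros for the lost block
                    data = b"\x00" * want
                dst.write(data)
                offset += len(data)
    finally:
        os.close(src_fd)
    return bad_offsets
```

A call might look like `clone_blocks("/dev/sdX", "/dev/sdY")`, where both device names are placeholders that must be double-checked, because the destination is overwritten. The returned offsets show which regions were zero-filled, so any files that live there can be treated as suspect.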