VMware Cloud Community
hihiy
Contributor

ESX shows "input/output error"

There are two IBM x3650 servers in my cluster.

Today when I logged in to the VI Client via remote console, the ESX server showed 'not responding'. All VMs hosted by this ESX server were shown as 'disconnected', with grey icons. I could still log in to the VMs by RDP, and the applications on the VMs were still running.

I could also log in to the ESX server via SSH, but I could not run any commands. Whenever I ran a command, it showed " *****input/output error".

I shut down the VMs via RDP, rebooted the ESX server from the local console, and restarted the VMs. That cleared the problem.

Can anybody give me some ideas?

Thanks in advance.

hihiy

In addition, ESX is 3.0.1 & VC is 2.0.1

14 Replies
lholling
Expert

How are you connecting to the server? At a guess I would say that it is via VirtualCenter.

If this is the case have you recently upgraded it? If so then it may be that the server cannot connect because the new VC agent cannot install. What you need to do is make sure that the /tmp/vmware-root directory is available.

If you do not have VC, are both servers behaving the same when you connect directly to them via the VI Client? What happens when you look at the servers via the web client?

What versions of ESX, VC and VI Client are you running?

Leonard...

---- Don't forget if the answers help, award points

Fabio_Pitzolu
Enthusiast

Hello.

You should try to run the command

service mgmt-vmware restart (some ESX Server Management services)

or

service vmware-vpxa restart (VirtualCenter agent)

from the local console and not via SSH.

If it happens again try those commands, and post again.

Bye :)

hihiy
Contributor

The result is the same whether I run the commands from the remote console or the local console. Could this be a hardware problem?

Thanks again!

dominic7
Virtuoso

How full are your filesystems? What is the output of 'vdf -h'?

-k-
Contributor

Hi,

Got the same problem today.

Trying to access the server using the VI Client --> timeout

Logging on using SSH --> OK

BUT

[root@esx02 /]# service mgmt-vmware restart
-bash: /sbin/service: Input/output error
[root@esx02 /]#

??

and

[root@esx02 /]# vdf -h
-bash: /usr/sbin/vdf: Input/output error
[root@esx02 /]#

and even more

[root@esx02 /]# ll /etc
total 0
[root@esx02 /]# ll /var
total 0
[root@esx02 /]# ll /boot
total 0
[root@esx02 /]# ll /vmfs/volumes/vmfs1/srv009
total 21529024
-rw------- 1 root root 1073741824 Jun 20 22:44 srv009-3de08280.vswp
-rw------- 1 root root 20971520000 Sep 7 01:34 SRV009-flat.vmdk
-rw------- 1 root root 8664 Jun 20 22:44 srv009.nvram
-rw------- 1 root root 339 Jun 20 22:45 SRV009.vmdk
-rw------- 1 root root 0 Jun 14 23:35 srv009.vmsd
-rwxr-xr-x 1 root root 1369 Jun 20 22:44 srv009.vmx
-rw------- 1 root root 250 Jun 20 22:35 srv009.vmxf
-rw-r--r-- 1 root root 10599 Jun 14 23:35 vmware-1.log
-rw-r--r-- 1 root root 21882 Jun 20 22:16 vmware-2.log
-rw-r--r-- 1 root root 20191 Aug 30 00:03 vmware.log
[root@srvesx02 /]#

So the files on VMFS are OK, and RDP to the VMs is OK.

Any ideas?

Running esx-3.0.1-32039 on an IBM System x3850, local disk, no SAN.

mbrkic
Hot Shot

Can you do 'df' or 'mount'?

Seems like something is wrong with your local linux filesystems.

Also, have you rebooted, and does that fix the problem? If not, can you boot into the linux only kernel and poke around?
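
A rough sketch of the 'mount' check, for anyone landing here with the same symptoms. When the service console's disk throws errors, a Linux filesystem is often remounted read-only, and that shows up as an `ro` option in /proc/mounts. The function name `ro_mounts` is made up for this example. It uses only shell builtins on purpose, on the assumption that external binaries such as mount or df may themselves fail with "Input/output error" while the already-running shell still works.

```shell
# Print the mount point of every filesystem that is mounted read-only.
# Input: lines in /proc/mounts format:
#   device mountpoint fstype options dump pass
# Only shell builtins are used, so no binary has to be loaded from disk.
ro_mounts() {
    while read -r dev mnt fstype opts rest; do
        # Wrap the option list in commas so "ro" only matches as a whole option
        # (and not, for example, inside "errors=remount-ro").
        case ",$opts," in
            *,ro,*) echo "$mnt" ;;
        esac
    done
}

# Usage on a live system (assumes /proc is mounted):
#   ro_mounts < /proc/mounts
```

If this prints / or /var on a host that should have them read-write, that points at the local disk or controller rather than at the management agents.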

dominic7
Virtuoso

If I were you, I'd start using VMware Converter to migrate those VMs ASAP. I think you're on track for a very serious problem (assuming the VMs are important).

hihiy
Contributor

The filesystem looks fine when I check it with 'df -h'.

Also, I found these messages in RED on the local service console display.

0:19:06:41.067 cpu0:1024)VMNIX: <0>scsi: device set offline - command error recovery failed: host0 channel0 id0 lun0

0:19:06:41.111 cpu0:1024)VMNIX: <0> journal commit I/O error.
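
For anyone who wants to check captured log or console text for the same two messages, here is a minimal sketch. The function name `storage_errors` and the file name `console.txt` are placeholders made up for this example, and the match strings are simply the two messages quoted above. It sticks to shell builtins on the assumption that tools like grep may not load while the disk is offline.

```shell
# Print every input line that contains one of the two storage-related
# messages quoted above. Reads log text from stdin.
storage_errors() {
    while IFS= read -r line; do
        case "$line" in
            *"device set offline"*|*"journal commit I/O error"*)
                printf '%s\n' "$line"
                ;;
        esac
    done
}

# Usage (console.txt is a hypothetical saved copy of the console output):
#   storage_errors < console.txt
```

Any hit suggests the service console lost its disk, so the disk, controller, and controller firmware are worth checking before the ESX software itself.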

The VMware support team replied and said that we should upgrade ESX to 3.0.2 and it should then be fine.

Any ideas?

Thanks again.

FredPeterson
Expert

Did they tell you WHY? What bug specifically is fixed?

To me it looks like your local SCSI controller or RAID controller is flaking out. Do you have any monitoring of that going on?

Rumple
Virtuoso

I think there is a patch that corrects Linux filesystems going read-only, which also pertains to the service console (SC).

I've had that happen once before on one of my hosts...

Erik_Zandboer
Expert

I saw this not so long ago on a test system of mine. My guess is that there were SCSI I/O errors which disconnected the SCSI storage from the ESX host. The VMs were still up and running. A reboot of the host fixed the problem. I have not had time to look into it further (it is a test system with unsupported hardware anyway).

Visit my blog at http://www.vmdamentals.com

hihiy
Contributor

No, they didn't say why.

They said: "I would like to update you that we have similar kind of issue reported by one of our customer and this issue shouldn't happen on ESX Server 3.0.2 version. Can you please investigate the possibilities of upgrading to the latest release of ESX server and update us with your findings."

I DO NOT think it can help me out.

I searched VMTN for 'device set offline' and there is a lot of information. It is because of the RAID controller. It seems that upgrading the x3650 ServeRAID firmware will solve the issue. For details, please refer to: http://www-304.ibm.com/jct01004c/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-5071075&bra...

Hope this can help.

More ideas???

Thanks in advance.

hihiy

hihiy
Contributor

Ah, this problem was fixed by upgrading the IBM ServeRAID firmware.

JDLangdon
Expert

What version of ServeRAID are you using?

Jason
