VMware Cloud Community
sbrousse
Enthusiast

ESX 3.5 I/O Error

We are trying to diagnose an error we found on our ESX server this morning. When we came in, all VMs on the server were down. The console of the ESX server showed a repeated I/O error, and the server was also displaying the error "Adapter at Baseport F80Fh is not responding No adapter present". I'm trying to figure out whether the error is coming from ESX or is hardware related. After talking to support, the feeling was that the error was not hardware related but was caused by an external event such as a power failure. Can I at least rule out that this is caused by ESX?

8 Replies
Texiwill
Leadership

Hello,

Where do the datastores for your VMs live? If it was a power failure, could it have affected the datastore, e.g. NFS, iSCSI, etc.?

IO errors like this are very much hardware related.

Unfortunately you cannot rule out ESX itself very easily, but I would rule out all the other issues first, e.g. hardware, networking, etc.

I would do the following as well:

  • verify the BIOS is set correctly for your system with ESX

  • Open up the box and reseat adapters and verify nothing obvious is wrong (found a broken heat sink this way)

  • run full vendor provided hardware diagnostics for 24-48 hours

  • run diagnostics for connections to your data stores (fiber, network, etc)

This will rule out hardware. Also find out, was there a power failure?


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.

CIO Virtualization Blog: http://www.cio.com/blog/index/topic/168354

As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill
navu
Enthusiast

It usually means that either the controller or something plugged into the controller is bad. First, make sure there isn't a tape drive connected to the RAID controller; this will usually cause the controller not to POST. Next, unplug the SCSI cables from the card. If the error continues once the cables are unplugged, I would replace the RAID controller.

If it is a Dell box, I can bet on this! :)

mike_laspina
Champion

Hello,

I have experienced that message before. Since the host was still running with the errors on screen, it is very likely an external storage failure event.

What is the adapter attached to?

http://blog.laspina.ca/ vExpert 2009
sbrousse
Enthusiast

The ESX server is connected to a NetApp NAS/SAN device via NFS.

sbrousse
Enthusiast

The ESX servers are Dell 2850s with 16 GB of RAM, connected to a NetApp NAS/SAN via NFS over a 1 Gigabit network.

sbrousse
Enthusiast

The datastores for the VMs are located on a NetApp NAS/SAN box using NFS.

We did not have a power failure, and if we had, I have 4.5 hours of standby power. The other 2 ESX servers, running 6 other VMs, were fine and are connected to the same NetApp box.

We ran complete Dell diagnostics from a boot CD and all tests passed. The server is a Dell 2850.

What steps should I take to see if the BIOS is set correctly for ESX?

Do you recommend any specific tools for running tests to the datastore? Are any of these tools located in ESX?

Thanks,

Scott

Texiwill
Leadership

Hello,

> We ran complete Dell diagnostics from a boot cd and all tests pass. Dell 2850.

For how long? One pass is never enough. It often takes several passes for problems to show up, so run the diagnostics for 24-48 hours.

> What steps should I take to see if the BIOS is set correctly for ESX.

You will need to get that from Dell.

> Do you recommend any specific tools for running tests to the datastore? Are any of these tools located in ESX?

Nothing inside ESX; however, you could get IOzone or Iometer and run it from a VM to pummel the network.
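
If neither tool is at hand, a very crude stand-in can be sketched in Python. This is only an illustration and not a replacement for a real benchmark: it assumes Python 3 is available inside a test VM whose disk lives on the NFS datastore, and the function name `timed_write_read` and its parameters are made up for this example. The read pass may be served from the guest's cache, so the write timing is the more meaningful number.

```python
import os
import time


def timed_write_read(path, size_mb=64, block_kb=64):
    """Write then read size_mb of data at path; return (write_s, read_s)."""
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push the data out to the backing store
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block_kb * 1024):
            pass  # discard the data; only the elapsed time matters
    read_s = time.perf_counter() - start

    os.remove(path)  # clean up the scratch file
    return write_s, read_s
```

Running it in a loop for several hours and watching for stalls or exceptions gives a rough idea of whether the path to the datastore misbehaves under sustained load.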

Check your NetApp logs as well. You should be able to correlate a log entry on the NetApp to the time of the failure.
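
The correlation step can be sketched roughly as follows, assuming the log has been exported to a text file. The timestamp layout used here (`YYYY-MM-DD HH:MM:SS` at the start of each line) is an assumption for illustration only, as is the helper name `entries_near`; the format string has to be adjusted to whatever the filer and the switches actually write.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout; adjust to the real log
TIMESTAMP_LENGTH = 19  # number of characters that format occupies


def entries_near(lines, failure_time, window_minutes=10):
    """Return log lines stamped within window_minutes of failure_time."""
    window = timedelta(minutes=window_minutes)
    matches = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # line does not start with a timestamp in the assumed format
        if abs(stamp - failure_time) <= window:
            matches.append(line.rstrip("\n"))
    return matches
```

Feeding it the lines of the exported log together with the time the first I/O error appeared on the ESX console narrows the search to the entries worth reading. The same function can be pointed at switch logs if they are exported the same way.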


Best regards,

Edward L. Haletky

mike_laspina
Champion

Losing network connectivity would give you the same messages. Have a look at the switch logs.
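
To catch an intermittent drop as it happens, a small reachability probe can be left running on another machine on the same network. The sketch below is a minimal example under a few assumptions: Python 3 on a separate workstation (not the ESX service console), NFS served over TCP on the standard port 2049, and hypothetical function names `port_reachable` and `watch`. A successful TCP connect only shows that the path and the listener are up, not that NFS itself is healthy.

```python
import socket
import time
from datetime import datetime


def port_reachable(host, port=2049, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or unreachable


def watch(host, port=2049, interval=30, checks=10):
    """Probe host:port every interval seconds and print timestamped results."""
    for _ in range(checks):
        state = "up" if port_reachable(host, port) else "DOWN"
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"{stamp} {host}:{port} {state}")
        time.sleep(interval)
```

Any "DOWN" lines give timestamps that can then be compared against the switch logs and the filer logs.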
