VMware Cloud Community
fernandomm2
Enthusiast

RAID 10 and Input/output error

We have an ESXi 6 server (build 2039750) that had been running fine for months. It uses an LSI MegaRAID card with 8 SSDs plus battery in RAID 10. The RAID controller reports no issues, and we didn't have any power failures.

Some of the VMs stopped responding today, and nothing we tried to stop or restart them worked. Over SSH, I'm getting input/output errors on specific folders:

[root@localhost:/vmfs/volumes] ls
ls: ./c024ed7a-8e112696-2ae9-b4a955c7694f: Input/output error

But as I already mentioned, our RAID setup is fine.

Is there anything that we can do to fix this issue? Maybe some kind of fsck in ESXi?

Would upgrading to a newer version help?

6 Replies
SureshKumarMuth
Commander

Generally, I/O errors occur due to storage connectivity issues. Here, I understand that your datastore is not accessible. This could also be caused by an issue with the RAID controller driver or firmware, not necessarily a hardware failure.

Please check /var/log/vmkernel.log for more information on storage related errors.
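As a rough starting point for that check, the sketch below filters the log down to lines that look storage-related. The function name `storage_errors` is hypothetical, and the patterns (`ScsiDeviceIO`, `NMP:`, a non-zero `H:0x` host status, and the literal error text) are assumptions about what such entries commonly contain, not an official or exhaustive list.

```shell
# Hypothetical helper: print vmkernel log lines that look storage-related.
# The patterns are assumptions, not an official list of error strings.
storage_errors() {
    # Default to the log path mentioned in this thread.
    log_file="${1:-/var/log/vmkernel.log}"
    grep -E 'ScsiDeviceIO|NMP:|H:0x[1-9a-f]|I/O error|Input/output error' "$log_file"
}

# Example: show the last 50 matching lines
# storage_errors /var/log/vmkernel.log | tail -n 50
```

If nothing matches, the patterns may simply not fit this controller's driver messages, so searching for the driver name or the datastore UUID from the failing path may be worth a try as well.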

Regards,
Suresh
https://vconnectit.wordpress.com/
continuum
Immortal

> Would upgrading to a newer version help?
No - do not try that.
In my experience the only cure for I/O errors is to copy the affected files to a different datastore.
That sounds strange - I know - as ESXi will not copy files with I/O errors. The trick is not to use ESXi for the copies.
If you create a VMFS-header-dump - see

http://vm-sickbay.com/create-a-vmfs-header-dump-using-an-esxi-host-in-production

and provide a download link, I may be able to help.
Feel free to call me via skype.
Ulli



fernandomm2
Enthusiast

After the third reboot, this server started to work normally. All VMs booted normally and their respective filesystem/files were intact.

I really don't get why it failed like this.

Since it failed only once and we have a good backup policy I won't make any additional changes now.

Thanks a lot for all replies.

SureshKumarMuth
Commander

Good to hear that. However, it is worth reviewing the logs to find the cause so that you can take corrective action.

Regards,
Suresh
https://vconnectit.wordpress.com/
fernandomm2
Enthusiast

I'm looking at /var/log/vmkernel.log, but it's quite verbose and I can't find anything related to the issue, not even when searching by time.

Are there any specific strings that I should search for? Or should I be looking at another log file?

SureshKumarMuth
Commander

Do you have a scratch partition configured? Look for older vmkernel logs in /var/run/log.
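A minimal sketch for searching both the current and the older logs in that location follows. The function name `search_vmkernel_logs` is hypothetical, and the sketch assumes that rotated files share the `vmkernel` prefix, that compressed ones end in `.gz`, and that `gzip` is available in the shell. ESXi log timestamps are typically in UTC, so a search by local time may miss the relevant window.

```shell
# Hypothetical helper: search current and rotated vmkernel logs in a directory.
# Usage: search_vmkernel_logs PATTERN [LOG_DIR]
search_vmkernel_logs() {
    pattern="$1"
    log_dir="${2:-/var/run/log}"
    for f in "$log_dir"/vmkernel*; do
        [ -f "$f" ] || continue          # skip if the glob matched nothing
        case "$f" in
            *.gz) gzip -dc "$f" ;;       # assumed: rotated logs are gzip-compressed
            *)    cat "$f" ;;
        esac | grep -E "$pattern" | sed "s|^|$(basename "$f"): |"
    done
}

# Example: look for non-zero SCSI host status codes in all vmkernel logs
# search_vmkernel_logs 'H:0x[1-9a-f]'
```

Each match is prefixed with the name of the file it came from, which helps narrow down when the errors started.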

Regards,
Suresh
https://vconnectit.wordpress.com/