VMware Cloud Community
MohmAly
Contributor

[msg.hbacommon.corruptredo] The redo log of 000001.vmdk is corrupted. If the problem persists, discard the redo log.

When I try to start an old guest machine on an ESXi 5.1 host, I get the following error message:

[msg.hbacommon.corruptredo] The redo log of xxx-000001.vmdk is corrupted. If the problem persists, discard the redo log

Below is the file list. I tried to consolidate and remove all snapshots, but it failed and the snapshots disappeared/dimmed in the vSphere client, so I restored the VM to its previous state before my consolidation attempt.

What should I do now, given that removing snapshots (Delete All) and consolidation did not solve the problem before?

[Image: pastedImage_0.png — VM file list]

3 Replies
a_p_
Leadership

I can't tell you for sure what the root cause is. However, to rule out an issue related to a suspend state, you may check whether deleting the .vmss file to force a reset/cold boot helps.
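A minimal sketch of that check, assuming shell access to the ESXi host and a placeholder datastore path; it renames the suspend-state file rather than deleting it, so the state can be restored if needed:

```shell
# Hedged sketch, not an official procedure: move aside any suspend-state
# .vmss file in a VM's directory so the next power-on is a cold boot.
backup_vmss() {
  vmdir="$1"
  for f in "$vmdir"/*.vmss; do
    [ -e "$f" ] || continue      # no suspend file present: nothing to do
    mv "$f" "$f.bak"             # keep a backup instead of deleting outright
  done
}

# On the ESXi shell you would run something like (path is an assumption):
# backup_vmss /vmfs/volumes/datastore1/MyVM
```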

André

huw02
Contributor

Hello,

I had the same error from ESXi 6.5 up to and including 6.7u3! What is weird is that it only happens on my ESXi host with a locally attached SATA Samsung 840 Pro 500GB SSD. On my other ESXi host (also 6.7u3) with an mSATA Intel 120GB and a Samsung 850 Pro 1TB SSD, the issue never happens.

I have found a good article about the VMFS and SEsparse changes in those ESXi versions. If I understand it right, the bug was fixed in ESXi 6.7 Update 1, but unfortunately that was not the case for me.

The only solution that worked for me was to delete the partition table and create a new datastore in the old VMFS5 format (instead of VMFS6).

With this change I can now take snapshots again without any redo log issue.

One caveat with this workaround: I cannot have a VMDK bigger than 2TB, because for disks above that size SEsparse would be used even on VMFS5, and SEsparse is what seems to be related to this issue.
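For reference, a hedged sketch of what that re-creation looks like from the ESXi shell. The device ID and datastore name are placeholders (assumptions, not from the thread), and each command is guarded so the sketch is a no-op outside an ESXi host. This destroys everything on the device, so evacuate VMs first:

```shell
# Hedged sketch: wipe the partition table and recreate the datastore as
# VMFS5 instead of VMFS6. naa.XXXX and "local-vmfs5" are placeholders.
DISK="/vmfs/devices/disks/naa.XXXX"     # placeholder device path
if command -v partedUtil >/dev/null 2>&1; then
  partedUtil delete "$DISK" 1           # drop the existing VMFS6 partition
  # Recreate a single VMFS partition spanning the disk; the geometry values
  # must come from `partedUtil getUsableSectors "$DISK"` on the real host.
fi
if command -v vmkfstools >/dev/null 2>&1; then
  vmkfstools -C vmfs5 -S local-vmfs5 "$DISK":1   # format as VMFS5, not VMFS6
fi
```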

I hope this helps you, and also that VMware will re-open this issue, which should have been fixed with Update 1.

mikedsz
Contributor

Hello MohmAly,

This appears to be an issue with the snapshot that is present on the VM. Redo log corruption indicates corruption in the snapshot, and we will have no option other than to revert to the previous snapshot, if any.
Redo log corruption may occur due to (but is not limited to):

-- Hardware issues with the storage controller or storage device.

-- Connectivity issues between the ESXi host and the storage device.

-- The datastore containing the snapshot disks running out of free disk space.

You could try performing a snapshot consolidation or clone operation; if the task appears stuck, you can terminate it by clicking Cancel on the task entry.
A clone can be performed from the command line using the steps in this VMware Knowledge Base article.
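A minimal sketch of the command-line clone, assuming placeholder datastore paths (not from the thread); cloning from the topmost snapshot disk consolidates the whole chain into a single clean VMDK, and the command is guarded so the sketch is a no-op outside an ESXi host:

```shell
# Hedged sketch: clone the snapshot chain into one consolidated VMDK.
SRC="/vmfs/volumes/datastore1/MyVM/MyVM-000001.vmdk"   # topmost snapshot disk
DST="/vmfs/volumes/datastore1/MyVM-clone/MyVM.vmdk"    # consolidated copy
if command -v vmkfstools >/dev/null 2>&1; then
  vmkfstools -i "$SRC" -d thin "$DST"   # -i clones; -d thin keeps it thin-provisioned
fi
```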

If the clone does not work, then we may have to restore the VM from backup.

A clear understanding of why this occurs is scarce; however:

The in-memory delta disk metadata on the vSphere host includes the delta disk header. Updates to the headers of the delta disks happen in memory as required, and the changes are written to disk only upon certain events, such as snapshot consolidation or when the delta disk is closed. Corruption of these delta disks, due to the reasons mentioned above, can cause these redo log corruption issues.

How can we prevent this?
To be honest, there is no hard and fast rule, but if we are seeing this issue very often, we may want to check whether there are any backend storage disconnections, storage array tasks, or backup events occurring at the same time. That is where analysing the logs would be required to get to the root of this.
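As a starting point for that log analysis, a small sketch that scans a VM's vmware.log for redo-log and SEsparse related messages; the log path is an assumption for your environment:

```shell
# Hedged sketch: pull redo-log / SEsparse related lines out of a vmware.log.
scan_vm_log() {
  grep -iE 'corruptredo|sesparse|redo log' "$1"
}

# On the ESXi shell you would run something like (path is an assumption):
# scan_vm_log /vmfs/volumes/datastore1/MyVM/vmware.log
```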

Mike.
