Solved: Corrupt Redo Log, VM lost, what to do? - VMware Technology Network VMTN

As soon as I take a snapshot, or make a backup (with Synology Active Backup for Business), I get a pop-up "The redo log of xxx is corrupted".

This is 100% reproducible, and happens every time for every type of guest that I tried.

The VM is broken and can't be recovered after this pop-up.

I am using the free ESXi, version 6.7.0 Update 3 (Build 14320388).

The host is an Intel NUC8i5BEH with a Samsung SSD for storage. ESXi itself is installed on a USB stick.

How can this be debugged? Given that the Intel NUC is not a supported device, I can't ask VMware for help.

I can't find much in the system log, but this is what is there:

2019-12-14T11:07:21.901Z cpu1:2101917)VSCSI: 6602: handle 8209(vscsi0:0):Destroying Device for world 2101909 (pendCom 0)

2019-12-14T11:07:21.901Z cpu1:2101917)NetPort: 1580: disabled port 0x2000009

2019-12-14T11:07:22.441Z cpu1:2101917)FDS: 617: Enabling IO coalescing on driver 'deltadisks' device '81031d-Windows-000001-sesparse.vmdk'

2019-12-14T11:07:22.441Z cpu1:2101917)VSCSI: 3810: handle 8215(vscsi0:0):Creating Virtual Device for world 2101909 (FSS handle 11207458) numBlocks=268435456 (bs=512)

2019-12-14T11:07:22.441Z cpu1:2101917)VSCSI: 273: handle 8215(vscsi0:0):Input values: res=0 limit=-2 bw=-1 Shares=1000

2019-12-14T11:07:22.442Z cpu1:2101917)NetPort: 1359: enabled port 0x2000009 with mac 00:0c:29:f7:63:a0

2019-12-14T11:07:28.847Z cpu6:2097184)ScsiDeviceIO: 3435: Cmd(0x459a40c53c00) 0x1a, CmdSN 0x19484 from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2019-12-14T11:07:48.715Z cpu1:2097584)INFO (ne1000): false RX hang detected on vmnic0

2019-12-14T11:10:14.688Z cpu7:2097699)DVFilter: 5963: Checking disconnected filters for timeouts
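The only outright failure in this excerpt is the ScsiDeviceIO line. As a rough aid for reading it, here is a minimal Python sketch that pulls the fields out of such a line and names the standard SCSI codes. The function name `decode_scsi_failure` and the lookup tables are made up for this illustration and only cover the codes that appear above (opcode 0x1a, device status 0x2, sense 0x5 0x24 0x0). Note that the thread does not say which physical device `mpx.vmhba32:C0:T0:L0` is, so this decoding alone does not tie the error to the redo log corruption.

```python
import re

# Pattern for the ScsiDeviceIO failure format seen in the excerpt above.
SCSI_FAIL = re.compile(
    r'Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+),.*?'
    r'to dev "(?P<dev>[^"]+)" failed '
    r'H:(?P<host>0x[0-9a-f]+) D:(?P<status>0x[0-9a-f]+) P:(?P<plugin>0x[0-9a-f]+)'
    r'(?: Valid sense data: (?P<key>0x[0-9a-f]+) '
    r'(?P<asc>0x[0-9a-f]+) (?P<ascq>0x[0-9a-f]+))?',
    re.IGNORECASE,
)

# Deliberately tiny tables: only the standard SCSI codes from this excerpt.
OPCODES = {0x1A: "MODE SENSE(6)"}
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION"}
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x24, 0x00): "INVALID FIELD IN CDB"}


def decode_scsi_failure(line):
    """Return the decoded fields of a ScsiDeviceIO failure line, or None."""
    m = SCSI_FAIL.search(line)
    if m is None:
        return None
    op = int(m.group("op"), 16)
    status = int(m.group("status"), 16)
    result = {
        "device": m.group("dev"),
        "command": OPCODES.get(op, "unknown (0x%02x)" % op),
        "host_status": int(m.group("host"), 16),
        "device_status": DEVICE_STATUS.get(status, "unknown (0x%02x)" % status),
        "sense": None,  # stays None when the line carries no sense data
    }
    if m.group("key") is not None:
        key = int(m.group("key"), 16)
        asc = int(m.group("asc"), 16)
        ascq = int(m.group("ascq"), 16)
        result["sense"] = (
            SENSE_KEYS.get(key, "unknown (0x%x)" % key),
            ASC_ASCQ.get((asc, ascq), "unknown (0x%02x/0x%02x)" % (asc, ascq)),
        )
    return result
```

Fed the 11:07:28 line, this reports a MODE SENSE(6) command that the device rejected with CHECK CONDITION, sense ILLEGAL REQUEST / INVALID FIELD IN CDB. In other words, the device declined a query for its mode parameters; it is not a read or write error.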

4 Replies

Some questions:

Did this work before, or is it a new setup?
Does the setup meet Synology's requirements for the free hypervisor (e.g. SSH enabled, ...)?
Does the snapshot issue also occur with manually created snapshots?
Did you already check the Synology Community to find out whether this is a known issue?

André

Hi André

Thanks for your time. The Synology backup attempt was just the start; I later discovered that a simple snapshot, not involving Synology at all, also corrupts the VM with the same error message.

About your questions:

This never worked. I tried several times over the last few months, with multiple re-installs of ESXi. In no setup did this ever work.
Yes, after going through all of the required steps I can back up successfully. I can also restore after the VM gets corrupted. The restored VM runs for a few minutes, but then inevitably gets corrupted again.
Yes, manual snapshots fail too. I have a Debian VM with the changes needed to enable delta backups, but I also have a Windows 10 VM without any special modifications. This Windows VM was never touched by the Synology backup; all I did was take a manual snapshot. The snapshot didn't even complete before the redo log corruption message was displayed.
I asked Synology support, but they could not help. After I saw that this also happens when taking a simple snapshot, I did not insist that Synology try to solve it. It does look like an ESXi issue that the Synology backup merely triggers.

Please don't get me wrong, I'm not trying to point fingers. I'm just trying to get a full understanding.

So please allow me one more question. What type/model of SSD are you using? I'm asking because in the past some users reported similar issues with 840 models.

André

Sure, I understand.

I have two SSDs. One contains the VMs, and the other one the CD images and ESXi log files.

The VMs are indeed on a Samsung SSD 840. The rest is on a Samsung SSD 970.

I guess the next step of debugging would be to create the VMs on the 970 and see if it works there.

Thanks for the hint! Would you maybe have a link that shows issues with the 840?

Edit: Found this: [msg.hbacommon.corruptredo] The redo log of 000001.vmdk is corrupted. If the problem persists, disca...

It looks exactly like my problem. I also use VMFS6 on this SSD. I don't think I'll try the workaround with VMFS5 though; I'd rather use a different disk.

Edit 2: After switching to a different SSD (Samsung 970 Evo Plus), I have not seen any issue over several hours and multiple backups.
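A side note on the VMFS6 remark: the log excerpt in the first post shows the snapshot being opened as `81031d-Windows-000001-sesparse.vmdk`. As far as I know, VMFS6 datastores create snapshot delta disks in the SEsparse format (file names ending in `-sesparse.vmdk`), while VMFS5 uses the older vmfsSparse format (`-delta.vmdk`) for disks under 2 TB, so the VMFS5 workaround also changes the redo log format. Whether that is why the workaround helps is not established in this thread. A minimal Python sketch for telling the two apart by file name, with a hypothetical helper name:

```python
def snapshot_disk_format(filename):
    """Guess the snapshot (redo log) format from an ESXi delta-disk file name.

    Relies only on the file-name suffix convention, so treat the result
    as a hint rather than an authoritative answer.
    """
    name = filename.lower()
    if name.endswith("-sesparse.vmdk"):
        return "SEsparse"
    if name.endswith("-delta.vmdk"):
        return "vmfsSparse"
    return None  # not recognisable as a snapshot delta disk


print(snapshot_disk_format("81031d-Windows-000001-sesparse.vmdk"))
```

Running this over the file names in a VM's directory would show which redo log format the corrupted snapshots were using.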