t1nue
Contributor

Corrupt Redo Log, VM lost, what to do?

As soon as I take a snapshot, or make a backup (with Synology Active Backup for Business), I get a pop-up "The redo log of xxx is corrupted".

This is 100% reproducible, and happens every time for every type of guest that I tried.

The VM is broken and can't be recovered after this pop-up.

I am using the free ESXi, version 6.7.0 Update 3 (Build 14320388).

The host is an Intel NUC8i5BEH with a Samsung SSD for storage. ESXi itself is installed on a USB stick.

How can this be debugged? Given that the Intel NUC is not a supported device, I can't ask VMware for help.

I can't find much in the system log, but this is what is there:

2019-12-14T11:07:21.901Z cpu1:2101917)VSCSI: 6602: handle 8209(vscsi0:0):Destroying Device for world 2101909 (pendCom 0)

2019-12-14T11:07:21.901Z cpu1:2101917)NetPort: 1580: disabled port 0x2000009

2019-12-14T11:07:22.441Z cpu1:2101917)FDS: 617: Enabling IO coalescing on driver 'deltadisks' device '81031d-Windows-000001-sesparse.vmdk'

2019-12-14T11:07:22.441Z cpu1:2101917)VSCSI: 3810: handle 8215(vscsi0:0):Creating Virtual Device for world 2101909 (FSS handle 11207458) numBlocks=268435456 (bs=512)

2019-12-14T11:07:22.441Z cpu1:2101917)VSCSI: 273: handle 8215(vscsi0:0):Input values: res=0 limit=-2 bw=-1 Shares=1000

2019-12-14T11:07:22.442Z cpu1:2101917)NetPort: 1359: enabled port 0x2000009 with mac 00:0c:29:f7:63:a0

2019-12-14T11:07:28.847Z cpu6:2097184)ScsiDeviceIO: 3435: Cmd(0x459a40c53c00) 0x1a, CmdSN 0x19484 from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2019-12-14T11:07:48.715Z cpu1:2097584)INFO (ne1000): false RX hang detected on vmnic0

2019-12-14T11:10:14.688Z cpu7:2097699)DVFilter: 5963: Checking disconnected filters for timeouts
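
The only outright failure in the excerpt above is the ScsiDeviceIO line, so decoding it is a reasonable first step. Below is a minimal Python sketch that does this. The function name `decode_scsi_failure` and the lookup tables are illustrative, not part of any VMware tool, and the tables cover only the codes that appear in this one line (opcode 0x1a is MODE SENSE(6), device status 0x2 is CHECK CONDITION, sense key 0x5 with ASC/ASCQ 0x24/0x0 is ILLEGAL REQUEST, INVALID FIELD IN CDB).

```python
import re

# Lookup tables cover only the codes seen in the log line above;
# any other value is reported as "unknown".
OPCODES = {0x1A: "MODE SENSE(6)"}
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION"}
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x24, 0x00): "INVALID FIELD IN CDB"}

HEX = r"0x[0-9a-f]+"
LINE_RE = re.compile(
    rf'Cmd\({HEX}\) (?P<op>{HEX}),.*?to dev "(?P<dev>[^"]+)" failed '
    rf"H:(?P<h>{HEX}) D:(?P<d>{HEX}) P:(?P<p>{HEX})"
    rf"(?: Valid sense data: (?P<key>{HEX}) (?P<asc>{HEX}) (?P<ascq>{HEX}))?",
    re.IGNORECASE,
)


def decode_scsi_failure(line):
    """Describe a ScsiDeviceIO failure line, or return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    op = int(m["op"], 16)
    status = int(m["d"], 16)
    result = {
        "device": m["dev"],
        "command": OPCODES.get(op, f"unknown opcode {op:#x}"),
        "host_status": int(m["h"], 16),
        "device_status": DEVICE_STATUS.get(status, f"unknown status {status:#x}"),
        "plugin_status": int(m["p"], 16),
        "sense_key": None,
        "additional_sense": None,
    }
    if m["key"] is not None:  # sense data is only present on some failures
        key = int(m["key"], 16)
        asc, ascq = int(m["asc"], 16), int(m["ascq"], 16)
        result["sense_key"] = SENSE_KEYS.get(key, f"unknown key {key:#x}")
        result["additional_sense"] = ASC_ASCQ.get(
            (asc, ascq), f"unknown ASC/ASCQ {asc:#x}/{ascq:#x}"
        )
    return result


SAMPLE = (
    "2019-12-14T11:07:28.847Z cpu6:2097184)ScsiDeviceIO: 3435: "
    "Cmd(0x459a40c53c00) 0x1a, CmdSN 0x19484 from world 0 to dev "
    '"mpx.vmhba32:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 '
    "Valid sense data: 0x5 0x24 0x0."
)

if __name__ == "__main__":
    for field, value in decode_scsi_failure(SAMPLE).items():
        print(f"{field}: {value}")
```

Read this way, the line says that the device rejected a MODE SENSE query as an illegal request. That is a refused query rather than a failed read or write, so on its own it does not prove that this message is related to the redo log corruption.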

4 Replies
a_p_
Leadership

Some questions:

  • Did this work before, or is it a new setup?
  • Does the setup meet Synology's requirements for the free Hypervisor (e.g. enabled SSH, ...)?
  • Does the snapshot issue also occur with manually created snapshots?
  • Did you already check the Synology Community to find out whether this is a known issue?

André

t1nue
Contributor

Hi André

Thanks for your time. The Synology backup attempt was just the start; I later discovered that a simple snapshot, not involving Synology at all, also corrupts the VM with the same error message.

About your questions:

  • This never worked. I tried several times over the last few months, with multiple re-installs of ESXi. In no setup did this ever work.
  • Yes, after going through all of the required steps I can successfully back up. I can also restore after the VM got corrupted. The restored VM runs for a few minutes, but then inevitably gets corrupted again.
  • Yes. I have a Debian VM with the necessary changes to enable a delta backup. But I also have a Windows 10 VM without any special modifications. This Windows VM was never touched by the Synology backup; all I did was take a manual snapshot. The snapshot didn't even complete before the log corruption was displayed.
  • I asked Synology support, but they could not help. After I saw that this also happens when taking a simple snapshot, I did not insist that Synology try to solve this. It does look like an ESXi issue that the Synology backup merely triggers.
a_p_
Leadership

Please don't get me wrong, I'm not trying to point fingers. I'm just trying to get a full understanding.

So please allow me one more question: what type/model of SSD are you using? I'm asking because in the past some users reported similar issues with 840 models.

André

View solution in original post

0 Kudos
t1nue
Contributor

Sure, I understand.

I have two SSDs. One contains the VMs, and the other one the CD images and ESXi log files.

The VMs are indeed on a Samsung SSD 840. The rest is on a Samsung SSD 970.

I guess the next step of debugging would be to create the VMs on the 970 and see if it works there.

Thanks for the hint! Would you maybe have a link that shows issues with the 840?

Edit: Found this: [msg.hbacommon.corruptredo] The redo log of 000001.vmdk is corrupted. If the problem persists, disca...

It looks exactly like my problem. I also use VMFS6 on this SSD. I don't think I'll try the workaround with VMFS5 though; I'd rather use a different disk.

Edit 2: After switching to a different SSD (Samsung 970 Evo Plus), I have not seen any issue over several hours and multiple backups.
