VMware Cloud Community
JKleck
Contributor

ESXi 6.5.0 - Usage of Snapshots causes error with redo log

Hello,

When I create a snapshot of a Linux VM and then start the VM, the VM is shut down at some point with an error about the redo log file. This happens anywhere from immediately to within 30 minutes.

Log entries:

2017-04-10T11:38:31.477Z warning hostd[D140B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58887468-d5154633-508a-1866da8572e1/Test-Linux/Test-Linux.vmx] Failed to find activation record, event user unknown.

2017-04-10T11:38:31.477Z verbose hostd[D140B70] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: latestEvent, ha-eventmgr. Applied change to temp map.

2017-04-10T11:38:31.477Z info hostd[D140B70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 20239 : Message on Test-Linux on testserver.local in ha-datacenter: The redo log of 'Test-Linux_1-000001.vmdk' is corrupted. If the problem persists, discard the redo log.
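For anyone who wants to check their own hostd log for the same event, here is a minimal Python sketch that picks out the "redo log ... is corrupted" message. The regular expression is an assumption inferred only from the three lines quoted above, not from a documented log format, and the function name is hypothetical.

```python
import re

# Pattern inferred from the sample lines in this post only (an assumption,
# not a documented hostd log format).
EVENT_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>\w+)\s+hostd\[.*?\].*?"
    r"Message on (?P<vm>\S+) on (?P<host>\S+) in \S+: "
    r"The redo log of '(?P<disk>[^']+)' is corrupted"
)


def find_redo_log_events(lines):
    """Return one dict per line that reports a corrupted redo log."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            events.append(match.groupdict())
    return events


# The three log lines quoted in the post, used as sample input.
SAMPLE = [
    "2017-04-10T11:38:31.477Z warning hostd[D140B70] [Originator@6876 "
    "sub=Vmsvc.vm:/vmfs/volumes/58887468-d5154633-508a-1866da8572e1/"
    "Test-Linux/Test-Linux.vmx] Failed to find activation record, "
    "event user unknown.",
    "2017-04-10T11:38:31.477Z verbose hostd[D140B70] [Originator@6876 "
    "sub=PropertyProvider] RecordOp ASSIGN: latestEvent, ha-eventmgr. "
    "Applied change to temp map.",
    "2017-04-10T11:38:31.477Z info hostd[D140B70] [Originator@6876 "
    "sub=Vimsvc.ha-eventmgr] Event 20239 : Message on Test-Linux on "
    "testserver.local in ha-datacenter: The redo log of "
    "'Test-Linux_1-000001.vmdk' is corrupted. If the problem persists, "
    "discard the redo log.",
]

if __name__ == "__main__":
    for event in find_redo_log_events(SAMPLE):
        print(event["timestamp"], event["vm"], event["disk"])
```

Feeding it the lines of a real log file instead of SAMPLE should work the same way, as long as the message wording matches the one above.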

The failure occurs at random on any Linux VM. Windows VMs have not been tested yet.

For testing purposes I created a new Linux VM (Debian 8) with

- 8 CPUs (1 per socket = default setting)

- 2 GB RAM (reserve all guest memory)

- 2x 10 GB Thick provisioned, lazily zeroed HDD stored on iSCSI LUN

Host Environment

- Dell PowerEdge R630

- ESXi 6.5.0 (Build 4564106)

- BIOS 2.3.4

- All VMs have reserved guest memory

- iSCSI LUN has enough free space available

- Enough RAM is available for the VMs

Has anyone seen similar behavior from ESXi?

Kind Regards,

Jürgen

6 Replies
continuum
Immortal

Hi Jürgen
This sounds very strange. Can you please attach all the vmware*.log files from the Test-Linux directory to your next reply?



JKleck
Contributor

Hi Continuum,

Attached are the log files and the screenshot of the system when the failure happens.

The problem occurs while the system writes on the hard disk.

I reproduced this on a fresh TinyCore Linux system by running the "dd" command.

Edit: I have tested it on the server-internal HDD and the problem does not seem to appear there. It only occurs on the iSCSI storage, and strangely only with snapshots. Without snapshots it works perfectly fine.
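The exact "dd" invocation is not given in this post. As a rough stand-in, the following Python sketch produces a comparable sustained sequential write inside a guest. The function name, file name, and sizes are assumptions chosen for illustration, not the original test parameters.

```python
import os
import tempfile


def sustained_write(path, total_mib=64, block_kib=1024):
    """Write total_mib MiB of zero-filled blocks to path and sync to disk.

    Roughly comparable to a dd-style sequential write; the sizes are
    illustrative assumptions, not the values used in the original test.
    """
    block = bytes(block_kib * 1024)           # one zero-filled block
    blocks = (total_mib * 1024) // block_kib  # number of blocks to write
    written = 0
    with open(path, "wb") as handle:
        for _ in range(blocks):
            written += handle.write(block)
        handle.flush()
        os.fsync(handle.fileno())             # force the data out to the disk
    return written


if __name__ == "__main__":
    # Writes into a temporary directory; point the path at the disk under
    # test to exercise a specific virtual disk.
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "write-test.bin")
        print(sustained_write(target, total_mib=16), "bytes written")
```

Running it once with a snapshot present and once without, on both the iSCSI and the local datastore, would mirror the comparison described above.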

JKleck
Contributor

After upgrading to version 6.5.0 (Build 5310538) the problem is solved.

CollinChaffin
Enthusiast

Wow, so this horrible bug is buried in WORKSTATION as well. I've just spent almost 13 hours rebuilding my Ubuntu VM over and over, on every release from 16.04 all the way to 17.10, testing with Workstation 12.5.5 and 12.5.6. I did not go back to older Workstation versions, but perhaps I need to.

Symptom: 

1.  Power off the Ubuntu 16.04+ x64 guest

2.  Take a snapshot while the guest is powered off

3.  Power the guest back on

4.  The guest boots to various ext4-fs errors and is corrupted and unusable, requiring a total rebuild
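Steps 1 to 3 can be scripted so the test is easy to repeat. The sketch below only builds and prints the corresponding `vmrun` command lines and does not execute anything. It assumes `vmrun` is on the PATH and the host type is Workstation (`-T ws`); the .vmx path and snapshot name are hypothetical placeholders. Step 4, checking the guest for ext4-fs errors, still has to be done by hand.

```python
import shlex


def snapshot_repro_commands(vmx_path, snapshot_name="repro-test"):
    """Build vmrun command lines for: power off, snapshot, power on.

    vmx_path and snapshot_name are placeholders supplied by the caller;
    nothing is executed here.
    """
    return [
        ["vmrun", "-T", "ws", "stop", vmx_path, "soft"],             # step 1
        ["vmrun", "-T", "ws", "snapshot", vmx_path, snapshot_name],  # step 2
        ["vmrun", "-T", "ws", "start", vmx_path],                    # step 3
    ]


if __name__ == "__main__":
    # Hypothetical path; replace with the real .vmx file of the guest.
    for command in snapshot_repro_commands("/path/to/guest.vmx"):
        print(shlex.join(command))
```

Printing instead of running keeps the sketch safe to try; passing each list to `subprocess.run` would be the obvious next step on a disposable test VM.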

Kind of a HUGE issue, WTH VMware? I'm not the only one having this issue, and it's in at least TWO versions of Workstation with multiple Linux kernels?!?

Perhaps not many are utilizing snapshots anymore in Workstation?  This needs an emergency fix!

pfielding
Contributor

I'm also experiencing this issue. If I take a snapshot and then do some intensive disk work, shortly afterwards I get the message about the redo log corruption and the VM goes down. It only happens with snapshots.

I just upgraded to 6.5.0 U1 hoping to fix it (based on this thread), which brought me to build 5969303, but the problem still persists.

regards,

paul

mysticknight
Enthusiast

We also updated to the latest build (6765664) but are still seeing the same problem, and it appears pretty quickly. We disabled VAAI, but it is still there. We are now testing with some local disks.
