Contributor

Windows 10 Full Clones disk corruption

Hi

Our Windows 10 Full Clones sometimes run into disk corruption. We're now on Horizon 7.8 and vSphere 6.7u2 (6.7u2a for vCenter), but the problem has followed our VMs since vSphere 6.5 and Horizon 7.1.

VMs sometimes start a scan & repair at boot; other times they just bluescreen with error codes such as "CRITICAL_PROCESS_DIED". This has happened on our Windows 10 images since build 1511 (we're on 1809 now).

Does anyone have a clue what could be causing this?

Environment details:

Horizon 7.8
UAG 3.5
vSphere 6.7u2
EMC XtremIO SAN via FC
Cisco UCS C240 M4 rack / B200 M4 blades

8 Replies
Enthusiast

"CRITICAL_PROCESS_DIED" means that one of the critical system processes (for example csrss.exe or svchost.exe) has crashed, causing the VM to bluescreen. If you can upload at least the kernel memory dump of the server, we should be able to help you root-cause this issue.
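If the desktops bluescreen but never leave a dump behind, it may be worth forcing a kernel memory dump in the gold image up front. A sketch of the standard Windows CrashControl registry values (plain OS settings, nothing Horizon-specific; the dump lands at %SystemRoot%\MEMORY.DMP by default):

```
Windows Registry Editor Version 5.00

; CrashDumpEnabled: 1 = complete, 2 = kernel, 3 = small (minidump).
; A kernel dump (2) is usually enough for bug check analysis.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl]
"CrashDumpEnabled"=dword:00000002
```

Note that the page file on C: must be large enough to receive the dump, which can be a problem if C: itself is the disk that gets corrupted.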

Commander

We've seen this with antivirus and other software that adds a filter driver. Build a vanilla VM from a Windows 10 ISO with only VMware Tools, the Horizon Agent, and all Microsoft updates, and see if you can reproduce it. If not, add software to the image a piece at a time until the problem returns.

Contributor

Kishoreg5674

Sorry, I can't get to a memory dump; the VMs are unrecoverable once this happens.

Another error that keeps appearing now is a BSOD at boot: "Critical System Driver missing. File: WindowsTrustedRT.sys Error code: 0xc0000225".

Contributor

BenFB

It might be Trend OfficeScan (anti-virus) doing this, but I don't know how to confirm it.

Right now I have a few desktops booting to a BSOD that states "NTFS_FILE_SYSTEM", with the C: drive completely missing from the disk layout (which can be seen by booting from a Windows ISO and launching diskpart from the command line).

vmware.log doesn't tell me much, so I'm not sure how to gather the proper information for troubleshooting:

2019-06-21T11:46:38.530Z| vcpu-1| I125: AHCI-VMM:HBA reset issued on sata0.


2019-06-21T11:46:40.343Z| vcpu-3| I125: Guest: PVSCSI: driver StorPort v1.3.10.0 starts. obj=FFFFC789E7610E00 reg=FFFFF8070E1E05E0

2019-06-21T11:46:40.344Z| vcpu-3| I125: Guest: Driver=pvscsi, Version=1.3.10.0

2019-06-21T11:46:40.377Z| vcpu-1| I125: Guest: PVSCSI: FindAdapter starts (arg='')

2019-06-21T11:46:40.377Z| vcpu-1| I125: Guest: PVSCSI: range[0]: mmio=1 len=32768 s=0xfe900000

2019-06-21T11:46:40.377Z| vcpu-1| I125: Guest: PVSCSI: range[1]: mmio=0 len=0 s=0x0

2019-06-21T11:46:40.377Z| vcpu-1| I125: Guest: PVSCSI: PVSCSIGetDeviceBase returns 0xffffdd00e8db4000

2019-06-21T11:46:40.377Z| vcpu-1| I125: Guest: PVSCSI: FindAdapter: Device returned, max targets: 65, Driver max: 255

2019-06-21T11:46:40.378Z| vcpu-1| I125: Guest: PVSCSI: CONFIG: useMsg=1 MaxQueueDepth=64 ringPages=32 useReqCallThreshold=1

2019-06-21T11:46:40.378Z| vcpu-3| I125: PVSCSI: ReqRing: 1024 entries, eSz=128, 32 pages

2019-06-21T11:46:40.378Z| vcpu-3| I125: PVSCSI: CmpRing: 4096 entries, eSz=32, 32 pages

2019-06-21T11:46:40.378Z| vcpu-3| I125: PVSCSI: MsgRing: 32 entries, eSz=128, 1 pages

2019-06-21T11:46:40.378Z| vcpu-3| I125: PVSCSI: scsi0: switching to sync

2019-06-21T11:46:40.378Z| vcpu-3| I125: PVSCSI: scsi0: init reqCallThresholdCapable 1

2019-06-21T11:46:40.460Z| vcpu-3| W115: WinBSOD: Synthetic MSR[0x40000100] 0x24

2019-06-21T11:46:40.460Z| vcpu-3| W115:

2019-06-21T11:46:40.460Z| vcpu-3| W115: WinBSOD: Synthetic MSR[0x40000101] 0xb400190637

2019-06-21T11:46:40.460Z| vcpu-3| W115:

2019-06-21T11:46:40.460Z| vcpu-3| W115: WinBSOD: Synthetic MSR[0x40000102] 0xffffc789e4fb2d68

2019-06-21T11:46:40.460Z| vcpu-3| W115:

2019-06-21T11:46:40.460Z| vcpu-3| W115: WinBSOD: Synthetic MSR[0x40000103] 0xffffffffc0000102

2019-06-21T11:46:40.460Z| vcpu-3| W115:

2019-06-21T11:46:40.460Z| vcpu-3| W115: WinBSOD: Synthetic MSR[0x40000104] 0x0

2019-06-21T11:46:40.460Z| vcpu-3| W115:

2019-06-21T11:46:40.460Z| vcpu-1| I125: Guest MSR write (0x49: 0x1)

2019-06-21T11:46:40.460Z| vcpu-0| I125: Guest MSR write (0x49: 0x1)

2019-06-21T11:46:40.460Z| vcpu-2| I125: Guest MSR write (0x49: 0x1)


2019-06-21T11:46:47.522Z| vmx| I125: VigorTransportProcessClientPayload: opID=6d08f26a-43-629f seq=1476: Receiving PowerState.InitiatePowerOff request.

2019-06-21T11:46:47.522Z| vmx| I125: Vix: [vmxCommands.c:557]: VMAutomation_InitiatePowerOff. Trying hard powerOff

2019-06-21T11:46:47.522Z| vmx| I125: VigorTransport_ServerSendResponse opID=6d08f26a-43-629f seq=1476: Completed PowerState request with messages.

2019-06-21T11:46:47.522Z| vmx| I125: Stopping VCPU threads...
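Side note on the log above: the WinBSOD lines actually carry the stop code. Windows reports the bug check code and its parameters through the synthetic (Hyper-V-style) crash MSRs 0x40000100-0x40000104, and the 0x24 in MSR[0x40000100] is NTFS_FILE_SYSTEM, consistent with the corruption described. A small sketch (assuming the vmware.log format shown above) that pulls the code out:

```python
import re

# A few common Windows bug check (stop) codes, for readability.
BUGCHECK_NAMES = {
    0x24: "NTFS_FILE_SYSTEM",
    0x7B: "INACCESSIBLE_BOOT_DEVICE",
    0xEF: "CRITICAL_PROCESS_DIED",
}

def decode_winbsod(log_lines):
    """Extract the stop code from vmware.log 'WinBSOD: Synthetic MSR' lines.

    MSR 0x40000100 holds the bug check code; 0x40000101..0x40000104 hold
    its four parameters. Returns (code, name) or None if no code was found.
    """
    msr_re = re.compile(r"WinBSOD: Synthetic MSR\[0x(4000010[0-4])\] 0x([0-9a-f]+)")
    msrs = {}
    for line in log_lines:
        m = msr_re.search(line)
        if m:
            msrs[int(m.group(1), 16)] = int(m.group(2), 16)
    code = msrs.get(0x40000100)
    if code is None:
        return None
    return code, BUGCHECK_NAMES.get(code, "unknown")

log = [
    "2019-06-21T11:46:40.460Z| vcpu-3| W115: WinBSOD: Synthetic MSR[0x40000100] 0x24",
    "2019-06-21T11:46:40.460Z| vcpu-3| W115: WinBSOD: Synthetic MSR[0x40000103] 0xffffffffc0000102",
]
print(decode_winbsod(log))  # decodes stop code 0x24 as NTFS_FILE_SYSTEM
```

So even without a memory dump, vmware.log at least tells you which stop code the guest hit.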

Hot Shot

Would it be possible to temporarily disable Trend to see whether it's causing some of this? In the past Trend has wreaked havoc on our VDI desktops, though I've heard it's gotten better lately.

VDI Engineer VCP-DCV, VCP7-DTM, VCAP7-DTM Design
Commander

This closely matches what we saw with a different anti-virus product. The corruption happens inside the guest, caused by the filter driver, so you won't see it from the hypervisor side.

Start by opening a ticket with Trend; I recall another user seeing this, and there was a fix. If that's not possible, start by removing Trend or, ideally, building a vanilla VM from a Windows ISO with only VMware Tools and the Horizon Agent. If you can't reproduce it, add Trend back and see if the issue returns.

Enthusiast

Maybe you can try disabling View Storage Accelerator for that full-clone pool, if it is enabled.
We had a similar issue with it, though on Horizon 7.7 and vSphere 6.0.

Is the Storage Accelerator enabled for your pool/vCenter?

Best regards

Andi

Contributor

andiwe79

Yes, we are using View Storage Accelerator. I will try disabling it for our pools, thanks.

Thanks to the rest of you as well, I will follow up with Trend about this issue.
