bravobrawi
Contributor

Windows 7 and Windows 2008 R2 VM BSOD (ntfs.sys)

Hello,

Since the update to ESXi 5.5.0 Build 2456374, some VMs crash with a BSOD (ntfs.sys) when we start cloning the VM for backup.

VMware Tools are up to date (typical installation, not full).

Are there any known issues?

Kind regards,

bravobrawi

51 Replies
nzjono
Enthusiast

Hi

We have the exact same issue and have an open incident with VMware.

The symptom is random BSOD when taking a quiesced snapshot, bugcheck 24 (NTFS_FILE_SYSTEM).  Typically your backup software will request a quiesced snapshot and this will randomly fail (perhaps 1 out of 6 times) and the VM then crashes.  This started right after applying the host updates and upgrading VMware Tools on the guests (to VMware Tools 9.4.11.2400950).

cyasuna
Contributor

Any resolution?  We are having the same issues on VMware 5.5

tabaccopie
Enthusiast

Hi.

If the VM uses the LSI Logic SAS SCSI controller, try changing it to the LSI Logic Parallel type.

Regards.

cyasuna
Contributor

It is failing for us with the LSI Logic Parallel controller.

bravobrawi
Contributor

Sorry, I chose the wrong forum. This thread should be in the ESXi 5 forum. Can I change this?

One VM (Windows 2008 R2) crashed last night. LSI Logic Parallel is the current SCSI controller type.

continuum
Immortal

This discussion has been moved from ESXi to ESXi 5 forum

Do you get the same behaviour when you create a quiesced snapshot manually?
Do you get this problem only when the VM is busy, or also when it is idle?
Do you get this behaviour only with VMs that have a long uptime?
Do you get any VSS-related errors in the Windows system logs?

Do you need support with a recovery problem ? - send a message via skype "sanbarrow"

julezatmortonbu
Enthusiast

I believe we're seeing something similar if not the same.

We've seen it on two VMs so far since our update.
2008 R2 VMs

LSI Logic SAS

2 virtual disks

6 gigs of ram

4 vcpu each

1 vmxnet3 nic

ESXi 5.1.0, 2583090

VMware Tools version 9.0.15, 2560490


The only entries I see in the VM's log at that point are these:

2015-04-20T07:12:05.560Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.

2015-04-20T07:12:20.561Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.

2015-04-20T07:12:25.508Z| vmx| I120: Tools: Tools heartbeat timeout.

2015-04-20T07:26:50.556Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.

2015-04-20T07:27:05.556Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.

2015-04-20T07:35:50.559Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
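
For anyone who wants to check many VMs for the same pattern, here is a minimal Python sketch that scans vmware.log text for the two messages quoted above. The function names (scan_vmware_log, summarize) and the regular expression are my own illustration, derived only from the line format shown in this post; other log lines may be formatted differently.

```python
import re
from collections import Counter
from datetime import datetime

# Line format assumed from the entries quoted in this post, e.g.:
# 2015-04-20T07:12:05.560Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\|"
    r"\s*(?P<thread>[^|]+)\|\s*(?P<level>\w+):\s*(?P<msg>.*)$"
)

# Substrings taken verbatim from the quoted log entries.
PATTERNS = {
    "rpc_timeout": "GuestRpcSendTimedOut",
    "heartbeat_timeout": "Tools heartbeat timeout",
}


def scan_vmware_log(lines):
    """Return a list of (timestamp, kind) for lines matching PATTERNS."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything in another format
        for kind, needle in PATTERNS.items():
            if needle in match.group("msg"):
                ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
                events.append((ts, kind))
    return events


def summarize(events):
    """Count events per kind."""
    return Counter(kind for _, kind in events)


# The six entries from this post, used as sample input.
SAMPLE = """\
2015-04-20T07:12:05.560Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2015-04-20T07:12:20.561Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2015-04-20T07:12:25.508Z| vmx| I120: Tools: Tools heartbeat timeout.
2015-04-20T07:26:50.556Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2015-04-20T07:27:05.556Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2015-04-20T07:35:50.559Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
"""
```

Calling summarize(scan_vmware_log(SAMPLE.splitlines())) counts the matching entries per kind; to scan a real file, pass the open file object instead of the sample lines.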

shill1
Contributor

Any solutions to this issue yet?

Same scenario: a Windows 2008 R2 server running on VMware ESXi 5.5.0 Build 2456374 on an HP BL460 Gen 8 server. The backup software started creating the virtual machine snapshot, and the Windows 2008 R2 VM crashed with a BSOD and an ntfs.sys error.

VMware tools are current on the VM.

julezatmortonbu
Enthusiast

Curious, Shill: which version of VMware Tools are you running?

shill1
Contributor


VMware Tools on the server is version 9.4.11, build 2400950

VM Version 8

16 GB Memory

8 CPUs (2 virtual sockets, 4 cores per socket)

1 nic (E1000)

LSI Logic SAS

Two disks (50 GB and 32 GB)

tmunn
Contributor

We are seeing this too. Hosts are on up-to-date 5.5 with VMware Tools 9.4.11, 2400950.

Seeing it on 2008, 2008 R2 and 2012. Reboots are occurring after quiescing - not every time but with a fair frequency.

-

Tim Munn

barmy2k
Contributor

Same here:

Random BSOD (NTFS_FILE_SYSTEM 0x00000024 - ntfs.sys) after a quiesced snapshot (Symantec NetBackup)...

Win Server 2008 R2

VM tools: 9.4.11 build 2400950

Paravirtual SCSI

Host: 5.5.0 2403361

continuum
Immortal

Do you see any NTFS- or VSS-related error messages in the system logs of the guests?

The Veeam Knowledge Base has a nice article that walks through detailed VSS troubleshooting. I highly recommend verifying that VSS works correctly inside the guests, especially if a guest is backed up automatically and runs a database or acts as a file server.
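
As one way to do a quick first check inside a guest, here is a minimal Python sketch that parses the output of the built-in "vssadmin list writers" command and flags writers that are not stable. The helper names (parse_vss_writers, unhealthy_writers, list_writers_live) are hypothetical, the parser assumes English-language output, and the sample text is illustrative, not taken from an affected VM.

```python
import re
import subprocess


def parse_vss_writers(text):
    """Parse 'vssadmin list writers' output into a list of dicts."""
    writers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        name_match = re.match(r"Writer name:\s*'(.*)'", line)
        if name_match:
            # A new writer block starts with its name.
            current = {"name": name_match.group(1), "state": None, "last_error": None}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line.split(":", 1)[1].strip()
    return writers


def unhealthy_writers(writers):
    """Return writers whose state is not Stable or that report an error."""
    return [
        w for w in writers
        if "Stable" not in (w["state"] or "") or (w["last_error"] or "") != "No error"
    ]


def list_writers_live():
    """Run vssadmin inside the Windows guest (needs an elevated prompt)."""
    result = subprocess.run(
        ["vssadmin", "list", "writers"],
        capture_output=True, text=True, check=True,
    )
    return parse_vss_writers(result.stdout)


# Illustrative sample only; IDs are placeholders.
SAMPLE_OUTPUT = """\
Writer name: 'System Writer'
   Writer Id: {00000000-0000-0000-0000-000000000001}
   State: [1] Stable
   Last error: No error

Writer name: 'Example Writer'
   Writer Id: {00000000-0000-0000-0000-000000000002}
   State: [8] Failed
   Last error: Timed out
"""
```

With the sample above, unhealthy_writers(parse_vss_writers(SAMPLE_OUTPUT)) returns only the entry for 'Example Writer'. A clean result here does not rule out the BSOD issue in this thread; it only shows the VSS writers were healthy at the moment of the check.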

cyasuna
Contributor

We have had NTFS 137 and 24 errors before the crash.   We have a case with VMware and Microsoft and are on a call with both.   It is very frustrating and we are getting nowhere.

barmy2k
Contributor

Now it's official: VMware KB: Quiescing operations cause a Windows virtual machine to panic with a Stop 24 error on ntf...

// Currently, there is no resolution.

To work around this issue:

  • Disable snapshot quiescing within your backup solution.
  • Do not select Quiesce guest file system when taking a snapshot of a virtual machine from the vSphere Client.
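
For scripted snapshots, the same workaround could look roughly like the sketch below. It assumes vm is a pyVmomi vim.VirtualMachine object (or anything exposing the same CreateSnapshot_Task method); the helper name create_unquiesced_snapshot is hypothetical, and connecting to vCenter and looking up the VM are left out.

```python
def create_unquiesced_snapshot(vm, name, description=""):
    """Request a snapshot with guest quiescing and memory capture disabled.

    `vm` is assumed to be a pyVmomi vim.VirtualMachine (an assumption of this
    sketch); the call returns the task object so the caller can wait on it.
    """
    return vm.CreateSnapshot_Task(
        name=name,
        description=description,
        memory=False,   # do not include the VM's memory state
        quiesce=False,  # skip guest file system quiescing (the workaround)
    )
```

Keep in mind that a snapshot taken without quiescing is only crash-consistent, so backups of databases or other busy applications may need extra care while the workaround is in place.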
julezatmortonbu
Enthusiast

I definitely don't think it's limited to 5.5 as I'm seeing the exact same random error on ESXi 5.1 (2583090).

drc0106
Contributor

We are experiencing the issue as well. We have a very large VMware environment. We upgraded to ESXi Build 2456374 the week of March 16th and started experiencing the issue around April 20th, when VMware Tools got updated. It's affecting upwards of 50 to 100 servers, with about 5 to 10 servers blue-screening each night. I am actively working with Microsoft and have the case elevated to the highest levels. I am in the process of trying to get the VMware case escalated as well.

Cisco UCS B200 M3 Blades

ESXi 5.5.0 Build 2456374

Two separate but equal sites both experiencing the same issue

Avamar 7.1.101-14

Windows 2008 R2

VMware Tools and hardware are at the latest versions

Microsoft has provided KB article https://support.microsoft.com/en-us/kb/2885209. This hotfix seemed to fix the issue on one of the servers we tested it on.

Update: it looks like the hotfix above does not resolve the issue.

thaynes2015
Contributor

DRC0106,

My system is pretty much identical to yours: UCS, Avamar, Windows, and the 5.5 version. We have 1-3 VMs hitting a BSoD per night. From what I gather from the VM logs, the BSoD occurs after the snapshot file is attached to SCSI0:1 and the first write to that snapshot volume is attempted. The 0x24 error code points to the VSS process as the culprit, along with a corrupt file system.

Does it appear that the Microsoft hotfix fixed the issue for you? I am getting nowhere with VMware. They want us to enable a bunch of guest-side logging, but because the issue is so random, it is almost impossible to turn the logging on everywhere. It's hit or miss, and mostly miss, getting the logs they are asking for.

drc0106
Contributor

I talked in detail with the Escalation Engineer who is working the case. It looks like the Microsoft hotfix does not correct the issue. They are actively working on a resolution, and I am now working directly with the EE to do some testing for them. My recommendation is to open a ticket with VMware and have it tied to PR1437136 so you get updates on the issue.
