Hello,
Since the update to ESXi 5.5.0 Build 2456374, some VMs crash with a BSOD (ntfs.sys) when we start to clone the VM for backup.
The VMware Tools are up to date (typical installation, not full).
Are there any known issues?
Kind regards,
bravobrawi
Hi
We have the exact same issue and have an open incident with VMware.
The symptom is a random BSOD when taking a quiesced snapshot, bugcheck 24 (NTFS_FILE_SYSTEM). Typically your backup software will request a quiesced snapshot, and this will randomly fail (perhaps 1 out of 6 times), after which the VM crashes. This started right after applying the host updates and upgrading VMware Tools on the guests (to VMware Tools 9.4.11.2400950).
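To put the "1 out of 6" figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes each quiesced snapshot fails independently with the same probability, which is my simplification and not something we have confirmed; the function name `crash_probability` is just illustrative.

```python
def crash_probability(per_snapshot_rate, snapshots):
    """Chance of at least one crash across `snapshots` quiesced snapshots.

    Assumes every snapshot fails independently with the same rate
    (an assumption for illustration, not a measured property).
    """
    if not 0.0 <= per_snapshot_rate <= 1.0:
        raise ValueError("per_snapshot_rate must be between 0 and 1")
    if snapshots < 0:
        raise ValueError("snapshots must be non-negative")
    # Probability that every snapshot succeeds, subtracted from 1.
    return 1.0 - (1.0 - per_snapshot_rate) ** snapshots


if __name__ == "__main__":
    # One nightly backup per VM for a week, at the rough 1-in-6 rate.
    for nights in (1, 3, 7):
        print(nights, round(crash_probability(1 / 6, nights), 3))
```

Under that assumption, a VM backed up nightly would be more likely than not to hit the BSOD within a week, which matches how quickly this became noticeable for us.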
Any resolution? We are having the same issues on VMware 5.5
Hi.
If you use the LSI Logic SAS type SCSI controller on the VM, try changing it to the LSI Logic Parallel type.
Regards.
It is failing for us using the LSI Logic Parallel
Sorry, I chose the wrong forum. This thread should be in the ESXi 5 forum. Can I change this?
One VM (Windows 2008 R2) crashed last night. LSI Logic Parallel is the current SCSI controller type.
This discussion has been moved from ESXi to ESXi 5 forum
Do you get the same behaviour when you create a quiesced snapshot manually?
Do you get this problem only when the VM is busy or also when it is idle?
Do you get this behaviour only with VMs that have a long uptime?
Do you get any VSS related errors in the Windows system logs?
I believe we're seeing something similar if not the same.
We've seen it on two VMs so far since our update.
2008 R2 VMs
LSI Logic SAS
2 virtual disks
6 gigs of ram
4 vcpu each
1 vmxnet3 nic
ESXi 5.1.0, 2583090
VMware Tools version 9.0.15, 2560490
The only entries I see in the VM's log at that point are these:
2015-04-20T07:12:05.560Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2015-04-20T07:12:20.561Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2015-04-20T07:12:25.508Z| vmx| I120: Tools: Tools heartbeat timeout.
2015-04-20T07:26:50.556Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2015-04-20T07:27:05.556Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2015-04-20T07:35:50.559Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
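In case it helps anyone compare notes, a quick way to pull these entries out of a VM's vmware.log is a small Python filter. This is only a sketch that assumes the line layout shown above (timestamp, thread, level, message). The function name `find_tools_timeouts` and the two search strings are my own choices, not anything from VMware.

```python
import re
from datetime import datetime

# Matches the layout of the lines quoted above, for example:
# 2015-04-20T07:12:05.560Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\|\s*"
    r"(?P<thread>[^|]+)\|\s*(?P<level>\w+):\s*(?P<msg>.*)$"
)

# Substrings taken from the log excerpt in this thread.
PATTERNS = ("GuestRpcSendTimedOut", "Tools heartbeat timeout")


def find_tools_timeouts(lines):
    """Return (timestamp, message) pairs for Tools timeout entries."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not in the expected layout
        msg = match.group("msg")
        if any(pattern in msg for pattern in PATTERNS):
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
            hits.append((ts, msg))
    return hits


if __name__ == "__main__":
    sample = [
        "2015-04-20T07:12:05.560Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.",
        "2015-04-20T07:12:25.508Z| vmx| I120: Tools: Tools heartbeat timeout.",
    ]
    for ts, msg in find_tools_timeouts(sample):
        print(ts.isoformat(), msg)
```

To run it against a real log, pass an open file object instead of the sample list, then compare the timestamps with the time your backup job requested the snapshot.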
Any solutions to this issue yet?
Same scenario here: a Windows 2008 R2 server running on VMware ESXi 5.5.0 Build 2456374 on an HP BL460 Gen 8 server. The backup software started to create the virtual machine snapshot and the Windows 2008 R2 VM hit a BSOD with an ntfs.sys error.
VMware Tools are current on the VM.
Curious, Shill, what version of Tools are you running?
VMware Tools on the server is version 9.4.11, build-2400950
VM Version 8
16 GB Memory
8 CPU (2 virtual sockets, 4 cores per socket)
1 NIC (E1000)
LSI Logic SAS
Two disks (50GB and 32GB)
We are seeing this too. Hosts are up to date on 5.5 with VMware Tools 9.4.11, 2400950.
Seeing it on 2008, 2008 R2 and 2012. Reboots are occurring after quiescing - not every time but with a fair frequency.
-
Tim Munn
same here:
random BSOD (NTFS_FILE_SYSTEM 0x00000024 - ntfs.sys) after quiesced snapshot (Symantec NetBackup)...
Win Server 2008 R2
VM tools: 9.4.11 build 2400950
Paravirtual SCSI
Host: 5.5.0 2403361
Do you see any NTFS or VSS related error messages in the system logs of the guests?
The Veeam Knowledge Base has a nice article that walks through detailed VSS troubleshooting. I highly recommend verifying that VSS works correctly inside the guests, especially if the guest is backed up automatically and runs a database or acts as a file server.
We have had NTFS 137 and 24 errors before the crash. We have a case with VMware and Microsoft and are on a call with both. It is very frustrating and we are getting nowhere.
Now it's official: VMware KB: Quiescing operations cause a Windows virtual machine to panic with a Stop 24 error on ntf...
// Currently, there is no resolution.
To work around this issue:
I definitely don't think it's limited to 5.5 as I'm seeing the exact same random error on ESXi 5.1 (2583090).
We are experiencing the issue as well. We have a very large VMware environment. We upgraded to ESXi Build 2456374 the week of March 16th and started experiencing the issue around April 20th, when VMware Tools got updated. It's affecting upwards of 50 to 100 servers, with about 5 to 10 servers blue screening each night. I am actively working with Microsoft and have the case elevated to the highest levels. I am in the process of trying to get the VMware case escalated as well.
Cisco UCS B200 M3 Blades
ESXi 5.5.0 Build 2456374
Two separate but equal sites both experiencing the same issue
Avamar 7.1.101-14
Windows 2008 R2
VMware Tools and virtual hardware are at the latest versions
Microsoft has provided KB article https://support.microsoft.com/en-us/kb/2885209 which seemed to fix the issue on one of the servers we tested it with.
Looks like the hotfix above does not resolve the issue
DRC0106,
My system is pretty much identical to yours: UCS, Avamar, Windows, and the same 5.5 version. We have 1-3 VMs BSoD per night. From what I gather from the VM logs, the BSoD occurs after the snapshot file is attached to SCSI0:1 and the first write to that snapshot volume is attempted, with a 0x24 error code pointing to the VSS process as the culprit along with a corrupt file system.
Does it appear that the Microsoft hotfix fixed the issue for you? I am getting nowhere with VMware. They want us to enable a bunch of guest-side logging, but given the random nature of the issue it is almost impossible to turn the logging on everywhere. It's hit or miss, and mostly miss, getting the logs they are asking for.
I talked in detail with the Escalation Engineer who is working the case. It looks like the Microsoft hotfix does not correct the issue. They are actively working on a resolution and I am now working directly with the EE to do some testing for them. My recommendation is to open a ticket with VMware and have it tied to PR1437136 so you get updates on the issue.