I have a quite strange issue - I need to find out what or who deleted a VMFS volume by first deleting the partitiontable - then recreating it and formatting it
the following entries can be found in hostd.log
2013-08-05T13:23:39.310Z [FFFB4B90 verbose 'Partitionsvc' opID=291C9D06-000000D0] InvokePartedUtil /sbin/partedUtil "getptbl" "/vmfs/devices/disks/naa.6848f690eeeb95001903ddd4183d53ef"
2013-08-05T13:23:39.311Z [FFFB4B90 info 'SysCommandPosix' opID=291C9D06-000000D0] ForkExec(/sbin/partedUtil) 1950716
2013-08-05T13:23:39.364Z [FFFB4B90 verbose 'Partitionsvc' opID=291C9D06-000000D0] Output Stream from partedUtil while getting partitions for /vmfs/devices/disks/naa.6848f690eeeb95001903ddd4183d53ef:
--> gpt
--> 240614 255 63 3865468800
-->
2013-08-05T13:23:39.364Z [FFFB4B90 verbose 'Partitionsvc' opID=291C9D06-000000D0] InvokePartedUtil /sbin/partedUtil "getptbl" "/vmfs/devices/disks/naa.6848f690eeeb95001903ddd4183d53ef"
2013-08-05T13:23:39.365Z [FFFB4B90 info 'SysCommandPosix' opID=291C9D06-000000D0] ForkExec(/sbin/partedUtil) 1950717
2013-08-05T13:23:39.416Z [FFFB4B90 verbose 'Partitionsvc' opID=291C9D06-000000D0] Output Stream from partedUtil while getting partitions for /vmfs/devices/disks/naa.6848f690eeeb95001903ddd4183d53ef:
--> gpt
--> 240614 255 63 3865468800
-->
2013-08-05T13:23:39.416Z [FFFB4B90 verbose 'Partitionsvc' opID=291C9D06-000000D0] InvokePartedUtil /sbin/partedUtil "getUsableSectors" "/vmfs/devices/disks/naa.6848f690eeeb95001903ddd4183d53ef"
2013-08-05T13:23:39.417Z [FFFB4B90 info 'SysCommandPosix' opID=291C9D06-000000D0] ForkExec(/sbin/partedUtil) 1946622
2013-08-05T13:23:39.467Z [FFFB4B90 verbose 'Partitionsvc' opID=291C9D06-000000D0] Output Stream from partedUtil while getting usable sectors for /vmfs/devices/disks/naa.6848f690eeeb95001903ddd4183d53ef:
--> 34 3865468766
-->
2013-08-05T13:23:39.468Z [FFFB4B90 verbose 'Default' opID=291C9D06-000000D0] Available primary : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, ]
2013-08-05T13:23:39.468Z [FFFB4B90 verbose 'Default' opID=291C9D06-000000D0] Setting start sector to 2048 for 1MB alignment
2013-08-05T13:23:39.486Z [64881B90 info 'TaskManager' opID=291C9D06-000000D1] Task Created : haTask-ha-host-vim.host.DatastoreSystem.createVmfsDatastore-104632042
2013-08-05T13:23:39.487Z [64481B90 verbose 'FSVolumeProvider' opID=291C9D06-000000D1] RescanVmfs called
2013-08-05T13:23:39.554Z [64481B90 verbose 'Partitionsvc' opID=291C9D06-000000D1] InvokePartedUtil /sbin/partedUtil "setptbl" "/vmfs/devices/disks/naa.6848f690eeeb95001903ddd4183d53ef" "gpt" "1 2048 3865463909 AA31E02A400F11DB9590000C2911D1B8 0"
2013-08-05T13:23:39.554Z [64C4DB90 verbose 'Hostsvc::DatastoreSystem'] VmfsUpdate: got VMFS message [N11HostdCommon18VmkernelUpdateVmfsE:0x8310ab0] timestamp=3208504497337 specific=0 name= label=
2013-08-05T13:23:39.555Z [64481B90 info 'SysCommandPosix' opID=291C9D06-000000D1] ForkExec(/sbin/partedUtil) 1950719
2013-08-05T13:23:39.572Z [64481B90 verbose 'FSVolumeProvider' opID=291C9D06-000000D1] RescanVmfs called
2013-08-05T13:23:39.630Z [64481B90 info 'Hostsvc::DatastoreSystem' opID=291C9D06-000000D1] Formatting vmfs partition
2013-08-05T13:23:39.630Z [64481B90 verbose 'FSVolumeProvider' opID=291C9D06-000000D1] FormatVmfs called
2013-08-05T13:23:39.631Z [64481B90 verbose 'FSVolumeProvider' opID=291C9D06-000000D1] LookupVmfs: Cannot find VMFS volume with disk name naa.6848f690eeeb95001903ddd4183d53ef and number 1
2013-08-05T13:23:39.631Z [64481B90 verbose 'FSVolumeProvider' opID=291C9D06-000000D1] LookupVmfs: Cannot find VMFS volume with id <unset>, path, <unset>, or name data
2013-08-05T13:23:39.631Z [64481B90 verbose 'FSVolumeProvider' opID=291C9D06-000000D1] LookupNasByName: Cannot find NasVolume with volume name data
2013-08-05T13:23:42.598Z [64840B90 info 'VmkVprobSource'] VmkVprobSource::Post event: (vim.event.EventEx) {
--> dynamicType = <unset>,
--> key = 1693445704,
--> chainId = 1681851588,
--> createdTime = "1970-01-01T00:00:00Z",
--> userName = "",
--> datacenter = (vim.event.DatacenterEventArgument) null,
--> computeResource = (vim.event.ComputeResourceEventArgument) null,
--> host = (vim.event.HostEventArgument) {
--> dynamicType = <unset>,
--> name = "de-insevm12.ri.local",
--> host = 'vim.HostSystem:ha-host',
--> },
--> vm = (vim.event.VmEventArgument) null,
--> ds = (vim.event.DatastoreEventArgument) null,
--> net = (vim.event.NetworkEventArgument) null,
--> dvs = (vim.event.DvsEventArgument) null,
--> fullFormattedMessage = <unset>,
--> changeTag = <unset>,
--> eventTypeId = "esx.audit.vmfs.lvm.device.discovered",
--> severity = <unset>,
--> message = <unset>,
--> objectId = "ha-eventmgr",
--> objectType = "vim.HostSystem",
--> objectName = <unset>,
--> fault = (vmodl.MethodFault) null,
--> }
2013-08-05T13:23:42.600Z [64840B90 info 'ha-eventmgr'] Event 300 : One or more LVM devices have been discovered on this host.
2013-08-05T13:23:42.600Z [65046B90 verbose 'Hostsvc::DatastoreSystem'] VmfsVobUpdate: vmfs VOB: discovery: msg: [N11HostdCommon6VobMsgE:0x64385310] timestamp=3208507543638
2013-08-05T13:23:42.601Z [65046B90 verbose 'Hostsvc::DatastoreSystem'] Automount: scheduling thread NOW
2013-08-05T13:23:43.142Z [64C0CB90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
2013-08-05T13:23:56.447Z [64FC2B90 info 'VmkVprobSource'] VmkVprobSource::Post event: (vim.event.EventEx) {
--> dynamicType = <unset>,
--> key = 102397564,
--> chainId = 1681847216,
--> createdTime = "1970-01-01T00:00:00Z",
--> userName = "",
--> datacenter = (vim.event.DatacenterEventArgument) null,
--> computeResource = (vim.event.ComputeResourceEventArgument) null,
--> host = (vim.event.HostEventArgument) {
--> dynamicType = <unset>,
--> name = "de-insevm12.ri.local",
--> host = 'vim.HostSystem:ha-host',
--> },
--> vm = (vim.event.VmEventArgument) null,
--> ds = (vim.event.DatastoreEventArgument) null,
--> net = (vim.event.NetworkEventArgument) null,
--> dvs = (vim.event.DvsEventArgument) null,
--> fullFormattedMessage = <unset>,
--> changeTag = <unset>,
--> eventTypeId = "esx.audit.vmfs.volume.mounted",
--> severity = <unset>,
--> message = <unset>,
--> arguments = (vmodl.KeyAnyValue) [
--> (vmodl.KeyAnyValue) {
--> dynamicType = <unset>,
--> key = "1",
--> value = "[data, 51ffa76c-96d6efb1-76e4-90b11c496b72]",
--> },
--> (vmodl.KeyAnyValue) {
--> dynamicType = <unset>,
--> key = "2",
--> value = "51ffa76b-46dd7eee-714f-90b11c496b72",
--> },
--> (vmodl.KeyAnyValue) {
--> dynamicType = <unset>,
--> key = "3",
--> value = "rw",
--> }
--> ],
--> objectId = "ha-eventmgr",
--> objectType = "vim.HostSystem",
--> objectName = <unset>,
--> fault = (vmodl.MethodFault) null,
--> }
2013-08-05T13:23:56.450Z [64FC2B90 info 'ha-eventmgr'] Event 301 : File system [data, 51ffa76c-96d6efb1-76e4-90b11c496b72] on volume 51ffa76b-46dd7eee-714f-90b11c496b72 has been mounted in rw mode on this host.
2013-08-05T13:23:56.508Z [64481B90 verbose 'FSVolumeProvider' opID=291C9D06-000000D1] AttachVmfsExtent called on /vmfs/volumes/51ffa76c-96d6efb1-76e4-90b11c496b72
2013-08-05T13:23:56.508Z [64481B90 verbose 'FSVolumeProvider' opID=291C9D06-000000D1] Attach extent is NO-OP, it is already the head of volume.
2013-08-05T13:23:56.508Z [64481B90 info 'Hostsvc::DatastoreSystem' opID=291C9D06-000000D1] Added VMFS volume [/vmfs/volumes/51ffa76c-96d6efb1-76e4-90b11c496b72].
2013-08-05T13:23:56.509Z [64481B90 verbose 'Hostsvc::DatastoreSystem' opID=291C9D06-000000D1] ReconcileVMFSDatastores called: refresh = true, rescan = false
2013-08-05T13:23:56.509Z [64481B90 verbose 'FSVolumeProvider' opID=291C9D06-000000D1] RefreshVMFSVolumes called
2013-08-05T13:23:56.540Z [64481B90 verbose 'FSVolumeProvider' opID=291C9D06-000000D1] RefreshVMFSVolumes: refreshed volume, id 51ffa76c-96d6efb1-76e4-90b11c496b72, name data
2013-08-05T13:23:56.540Z [64481B90 verbose 'FSVolumeProvider' opID=291C9D06-000000D1] RefreshOneVmfsVolume on /vmfs/volumes/51ffa76c-96d6efb1-76e4-90b11c496b72
2013-08-05T13:23:56.572Z [64481B90 info 'Hostsvc::Datastore' opID=291C9D06-000000D1] Refresh: VMFS Datastore name was changed from '51ffa76c-96d6efb1-76e4-90b11c496b72' to 'data'.
2013-08-05T13:23:56.572Z [64481B90 verbose 'Hostsvc::DatastoreSystem' opID=291C9D06-000000D1] DiscoverNewDatastores: data created.
2013-08-05T13:23:56.572Z [64481B90 verbose 'Hostsvc::DatastoreSystem' opID=291C9D06-000000D1] Removing issue NoDatastoresConfigured
2013-08-05T13:23:56.572Z [64481B90 verbose 'Hostsvc::DatastoreSystem' opID=291C9D06-000000D1] ReconcileVMFSDatastores: Done discovering new filesystem volumes.
2013-08-05T13:23:56.573Z [64481B90 info 'TaskManager' opID=291C9D06-000000D1] Task Completed : haTask-ha-host-vim.host.DatastoreSystem.createVmfsDatastore-104632042 Status success
2013-08-05T13:23:56.573Z [64C4DB90 verbose 'Hostsvc::DatastoreSystem'] VmfsUpdate: got VMFS message [N11HostdCommon18VmkernelUpdateVmfsE:0x64509668] timestamp=3208504573901 specific=0 name= label=
2013-08-05T13:23:56.573Z [648C2B90 verbose 'Hostsvc::DatastoreSystem'] ReconcileVMFSDatastores called: refresh = true, rescan = false
2013-08-05T13:23:56.573Z [648C2B90 verbose 'FSVolumeProvider'] RefreshVMFSVolumes called
2013-08-05T13:23:56.637Z [648C2B90 verbose 'FSVolumeProvider'] RefreshVMFSVolumes: refreshed volume, id 51ffa76c-96d6efb1-76e4-90b11c496b72, name data
2013-08-05T13:23:56.637Z [648C2B90 verbose 'Hostsvc::Datastore'] SetVolume: Datastore 51ffa76c-96d6efb1-76e4-90b11c496b72 has changed provider volume pointer
2013-08-05T13:23:56.637Z [648C2B90 verbose 'Hostsvc::DatastoreSystem'] ReconcileVMFSDatastores: Done discovering new filesystem volumes.
2013-08-05T13:23:56.638Z [FFF10A90 verbose 'FSVolumeProvider'] AutomountVolumes called
2013-08-05T13:23:56.638Z [64CDEB90 warning 'PropertyProvider' opID=291C9D06-000000D3] _RetrieveContents took 12599905 microseconds to lock vim.host.DatastoreSystem:ha-datastoresystem
2013-08-05T13:23:56.655Z [FFF10A90 info 'Hostsvc::DatastoreSystem'] DatastoreSystemImpl.AutomountVolumes completed
Hi Ulli,
this looks like a regular VMFS datastore creation to me. The logs do not show the deletion, right?
Is this host managed through vCenter? If yes are there any related messages in vxpa.log or in the vCenter event log and task list?
If this was done remotely through a script then you should see related messages like "User xxx@1.2.3.4 logged in as ..." in hostd.log.
You might also look at /var/log/shell.log to see if there were any commands run interactively in an ESXi shell to re-create the datastore.
- Andreas
Hi Andreas
I think I can rule out "normal" operations. - The same user or task or script deleted all VMs on 3 hosts with 6 datastores before resetting the partitiontables, reformatting the vmfs-volumes and resetting the ESXi configs. When he or ite was done 3 hosts rebooted and came up without network config and blank datastores.
On at least one of those hosts random Windows files were written into the section where we would expect VMFS metadata - I found a MFT-mirror in the place where I would expect the .vh.sf
Weird - really weird - I need to find out if the commands were entered manually via ssh or wether it was a cronjob or interaction via some API.
Here is another example:
2013-08-05T12:58:01.192Z [64FC2B90 verbose 'Vmsvc' opID=9042E070-00000053] Released Vm Id: 4.
2013-08-05T12:58:01.193Z [64FC2B90 verbose 'HostsvcPlugin' opID=9042E070-00000053] RemoveEntry '4'
2013-08-05T12:58:01.193Z [64FC2B90 verbose 'HostsvcPlugin' opID=9042E070-00000053] RemoveEntry succeeded
2013-08-05T12:58:01.193Z [64FC2B90 verbose 'ResourcePool ha-root-pool' opID=9042E070-00000053] Removed child 4 from pool
2013-08-05T12:58:01.193Z [64FC2B90 verbose 'HostsvcPlugin' opID=9042E070-00000053] Security domain hostd4 not found
2013-08-05T12:58:01.195Z [64C8FB90 info 'GuestFileTransferImpl'] VmOperationListener: unregister notification received for VM: 4
2013-08-05T12:58:01.195Z [64C8FB90 info 'GuestFileTransferImpl'] VmOperationListener succeeded
2013-08-05T12:58:01.196Z [64C8FB90 info 'Hbrsvc'] Replicator: UnregisterListener triggered for config VM 4
2013-08-05T12:58:01.196Z [64C8FB90 verbose 'Statssvc'] EntityRemovedListener: Deleting stats for entity 4
2013-08-05T12:58:01.210Z [64A58B90 info 'Libs'] Vix: [6796 foundryVMPowerOps.c:973]: FoundryVMPowerStateChangeCallback: /vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx, vmx/execState/val = poweredOff.
2013-08-05T12:58:01.352Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/564d3d8a-33cb-3aeb-1385-5c2b15747102.vmem'.
2013-08-05T12:58:01.352Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/564d3d8a-33cb-3aeb-1385-5c2b15747102.vmem.lck'.
2013-08-05T12:58:01.872Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.kstats'.
2013-08-05T12:58:01.873Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.kstats32'.
2013-08-05T12:58:01.873Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.kstats64'.
2013-08-05T12:58:01.873Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor'.
2013-08-05T12:58:01.874Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor32'.
2013-08-05T12:58:01.874Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor32.1'.
2013-08-05T12:58:01.874Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor64'.
2013-08-05T12:58:01.874Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor64.1'.
2013-08-05T12:58:01.875Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.perf'.
2013-08-05T12:58:01.875Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.perf_kstats'.
2013-08-05T12:58:01.875Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor32.0'.
2013-08-05T12:58:01.875Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor64.0'.
2013-08-05T12:58:01.875Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/status'.
2013-08-05T12:58:01.876Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vprintproxy.log'.
2013-08-05T12:58:01.876Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/autoinst.flp'.
2013-08-05T12:58:01.876Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/autoinst.iso'.
2013-08-05T12:58:01.876Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/quicklook-cache.png'.
2013-08-05T12:58:01.877Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname-7d38e0c8.vswp'.
2013-08-05T12:58:01.877Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware-stats.log'.
2013-08-05T12:58:01.888Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware.log'.
2013-08-05T12:58:01.890Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware-6.log'.
2013-08-05T12:58:01.909Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware-7.log'.
2013-08-05T12:58:01.911Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware-5.log'.
2013-08-05T12:58:01.912Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware-4.log'.
2013-08-05T12:58:01.914Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware-3.log'.
2013-08-05T12:58:01.915Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware-2.log'.
2013-08-05T12:58:01.916Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmxf'.
2013-08-05T12:58:01.917Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.nvram'.
2013-08-05T12:58:01.921Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmsd'.
2013-08-05T12:58:01.922Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx'.
2013-08-05T12:58:01.932Z [64FC2B90 info 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Failed to unset VM medatadata: FileIO error: Could not find file : /vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname-aux.xml.tmp.
2013-08-05T12:58:01.934Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] DeleteVmDirectory: Deleting vm dir (as superuser) '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname'
2013-08-05T12:58:01.934Z [64F81B90 verbose 'Hostsvc::DatastoreSystem'] Datastore-Vdisk refresh: scheduling thread
2013-08-05T12:58:01.934Z [64FC2B90 info 'TaskManager' opID=9042E070-00000053] Task Completed : haTask-4-vim.ManagedEntity.destroy-104631949 Status success
2013-08-05T12:58:01.934Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Close Handle called
2013-08-05T12:58:01.935Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Shutting down VMDB service...
2013-08-05T12:58:01.935Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Unregistering callback...
2013-08-05T12:58:01.935Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] ...done
2013-08-05T12:58:01.935Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Unsubscribed from events.
2013-08-05T12:58:01.935Z [64FC2B90 info 'Libs' opID=9042E070-00000053] Vix: [20705 foundryVM.c:10650]: Error VIX_E_INVALID_ARG in VixVM_CancelOps(): One of the parameters was invalid
2013-08-05T12:58:01.935Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Canceled outstanding Foundry operations.
2013-08-05T12:58:01.935Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Released VM handle.
2013-08-05T12:58:01.936Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Closed VM handle.
2013-08-05T12:58:01.936Z [64FC2B90 info 'ha-eventmgr' opID=9042E070-00000053] Event 272 : Removed replacedname (Backup) on blablabla.local from ha-datacenter
2013-08-05T12:58:01.936Z [64FC2B90 info 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] State Transition (VM_STATE_DELETING -> VM_STATE_GONE)
2013-08-05T12:58:01.936Z [64FC2B90 verbose 'ha-host' opID=9042E070-00000053] ModeMgr::End: op = normal, current = normal, count = 5
2013-08-05T12:58:01.936Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Destroy VM complete
2013-08-05T12:58:01.937Z [64FC2B90 info 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx'] Virtual machine object cleanup
2013-08-05T12:58:01.937Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx'] Closed VM handle.
pam_per_user: create_subrequest_handle(): doing map lookup for user "root"
pam_per_user: create_subrequest_handle(): creating new subrequest (user="root", service="system-auth-generic")
Accepted password for user root from 10.49.2.58
2013-08-05T12:58:16.889Z [FFF10A90 info 'Vimsvc'] [Auth]: User root
2013-08-05T12:58:16.889Z [FFF10A90 info 'ha-eventmgr'] Event 273 : User root@10.49.2.58 logged in
pam_per_user: create_subrequest_handle(): doing map lookup for user "root"
pam_per_user: create_subrequest_handle(): creating new subrequest (user="root", service="system-auth-generic")
Accepted password for user root from 10.49.2.58
2013-08-05T12:58:17.228Z [64840B90 info 'Vimsvc'] [Auth]: User root
2013-08-05T12:58:17.228Z [64840B90 info 'ha-eventmgr'] Event 274 : User root@10.49.2.58 logged in
2013-08-05T12:58:17.370Z [FFFB4B90 verbose 'Default'] CloseSession called for session id=520a9289-5eb6-acb7-50ab-b1042027528d
2013-08-05T12:58:17.370Z [FFFB4B90 info 'ha-eventmgr'] Event 275 : User root logged out
...
I have not seen a hostd log entry like:
info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor32'.
before.
This issue is keeping me busy since a few weeks - in that time I have restored. about 100 critical VMs from unreadable vmfs-volumes.
Monday ESXi host number 7, 8 and 9 were affected. It almost always happens following this pattern:
the hosts lose their network config, reboot and come up with empty datastores or unpartitioned volumes.
Good practice for me by the way but the customer is getting sick of it - this is the first time I was able find the IP of the host that seems to be responsible - needless to say that the IP is no longer active.
This sounds like a very interesting criminal case ...
I cannot think of any bug that causes such behavior, but really tend to believe that your client is under attack by a very rogue human or a very malicious and sophisticated and yet unknown software.
Are the hosts using shared (SAN) storage or only local disks? Are they managed by vCenter?
Are there any other products interacting with the hosts or vCenter? vCloud Director, Lab Manager or something?
Have you checked the hosts for unusual crontab entries or suspicious VIB files installed?
Have you considered re-installing all hosts?
How about limiting remote access (https for the API and ssh) through the hosts' firewall to well known and clean workstations?
"...I need to find out if the commands were entered manually via ssh or wether it was a cronjob or interaction via some API..."
If you look at timestamps, I think you can exclude the first option. No one would enter corresponding commands manually so fast. I dare to say even scripted ssh-connection or client-server api connection on local network could not be that fast.
I think someone gained shell access to esxi host (remote or local), uploaded malicious script, and let it run. Check the time of the first suspicious log-entry: if it has some "rounded" time (i.e. 14:25:00.00), it was probably started by cron. If time seems to be random (i.e. 13:23:39.310), script was started manually, while attacker was logged in.
If you still have original datastore, try to analyse it. In our company standard procedure is to grab images of all affected disks before doing anything else. Content of disk-images is then parsed, looking for ascii-strings. That might reveal important evidence, if disk was not zeroed, random-wiped, or encrypted. Also check all other log-files (I hope you collected them with central log collector). Find what esxi-version was running and if it was not the latest one (at the time of incident), what security problems were known and not fixed. I know this all is tedious work, but that's what forensic analysis is...
Do you see anything in the messages logs since that does logs an event for ssh... what do you notice in the vmkernel or vmkwarning during that point of time...Check the vobd log as well...
Just a curious follow-up, Ulli ...
How did this story continue? Were you able to find out what happened (or is still happening?) at this customer?
Were any of the hints given here useful?
Thanks
Andreas