Hi everyone,
Is there a way to disable datastore heartbeating on vSphere 5?
In my lab with 2 ESX hosts and a virtual storage appliance (SvSAN), I have the issue that I never get a host isolation response.
I have checked a lot: the isolation address (the default gateway) is not replying, but the isolation response doesn't trigger.
I think this is an issue with datastore heartbeating, and therefore I want to disable this feature for testing.
Thanks for your answers.
"In my lab with 2 ESX hosts and a virtual storage appliance (SvSAN), I have the issue that I never get a host isolation response."
Well, isn't that great? No more false positives due to the loss of the Management Network.
What you could do is to either "cut" the connection to the SvSAN like you do with the Management Network or - if you prefer - power off the host.
André
No, that isn't great.
My problem is that I have one switch at each site.
When this switch fails, my network connectivity is completely down.
The problem is the virtual storage.
This appliance provides an iSCSI storage device for both ESX servers.
Every ESX server has one iSCSI device with 2 paths (one to each iSCSI appliance).
The problem is that in a split-brain (one switch is down), both ESX servers keep running and writing data to their local iSCSI appliance.
When the network has been fixed, the two iSCSI appliances have different data states and they crash.
Under vSphere 4 I had set the isolation response to shut down the isolated site (only the iSCSI appliance) and everything was all right.
How can I get this under vSphere 5?
Best regards
Datastore heartbeating is meant to allow a master to distinguish between a dead and an isolated/partitioned host, so disabling it will have the opposite effect of what you want: it will make an isolated host look like a dead host. How are you testing isolation and what is reported by HA?
I have tested the isolation with a ping to my default gateway.
It doesn't reply.
The host on the site with the failed switch is not the master of my cluster. Is this a reason?
When you isolate one of the hosts, what do you see in the vSphere UI? What is the HA state reported for that host?
How can I check the HA state without vCenter Server?
My vCenter Server has no connection to my failed site when I power off my switch.
To disable datastore heartbeats:
Edit Settings on the cluster > Datastore Heartbeating > "Select only from my preferred datastore"
(Here's the clever bit) Don't select any preferred datastore, then click OK.
You can verify it's not using one by going back into Edit Settings of the cluster > Datastore Heartbeating > Cluster Status dialog > Heartbeat Datastore tab. You should see that none are listed.
Regards
Gav0
So vCenter has no connection to either host in the cluster? When you fail the switch, are you isolating both hosts in the cluster or just one of them? You can access the HA agent directly using the MOB (managed object browser) in a regular web browser: just go to http://<host_ip>/mobfdm (see http://www.virtuallyghetto.com/2011/07/theres-new-mob-in-town-fdm-mob-for-esxi.html)
When I fail my switch, I lose only the connection from vCenter to the ESX host on the site where the switch has failed.
The other ESX host still has a connection to vCenter.
My "network":
ESX1 (on Site 1) <--> SWITCH1 (on Site 1) <--> SWITCH2 (on Site 2) <--> ESX2 (on Site 2)
I will check the state now via this mobfdm page and then reply here.
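To make the effect of a switch failure on this topology concrete, here is a minimal Python sketch. It models only the four nodes and three links from the diagram above; the location of the default gateway and of vCenter is not given in the diagram, so they are left out. The function name `reachable` is made up for illustration.

```python
from collections import deque

# Links copied from the diagram in the post above; nothing else is modeled.
LINKS = [("ESX1", "SWITCH1"), ("SWITCH1", "SWITCH2"), ("SWITCH2", "ESX2")]


def reachable(start, failed=()):
    """Return the set of nodes reachable from start when the nodes in failed are down."""
    down = set(failed)
    if start in down:
        return set()
    # Build an adjacency map that skips every link touching a failed node.
    neighbours = {}
    for a, b in LINKS:
        if a in down or b in down:
            continue
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    # Plain breadth-first search from the start node.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    print(sorted(reachable("ESX1", failed=["SWITCH1"])))
    print(sorted(reachable("ESX2", failed=["SWITCH1"])))
```

With SWITCH1 down, ESX1 can reach nothing but itself, while ESX2 still reaches SWITCH2. That matches the description: only the host behind the failed switch loses its network.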
The HA state of a host is (usually) reported by the master, not by the host itself. So even if vCenter cannot reach the host its HA state may still be valid. What is vCenter reporting as the HA state for the host you are isolating?
So I have checked these things.
vCenter marks the host on the failed site as "Host Failed".
On this host, the web page /mobfdm/ shows cluster state "Startup".
I have also checked a ping from the failed site to my default gateway; this also fails. In my opinion the host should perform its isolation response?!
The fdm will be in "startup" state if it is isolated. To confirm, check the fdm logs (/var/run/log/fdm*) for a message like "Am isolated! Dropping to STARTUP"
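If the logs have been copied off the host, the check described above could be scripted roughly like this. It is only a sketch: the function names are hypothetical, the marker string and the `/var/run/log/fdm*` pattern are taken from the post, and compressed rotated logs are simply skipped.

```python
import glob

# Marker text quoted in the post above.
MARKER = "Am isolated! Dropping to STARTUP"


def find_isolation_events(lines):
    """Return the timestamps of lines that contain the isolation marker."""
    events = []
    for line in lines:
        if MARKER in line:
            # In the fdm log format the timestamp is the first space-separated field.
            events.append(line.split(" ", 1)[0])
    return events


def scan_fdm_logs(pattern="/var/run/log/fdm*"):
    """Scan every plain-text file matching pattern; .gz files are skipped."""
    events = []
    for path in sorted(glob.glob(pattern)):
        if path.endswith(".gz"):
            continue  # assumption: rotated logs are gzip-compressed and not handled here
        with open(path, errors="replace") as handle:
            events.extend(find_isolation_events(handle))
    return events


if __name__ == "__main__":
    for stamp in scan_fdm_logs():
        print("isolation detected at", stamp)
```

An empty result only means the marker was not found in the files that were scanned; it does not prove the host was never isolated.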
2012-03-20T14:30:22.647Z [46259B90 verbose 'Election' opID=SWI-a9b02dbc] [ClusterElection::MasterStateFunc] Am isolated! Dropping to STARTUP!
2012-03-20T14:30:22.647Z [46259B90 warning 'Election' opID=SWI-a9b02dbc] Election error
2012-03-20T14:30:22.647Z [46259B90 info 'Election' opID=SWI-a9b02dbc] [ClusterElection::ChangeState] Master => Startup : Election error
2012-03-20T14:30:22.648Z [46259B90 info 'Cluster' opID=SWI-a9b02dbc] Change state to Startup:0
2012-03-20T14:30:22.648Z [462DBB90 verbose 'Cluster' opID=SWI-b69e6085] [ClusterManagerImpl::CheckElectionState] Transitioned from Master to Startup
2012-03-20T14:30:22.648Z [462DBB90 info 'Invt' opID=SWI-b69e6085] [InventoryManagerImpl::NotifyDatastoreUnlockedLocally] Invoked for datastore (/vmfs/volumes/4f1fe1be-5c192c43-c0c9-0025b30272a2).
2012-03-20T14:30:22.648Z [462DBB90 info 'Invt' opID=SWI-b69e6085] [InventoryManagerImpl::NotifyDatastoreUnlockedLocally] Invoked for datastore (/vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822).
2012-03-20T14:30:22.648Z [462DBB90 info 'Cluster' opID=SWI-b69e6085] [ClusterManagerImpl::MainLoop] curState 1 lastState 6
2012-03-20T14:30:22.649Z [462DBB90 info 'Cluster' opID=SWI-b69e6085] [ClusterManagerImpl::AddDatastore] path=/vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822 mountHost=host-21 type=2 accessible=true
2012-03-20T14:30:22.649Z [4631CB90 info 'Invt' opID=SWI-e2d2ba9b] [InventoryManagerImpl::ProcessClusterChange] Cluster state changed to Startup
2012-03-20T14:30:22.649Z [462DBB90 info 'Cluster' opID=SWI-b69e6085] [ClusterManagerImpl::AddDatastore] path=/vmfs/volumes/4f1fe1be-5c192c43-c0c9-0025b30272a2 mountHost=host-21 type=2 accessible=true
2012-03-20T14:30:22.650Z [4631CB90 verbose 'PropertyProvider' opID=SWI-e2d2ba9b] RecordOp ASSIGN: clusterState, fdmService
2012-03-20T14:30:22.650Z [4631CB90 verbose 'FDM' opID=SWI-e2d2ba9b] [FdmService::Handle::ClusterStateNotification] Cluster state changed: Master -> Startup
OK, so the host detected that it was isolated, though the other host (the master) did not detect this, presumably because of some problem accessing the heartbeat datastore. When a host is isolated, it writes this info to a file on the heartbeat datastore and the master reads that file. If either host cannot access the heartbeat datastore, the master will declare the host dead rather than isolated. Are you sure the isolated host can access the heartbeat datastore while it is isolated?
No, it can't. That's what I said.
The isolated host can't access the same datastore as the master when the network switch fails.
I will disable datastore heartbeating as posted by Gav0 and check again.
If the isolated host cannot access the heartbeat datastore, then the master cannot detect that the host is isolated and will immediately try to fail over any VMs that were on the host (which is causing your data corruption issues). If you disable datastore heartbeating, it won't make any difference: you'll see the same thing. If you want to ensure that the master does not fail over VMs in case of isolation, you need to keep datastore heartbeating enabled and use a heartbeat datastore that will remain accessible even when the management network is down.
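The reasoning in this reply can be written down as a small decision function. This is a simplified model of what is described in the thread, not the actual FDM logic; the function name, the parameter names and the returned strings are all made up for illustration.

```python
def classify_host(network_heartbeat, datastore_heartbeat, isolation_flag):
    """Simplified view of how a master might classify another host.

    network_heartbeat:   the master still receives heartbeats over the management network
    datastore_heartbeat: the master sees the host's heartbeat on a shared heartbeat
                         datastore (requires that BOTH hosts can access it)
    isolation_flag:      the host has recorded on the heartbeat datastore that it is isolated
    """
    if network_heartbeat:
        return "live"
    if not datastore_heartbeat:
        # Nothing arrives over either channel, so the host looks dead
        # and its VMs become candidates for failover.
        return "dead"
    return "isolated" if isolation_flag else "partitioned"


if __name__ == "__main__":
    # The lab in this thread: the switch failure cuts the network AND the shared storage.
    print(classify_host(False, False, True))
    # Same failure, but with a heartbeat datastore that stays reachable.
    print(classify_host(False, True, True))
```

In this model, disabling datastore heartbeating is the same as forcing `datastore_heartbeat` to `False`, which is why it cannot turn "dead" into "isolated".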
So I have done another test.
I have disabled datastore heartbeating and reinstalled the HA agents on the hosts.
Now the logs are a little bit better:
2012-03-20T15:03:33.815Z [461D7B90 info 'Election' opID=SWI-ba420ecc] Slave timed out
2012-03-20T15:03:33.816Z [461D7B90 info 'Election' opID=SWI-ba420ecc] [ClusterElection::ChangeState] Slave => Startup : Lost master
2012-03-20T15:03:33.816Z [461D7B90 info 'Cluster' opID=SWI-ba420ecc] Change state to Startup:0
2012-03-20T15:03:33.816Z [4631CB90 verbose 'Cluster' opID=SWI-3b1a98a4] [ClusterManagerImpl::CheckElectionState] Transitioned from Slave to Startup
2012-03-20T15:03:33.817Z [4631CB90 info 'Invt' opID=SWI-3b1a98a4] [InventoryManagerImpl::NotifyDatastoreUnlockedLocally] Invoked for datastore (/vmfs/volumes/4f1fe1be-5c192c43-c0c9-0025b30272a2).
2012-03-20T15:03:33.817Z [4631CB90 info 'Cluster' opID=SWI-3b1a98a4] Releasing datastore /vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822
2012-03-20T15:03:33.818Z [4631CB90 error 'Message' opID=SWI-3b1a98a4] AsyncWrite: Error sending asynch message: N7Vmacore21InvalidStateExceptionE(Invalid state)
2012-03-20T15:03:33.818Z [4631CB90 warning 'Cluster' opID=SWI-3b1a98a4] [ClusterDatastore::SendDatastoreReleasedMsg] Exception sending to host-22: N7Vmacore21InvalidStateExceptionE(Invalid state)
2012-03-20T15:03:33.818Z [4631CB90 info 'Message' opID=SWI-3b1a98a4] Destroying connection
2012-03-20T15:03:33.818Z [4631CB90 info 'Invt' opID=SWI-3b1a98a4] [InventoryManagerImpl::NotifyDatastoreUnlockedLocally] Invoked for datastore (/vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822).
2012-03-20T15:03:33.819Z [4631CB90 info 'Cluster' opID=SWI-3b1a98a4] [ClusterManagerImpl::MainLoop] curState 1 lastState 4
2012-03-20T15:03:33.819Z [4639EB90 info 'Invt' opID=SWI-e762e053] [InventoryManagerImpl::ProcessClusterChange] Cluster state changed to Startup
2012-03-20T15:03:33.820Z [4639EB90 verbose 'PropertyProvider' opID=SWI-e762e053] RecordOp ASSIGN: clusterState, fdmService
2012-03-20T15:03:33.820Z [4639EB90 verbose 'FDM' opID=SWI-e762e053] [FdmService::Handle::ClusterStateNotification] Cluster state changed: Slave -> Startup
2012-03-20T15:03:33.820Z [4639EB90 verbose 'Placement' opID=SWI-e762e053] [PlacementManagerImpl::ClusterStateListener::Handle] New cluster state is 1
2012-03-20T15:03:33.820Z [4639EB90 verbose 'Execution' opID=SWI-e762e053] [ExecutionManagerImpl::ClusterStateListener::Handle] New cluster state is 1
2012-03-20T15:03:33.820Z [4639EB90 verbose 'Policy' opID=SWI-e762e053] [PolicyManager::Handle(ClusterStateNotification)] Transitioning to startup (1). Disabling global policy and enabling local policy.
2012-03-20T15:03:33.820Z [4639EB90 verbose 'Monitor' opID=SWI-e762e053] [IsoAddressMonitor::Handle::ClusterStateNotification] Cluster state changed to 1
2012-03-20T15:03:33.820Z [4639EB90 verbose 'Monitor' opID=SWI-e762e053] [PingableAddressMonitor::Handle::ClusterStateNotification] Cluster state changed to 1
2012-03-20T15:03:33.820Z [4639EB90 verbose 'Monitor' opID=SWI-e762e053] [HostAccessMonitor::ClusterStateListener] Cluster state changed to 1
2012-03-20T15:03:38.818Z [461D7B90 info 'Election' opID=SWI-ba420ecc] [ClusterElection::ChangeState] Startup => Candidate : Startup Timeout
2012-03-20T15:03:38.818Z [461D7B90 info 'Cluster' opID=SWI-ba420ecc] Change state to Candidate:690781799707
2012-03-20T15:03:38.819Z [4631CB90 verbose 'Cluster' opID=SWI-3b1a98a4] [ClusterManagerImpl::CheckElectionState] Transitioned from Startup to Candidate
2012-03-20T15:03:38.819Z [4631CB90 info 'Invt' opID=SWI-3b1a98a4] [InventoryManagerImpl::NotifyDatastoreUnlockedLocally] Invoked for datastore (/vmfs/volumes/4f1fe1be-5c192c43-c0c9-0025b30272a2).
2012-03-20T15:03:38.819Z [4631CB90 info 'Invt' opID=SWI-3b1a98a4] [InventoryManagerImpl::NotifyDatastoreUnlockedLocally] Invoked for datastore (/vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822).
2012-03-20T15:03:38.819Z [4631CB90 info 'Cluster' opID=SWI-3b1a98a4] [ClusterManagerImpl::MainLoop] curState 5 lastState 1
2012-03-20T15:03:44.732Z [FFF72B90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:44.733Z [FFF72B90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:44.733Z [FFF72B90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:44.785Z [FFE0B400 verbose 'HttpConnectionPool'] HttpConnectionPoolImpl created. maxPoolConnections = 1; idleTimeout = 900000000; maxOpenConnections = 1; maxConnectionAge = 0
2012-03-20T15:03:44.791Z [FFF31B90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:44.791Z [FFF31B90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:44.791Z [FFF31B90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:44.825Z [FFE0B400 info 'vmomi.soapStub[4]'] Resetting stub adapter for server TCP:localhost:443 : Closed
2012-03-20T15:03:44.826Z [FFE0B400 verbose 'HalCnx'] [HalCnx] Authenticate succeeded: userName=root
2012-03-20T15:03:44.827Z [FFE0B400 verbose 'DebugBrowser.HTTPService'] User agent is 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)'
2012-03-20T15:03:44.827Z [FFE0B400 verbose 'DebugBrowser'] MO fdmService: typeName: ManagedObjectReference:CsiFdmService, homeUrl: /mobfdm
2012-03-20T15:03:44.828Z [FFE0B400 verbose 'DebugBrowser'] MO fdmLogServiceManager: typeName: ManagedObjectReference:CsiLogsvcManager, homeUrl: /mobfdm
2012-03-20T15:03:44.829Z [FFE0B400 verbose 'DebugBrowser.HTTPService'] HTTP Response: Complete (processed 11766 bytes)
2012-03-20T15:03:45.709Z [46218B90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:45.710Z [46218B90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:45.710Z [46218B90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:45.761Z [4629AB90 verbose 'HttpConnectionPool'] HttpConnectionPoolImpl created. maxPoolConnections = 1; idleTimeout = 900000000; maxOpenConnections = 1; maxConnectionAge = 0
2012-03-20T15:03:45.766Z [4639EB90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:45.767Z [4639EB90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:45.767Z [4639EB90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:45.800Z [4629AB90 info 'vmomi.soapStub[5]'] Resetting stub adapter for server TCP:localhost:443 : Closed
2012-03-20T15:03:45.802Z [4629AB90 verbose 'HalCnx'] [HalCnx] Authenticate succeeded: userName=root
2012-03-20T15:03:45.802Z [4629AB90 verbose 'DebugBrowser.HTTPService'] User agent is 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)'
2012-03-20T15:03:45.802Z [4629AB90 verbose 'DebugBrowser'] MO fdmService: typeName: ManagedObjectReference:CsiFdmService, homeUrl: /mobfdm
2012-03-20T15:03:45.804Z [4629AB90 verbose 'DebugBrowser'] MO fdmLogServiceManager: typeName: ManagedObjectReference:CsiLogsvcManager, homeUrl: /mobfdm
2012-03-20T15:03:45.804Z [4629AB90 verbose 'DebugBrowser.HTTPService'] HTTP Response: Complete (processed 11766 bytes)
2012-03-20T15:03:46.547Z [FFE0B400 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:46.547Z [FFE0B400 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:46.548Z [FFE0B400 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:46.600Z [FFEF0B90 verbose 'HttpConnectionPool'] HttpConnectionPoolImpl created. maxPoolConnections = 1; idleTimeout = 900000000; maxOpenConnections = 1; maxConnectionAge = 0
2012-03-20T15:03:46.608Z [462DBB90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:46.608Z [462DBB90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:46.608Z [462DBB90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:46.634Z [FFEF0B90 info 'vmomi.soapStub[6]'] Resetting stub adapter for server TCP:localhost:443 : Closed
2012-03-20T15:03:46.636Z [FFEF0B90 verbose 'HalCnx'] [HalCnx] Authenticate succeeded: userName=root
2012-03-20T15:03:46.636Z [FFEF0B90 verbose 'DebugBrowser.HTTPService'] User agent is 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)'
2012-03-20T15:03:46.636Z [FFEF0B90 verbose 'DebugBrowser'] MO fdmService: typeName: ManagedObjectReference:CsiFdmService, homeUrl: /mobfdm
2012-03-20T15:03:46.638Z [FFEF0B90 verbose 'DebugBrowser'] MO fdmLogServiceManager: typeName: ManagedObjectReference:CsiLogsvcManager, homeUrl: /mobfdm
2012-03-20T15:03:46.638Z [FFEF0B90 verbose 'DebugBrowser.HTTPService'] HTTP Response: Complete (processed 11766 bytes)
2012-03-20T15:03:47.347Z [46155B90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:47.348Z [46155B90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:47.348Z [46155B90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:47.401Z [4639EB90 verbose 'HttpConnectionPool'] HttpConnectionPoolImpl created. maxPoolConnections = 1; idleTimeout = 900000000; maxOpenConnections = 1; maxConnectionAge = 0
2012-03-20T15:03:47.406Z [FFFF4B90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:47.406Z [FFFF4B90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:47.407Z [FFFF4B90 warning 'Libs'] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
2012-03-20T15:03:47.441Z [4639EB90 info 'vmomi.soapStub[7]'] Resetting stub adapter for server TCP:localhost:443 : Closed
2012-03-20T15:03:47.442Z [4639EB90 verbose 'HalCnx'] [HalCnx] Authenticate succeeded: userName=root
2012-03-20T15:03:47.443Z [4639EB90 verbose 'DebugBrowser.HTTPService'] User agent is 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)'
2012-03-20T15:03:47.443Z [4639EB90 verbose 'DebugBrowser'] MO fdmService: typeName: ManagedObjectReference:CsiFdmService, homeUrl: /mobfdm
2012-03-20T15:03:47.444Z [4639EB90 verbose 'DebugBrowser'] MO fdmLogServiceManager: typeName: ManagedObjectReference:CsiLogsvcManager, homeUrl: /mobfdm
2012-03-20T15:03:47.445Z [4639EB90 verbose 'DebugBrowser.HTTPService'] HTTP Response: Complete (processed 11766 bytes)
2012-03-20T15:03:48.819Z [461D7B90 info 'Election' opID=SWI-ba420ecc] [ClusterElection::ChangeState] Candidate => Master : Master selected
2012-03-20T15:03:48.819Z [461D7B90 info 'Cluster' opID=SWI-ba420ecc] Change state to Master:690781799707
2012-03-20T15:03:48.820Z [4631CB90 verbose 'Cluster' opID=SWI-3b1a98a4] [ClusterManagerImpl::CheckElectionState] Transitioned from Candidate to Master
2012-03-20T15:03:48.820Z [4631CB90 info 'Invt' opID=SWI-3b1a98a4] [InventoryManagerImpl::NotifyDatastoreUnlockedLocally] Invoked for datastore (/vmfs/volumes/4f1fe1be-5c192c43-c0c9-0025b30272a2).
2012-03-20T15:03:48.820Z [4631CB90 info 'Invt' opID=SWI-3b1a98a4] [InventoryManagerImpl::NotifyDatastoreUnlockedLocally] Invoked for datastore (/vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822).
2012-03-20T15:03:48.820Z [4631CB90 info 'Cluster' opID=SWI-3b1a98a4] [ClusterManagerImpl::MainLoop] curState 6 lastState 5
2012-03-20T15:03:48.821Z [4631CB90 info 'Cluster' opID=SWI-3b1a98a4] [ClusterManagerImpl::MainLoop] Am now master
2012-03-20T15:03:48.821Z [FFFB3B90 info 'Invt' opID=SWI-c4db588c] [InventoryManagerImpl::ProcessClusterChange] Cluster state changed to Master
2012-03-20T15:03:48.821Z [4631CB90 verbose 'Cluster' opID=SWI-3b1a98a4] [ClusterManagerImpl::CheckHostNetworkIsolation] Waited 5 seconds for isolation icmp ping reply. Isolated
2012-03-20T15:03:48.821Z [FFFB3B90 verbose 'PropertyProvider' opID=SWI-c4db588c] RecordOp ASSIGN: clusterState, fdmService
2012-03-20T15:03:48.822Z [4631CB90 info 'Policy' opID=SWI-3b1a98a4] [LocalIsolationPolicy::Handle(IsolationNotification)] host isolated is true
2012-03-20T15:03:48.822Z [FFFB3B90 verbose 'FDM' opID=SWI-c4db588c] [FdmService::Handle::ClusterStateNotification] Cluster state changed: Startup -> Master
2012-03-20T15:03:48.822Z [FFFB3B90 verbose 'Notifications' opID=SWI-c4db588c] [Notification::AddListener] Adding listener of type Csi::Notifications::HostStateChange: FdmService (listeners = 2)
2012-03-20T15:03:48.822Z [FFFB3B90 verbose 'Notifications' opID=SWI-c4db588c] [Notification::AddListener] Adding listener of type Csi::Notifications::VmStateChange: FdmService (listeners = 2)
2012-03-20T15:03:48.822Z [FFFB3B90 verbose 'Notifications' opID=SWI-c4db588c] [Notification::AddListener] Adding listener of type Csi::Notifications::DsStateChange: FdmService (listeners = 2)
2012-03-20T15:03:48.823Z [FFFB3B90 verbose 'Notifications' opID=SWI-c4db588c] [Notification::AddListener] Adding listener of type Csi::Notifications::InitialProtectedList: FdmService (listeners = 1)
2012-03-20T15:03:48.822Z [FFF31B90 verbose 'Policy' opID=SWI-832b39dc] [LocalIsolationPolicy::GetIsolationResponseInfo] Cluster default isolation response is shutdown
2012-03-20T15:03:48.823Z [FFFB3B90 verbose 'Placement' opID=SWI-c4db588c] [PlacementManagerImpl::ClusterStateListener::Handle] New cluster state is 3
2012-03-20T15:03:48.823Z [FFFB3B90 verbose 'Execution' opID=SWI-c4db588c] [ExecutionManagerImpl::ClusterStateListener::Handle] New cluster state is 3
2012-03-20T15:03:48.823Z [FFF31B90 verbose 'Policy' opID=SWI-832b39dc] [LocalIsolationPolicy::LogIsolationResponseInfo] Logging IsolationResponseInfo
2012-03-20T15:03:48.824Z [FFF31B90 verbose 'Policy' opID=SWI-832b39dc] [LocalIsolationPolicy::LogIsolationResponseInfo] Datastore /vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822
2012-03-20T15:03:48.824Z [FFF31B90 verbose 'Policy' opID=SWI-832b39dc] [LocalIsolationPolicy::LogIsolationResponseInfo] VM /vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822/SvSAN ESX1/SvSAN ESX1.vmx isolation response shutdown
2012-03-20T15:03:48.824Z [FFFB3B90 verbose 'Policy' opID=SWI-c4db588c] [PolicyManager::Handle(ClusterStateNotification)] Transitioning to master (3). Enabling global and local policies.
2012-03-20T15:03:48.824Z [FFFB3B90 verbose 'Policy' opID=SWI-c4db588c] [GlobalPolicy::OnEnable] Enabling global failure processing policy.
2012-03-20T15:03:48.824Z [FFFB3B90 verbose 'Notifications' opID=SWI-c4db588c] [Notification::AddListener] Adding listener of type Csi::Notifications::VmStateChange: Csi::Policies::GlobalPolicy (listeners = 3)
2012-03-20T15:03:48.825Z [FFFB3B90 verbose 'Monitor' opID=SWI-c4db588c] [IsoAddressMonitor::Handle::ClusterStateNotification] Cluster state changed to 3
2012-03-20T15:03:48.825Z [FFFB3B90 verbose 'Monitor' opID=SWI-c4db588c] [PingableAddressMonitor::Handle::ClusterStateNotification] Cluster state changed to 3
2012-03-20T15:03:48.825Z [FFFB3B90 verbose 'Monitor' opID=SWI-c4db588c] [HostAccessMonitor::ClusterStateListener] Cluster state changed to 3
2012-03-20T15:03:48.827Z [46155B90 verbose 'Placement' opID=SWI-1b0b4dc2] [PlacementManagerImpl::CancelVmPlacement] removed 0 of 8 vms
2012-03-20T15:03:49.819Z [461D7B90 verbose 'Election' opID=SWI-ba420ecc] [ClusterElection::MasterStateFunc] Am isolated! Dropping to STARTUP!
As we can see at 15:03:48.821, the host tries to ping the isolation address and gets no reply.
It is isolated.
At 15:03:48.824 everything seems to be fine: it wants to shut down the "SvSAN ESX1.vmx" virtual machine.
But it doesn't do that. Why?
Everything would be all right if it shut down this VM.
Check all the log messages with "LocalIsolationPolicy" in them. Is there any indication that it tries to shut down the VM? Did you try configuring the isolation response to "powerOff" rather than "shutdown"?
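One way to pull out just those messages is a small parser for the log line layout seen in this thread (timestamp, then thread id, level, quoted subsystem and an optional opID in brackets, then the message). The regular expression below is inferred from the posted lines only, so treat it as an assumption about the format; the function names are hypothetical.

```python
import re

# Layout inferred from the log lines posted in this thread.
LINE_RE = re.compile(
    r"^(?P<time>\S+) \[(?P<thread>\S+) (?P<level>\w+) '(?P<subsystem>[^']+)'"
    r"(?: opID=(?P<opid>[^\]]+))?\] (?P<message>.*)$"
)


def parse_line(line):
    """Split one fdm log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def filter_messages(lines, needle="LocalIsolationPolicy"):
    """Return (time, level, message) for every parsed line whose message contains needle."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and needle in entry["message"]:
            hits.append((entry["time"], entry["level"], entry["message"]))
    return hits


if __name__ == "__main__":
    import sys
    # Usage: python filter_fdm.py < fdm.log
    for time, level, message in filter_messages(sys.stdin):
        print(time, level, message)
```

Reading the filtered lines in time order should show whether anything follows the "isolation response shutdown" entry for the VM.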
I have done some more tests.
I have a couple of questions.
First, my log entries:
2012-03-23T12:10:50.273Z [46218B90 info 'Cluster' opID=SWI-be5b253c] [ClusterManagerImpl::MainLoop] Am now master
2012-03-23T12:10:50.273Z [46259B90 info 'Invt' opID=SWI-8ae9d7be] [InventoryManagerImpl::ProcessClusterChange] Cluster state changed to Master
2012-03-23T12:10:50.273Z [46218B90 verbose 'Cluster' opID=SWI-be5b253c] [ClusterManagerImpl::CheckHostNetworkIsolation] Waited 5 seconds for isolation icmp ping reply. Isolated
2012-03-23T12:10:50.273Z [46259B90 verbose 'PropertyProvider' opID=SWI-8ae9d7be] RecordOp ASSIGN: clusterState, fdmService
2012-03-23T12:10:50.274Z [46218B90 info 'Policy' opID=SWI-be5b253c] [LocalIsolationPolicy::Handle(IsolationNotification)] host isolated is true
2012-03-23T12:10:50.274Z [46259B90 verbose 'FDM' opID=SWI-8ae9d7be] [FdmService::Handle::ClusterStateNotification] Cluster state changed: Startup -> Master
2012-03-23T12:10:50.274Z [46259B90 verbose 'Notifications' opID=SWI-8ae9d7be] [Notification::AddListener] Adding listener of type Csi::Notifications::HostStateChange: FdmService (listeners = 2)
2012-03-23T12:10:50.274Z [46259B90 verbose 'Notifications' opID=SWI-8ae9d7be] [Notification::AddListener] Adding listener of type Csi::Notifications::VmStateChange: FdmService (listeners = 2)
2012-03-23T12:10:50.274Z [46259B90 verbose 'Notifications' opID=SWI-8ae9d7be] [Notification::AddListener] Adding listener of type Csi::Notifications::DsStateChange: FdmService (listeners = 2)
2012-03-23T12:10:50.274Z [FFEF0B90 verbose 'Policy' opID=SWI-796ce82b] [LocalIsolationPolicy::GetIsolationResponseInfo] Cluster default isolation response is shutdown
2012-03-23T12:10:50.274Z [46259B90 verbose 'Notifications' opID=SWI-8ae9d7be] [Notification::AddListener] Adding listener of type Csi::Notifications::InitialProtectedList: FdmService (listeners = 1)
2012-03-23T12:10:50.275Z [46259B90 verbose 'Placement' opID=SWI-8ae9d7be] [PlacementManagerImpl::ClusterStateListener::Handle] New cluster state is 3
2012-03-23T12:10:50.275Z [FFEF0B90 verbose 'Policy' opID=SWI-796ce82b] [LocalIsolationPolicy::LogIsolationResponseInfo] Logging IsolationResponseInfo
2012-03-23T12:10:50.276Z [FFEF0B90 verbose 'Policy' opID=SWI-796ce82b] [LocalIsolationPolicy::LogIsolationResponseInfo] Datastore /vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822
2012-03-23T12:10:50.276Z [FFEF0B90 verbose 'Policy' opID=SWI-796ce82b] [LocalIsolationPolicy::LogIsolationResponseInfo] VM /vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822/SvSAN ESX1/SvSAN ESX1.vmx isolation response powerOff
2012-03-23T12:10:50.276Z [46259B90 verbose 'Execution' opID=SWI-8ae9d7be] [ExecutionManagerImpl::ClusterStateListener::Handle] New cluster state is 3
2012-03-23T12:10:50.276Z [46259B90 verbose 'Policy' opID=SWI-8ae9d7be] [PolicyManager::Handle(ClusterStateNotification)] Transitioning to master (3). Enabling global and local policies.
2012-03-23T12:10:50.276Z [46259B90 verbose 'Policy' opID=SWI-8ae9d7be] [GlobalPolicy::OnEnable] Enabling global failure processing policy.
2012-03-23T12:10:50.276Z [46259B90 verbose 'Notifications' opID=SWI-8ae9d7be] [Notification::AddListener] Adding listener of type Csi::Notifications::VmStateChange: Csi::Policies::GlobalPolicy (listeners = 3)
2012-03-23T12:10:50.276Z [46259B90 verbose 'Monitor' opID=SWI-8ae9d7be] [IsoAddressMonitor::Handle::ClusterStateNotification] Cluster state changed to 3
2012-03-23T12:10:50.277Z [46259B90 verbose 'Monitor' opID=SWI-8ae9d7be] [PingableAddressMonitor::Handle::ClusterStateNotification] Cluster state changed to 3
2012-03-23T12:10:50.277Z [46259B90 verbose 'Monitor' opID=SWI-8ae9d7be] [HostAccessMonitor::ClusterStateListener] Cluster state changed to 3
2012-03-23T12:10:50.279Z [46155B90 verbose 'Placement' opID=SWI-66208758] [PlacementManagerImpl::CancelVmPlacement] removed 0 of 8 vms
2012-03-23T12:10:51.273Z [FFF72B90 verbose 'Election' opID=SWI-7c83c9f5] [ClusterElection::MasterStateFunc] Am isolated! Dropping to STARTUP!
2012-03-23T12:10:51.273Z [FFF72B90 warning 'Election' opID=SWI-7c83c9f5] Election error
2012-03-23T12:10:51.273Z [FFF72B90 info 'Election' opID=SWI-7c83c9f5] [ClusterElection::ChangeState] Master => Startup : Election error
In this log I read that the host is isolated.
That would be right.
The VM "SvSAN ESX1.vmx" should be powered off (powerOff), but this doesn't happen. Why? That's my first question.
The second question is: what does the last line, "Master => Startup : Election error", mean?
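To see that last line in context, the election state changes can be listed on their own. A rough Python sketch follows; the pattern is taken from the `ClusterElection::ChangeState` lines in the logs posted here, and the function name is hypothetical. It only extracts the transitions and does not explain them.

```python
import re

# Matches lines such as:
#   ... [ClusterElection::ChangeState] Slave => Startup : Lost master
CHANGE_RE = re.compile(
    r"^(?P<time>\S+) .*\[ClusterElection::ChangeState\] "
    r"(?P<old>\w+) => (?P<new>\w+) : (?P<reason>.*)$"
)


def election_transitions(lines):
    """Return (time, old_state, new_state, reason) for each election state change."""
    transitions = []
    for line in lines:
        match = CHANGE_RE.match(line.strip())
        if match:
            transitions.append(
                (match["time"], match["old"], match["new"], match["reason"])
            )
    return transitions


if __name__ == "__main__":
    import sys
    # Usage: python transitions.py < fdm.log
    for time, old, new, reason in election_transitions(sys.stdin):
        print(f"{time}  {old} -> {new}  ({reason})")
```

Run against the earlier log from 2012-03-20, this would list Slave to Startup, Startup to Candidate and Candidate to Master, which shows the isolated host electing itself master before it drops back to Startup.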
Another part of the logs:
2012-03-23T12:13:00.323Z [461D7B90 verbose 'Cluster' opID=SWI-75018e0a] [ClusterDatastore::DoCheckIfLocked] Checking if datastore /vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822 is locked
2012-03-23T12:13:00.327Z [461D7B90 verbose 'Cluster' opID=SWI-75018e0a] [ClusterDatastore::DoCheckIfLocked] Checking if datastore /vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822 lock state is 3
2012-03-23T12:13:00.327Z [461D7B90 verbose 'Policy' opID=SWI-75018e0a] [LocalIsolationPolicy::ProcessDatastoreLockState] check of /vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822 returned 3 (scheduled=true)
2012-03-23T12:13:20.332Z [FFEF0B90 verbose 'Policy' opID=SWI-796ce82b] [LocalIsolationPolicy::ProcessDatastore] Issuing lock check for datastore /vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822
2012-03-23T12:13:20.332Z [46155B90 verbose 'Cluster' opID=SWI-cfcab5bb] [ClusterDatastore::DoCheckIfLocked] Checking if datastore /vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822 is locked
2012-03-23T12:13:20.337Z [46155B90 verbose 'Cluster' opID=SWI-cfcab5bb] [ClusterDatastore::DoCheckIfLocked] Checking if datastore /vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822 lock state is 3
2012-03-23T12:13:20.337Z [46155B90 verbose 'Policy' opID=SWI-cfcab5bb] [LocalIsolationPolicy::ProcessDatastoreLockState] check of /vmfs/volumes/4f0d8e48-cc6fcf66-a20a-68b599722822 returned 3 (scheduled=true)
This datastore ID is my local datastore.
What does "lock state is 3" or "is locked" mean?
Why is my local datastore locked?
All local VMs are running on it, even while the switch is booting.
Is this the reason why the FDM can't shut down my VM "SvSAN ESX1"?
It would be very nice if anybody could help me again.
