VMware Cloud Community
sxnxr
Commander

Rest notification (is this a bug)

I am sending REST JSON payloads from vROps to a shim server. According to the vROps documentation, this is the payload:

{
  "startDate":1369757346267,
  "criticality":"ALERT_CRITICALITY_LEVEL_WARNING",
  "resourceId":"sample-object-uuid",
  "alertId":"sample-alert-uuid",
  "status":"ACTIVE",
  "subType":"ALERT_SUBTYPE_AVAILABILITY_PROBLEM",
  "cancelDate":1369757346267,
  "resourceKind":"sample-object-type",
  "adapterKind":"sample-adapter-type",
  "type":"ALERT_TYPE_APPLICATION_PROBLEM",
  "resourceName":"sample-object-name",
  "updateDate":1369757346267,
  "info":"sample-info"
}

We use the shim to rename the fields above to the names our upstream service uses.
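A minimal sketch of the kind of renaming such a shim might do. The upstream field names in FIELD_MAP are hypothetical and only illustrate the idea; the real names depend on the upstream service.

```python
# Hypothetical mapping from vROps notification keys to upstream names.
# The right-hand names are assumptions made up for illustration.
FIELD_MAP = {
    "alertId": "incident_id",
    "resourceName": "host",
    "criticality": "severity",
    "info": "description",
}


def rename_fields(payload, field_map=FIELD_MAP):
    """Return a copy of payload with keys renamed; unmapped keys pass through."""
    return {field_map.get(key, key): value for key, value in payload.items()}
```

Keys that are not in the mapping are kept unchanged, so new fields that vROps adds later are not silently dropped.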

The problem I have is that the info field contains the description configured in the alert definition. I would have expected it to be the alert message, which carries the specific details of the alert.

Is this a bug or by design? If it is by design, why does it not send the alert message?

5 Replies
jasnyder
Hot Shot

I don't think it's a bug.  vROPS just doesn't send all the symptom information in the POST to the server specified in the outbound notification instance.

Listening on a custom API server to receive the notifications sent by vROPS, the payload looks like this:

{
  "updateDate":1511982052934,
  "resourceId":"b2282ddd-0cda-4803-b661-55cf00c1f028",
  "adapterKind":"VMWARE",
  "Health":2,
  "impact":"health",
  "criticality":"ALERT_CRITICALITY_LEVEL_WARNING",
  "Risk":1,
  "resourceName":"hwlvrb01",
  "type":"ALERT_TYPE_STORAGE_PROBLEM",
  "resourceKind":"VirtualMachine",
  "alertName":"Virtual machine disk I/O write latency is high",
  "Efficiency":1,
  "subType":"ALERT_SUBTYPE_PERFORMANCE_PROBLEM",
  "alertId":"ad6ed63e-cb2f-4d90-a284-d2861bde96b5",
  "startDate":1511982052934,
  "info":"9027",
  "status":"ACTIVE"
}

The info field appears to be pretty worthless.

I think you might be looking to get symptom information which tells you what metric caused the symptom.  You would have to kind of work your way back from the alertId and resourceId given in the notification from vROPS.

Using the alertId field above, you can make a GET call to vrops-server/suite-api/api/alerts/{alertId} (in this case alertId = ad6ed63e-cb2f-4d90-a284-d2861bde96b5). Response (this sample shows a different alertId and resourceId than the notification above):

{"alertId":"233aa304-7ff5-4803-9a90-4d2e0b554b6f","resourceId":"e8021413-bcfa-443b-8bd2-caec892af2ca","alertLevel":"IMMEDIATE","type":"18","subType":"19","status":"ACTIVE","startTimeUTC":1511982352932,"cancelTimeUTC":0,"updateTimeUTC":1511982652939,"suspendUntilTimeUTC":0,"controlState":"OPEN","alertDefinitionId":"AlertDefinition-VMWARE-VMWriteLatency","alertDefinitionName":"Virtual machine disk I/O write latency is high","links":[{"href":"/suite-api/api/alerts/233aa304-7ff5-4803-9a90-4d2e0b554b6f","rel":"SELF","name":"linkToSelf"},{"href":"/suite-api/api/resources/e8021413-bcfa-443b-8bd2-caec892af2ca","rel":"RELATED","name":"alertOnResource"},{"href":"/suite-api/api/auth/users/","rel":"RELATED","name":"ownerOfAlert"},{"href":"/suite-api/api/alertdefinitions/AlertDefinition-VMWARE-VMWriteLatency","rel":"RELATED","name":"problemDefinitionForAlert"}]}

From there get the alertDefinitionId, which in the above example = AlertDefinition-VMWARE-VMWriteLatency

I would then make a GET to vrops-server/suite-api/api/alertdefinitions/AlertDefinition-VMWARE-VMWriteLatency which returns:

{"id":"AlertDefinition-VMWARE-VMWriteLatency","name":"Virtual machine disk I/O write latency is high","description":"Virtual machine disk I/O write latency is high","adapterKindKey":"VMWARE","resourceKindKey":"VirtualMachine","waitCycles":1,"cancelCycles":1,"type":18,"subType":19,"states":[{"severity":"AUTO","base-symptom-set":{"type":"SYMPTOM_SET","relation":"SELF","symptomSetOperator":"OR","symptomDefinitionIds":["SymptomDefinition-VMWARE-VMWriteLatencyCritical","SymptomDefinition-VMWARE-VMWriteLatencyImmediate","SymptomDefinition-VMWARE-VMWriteLatencyWarning"]},"impact":{"impactType":"BADGE","detail":"health"},"recommendationPriorityMap":{"Recommendation-df-VMWARE-IncreaseIopsForDatastores":2,"Recommendation-df-VMWARE-StorageVMotionVM":4,"Recommendation-df-VMWARE-CheckStorageIOControl":1,"Recommendation-df-VMWARE-DeleteOldSnapshotOfVm":3}}]}

That enumerates the symptoms that make up the alert (alertDefinition.states[X]["base-symptom-set"].symptomDefinitionIds). You would grab all those symptom definition IDs and then make a call to query the symptoms on the resource. That's done by POSTing to vrops-server/suite-api/api/symptoms/query with a body that looks like this (specifying basically just the resourceId and all the symptom definition IDs you want to return; this will get all matching symptoms on the resource with the given ID):

{
  "compositeOperator" : "AND",
  "symptomId" : [ ],
  "resource-query" : {
    "name" : null,
    "regex" : null,
    "adapterKind" : null,
    "resourceKind" : null,
    "collectorName" : null,
    "collectorId" : null,
    "maintenanceScheduleId" : null,
    "adapterInstanceId" : null,
    "recentlyAdded" : null,
    "resourceState" : null,
    "resourceStatus" : null,
    "resourceHealth" : null,
    "parentId" : null,
    "credentialId" : null,
    "resourceId" : [ "e8021413-bcfa-443b-8bd2-caec892af2ca" ],
    "propertyName" : null,
    "propertyValue" : null,
    "statKey" : null,
    "statKeyLowerBound" : null,
    "statKeyUpperBound" : null,
    "statKeyInclusive" : true,
    "includeRelated" : null,
    "others" : [ ],
    "otherAttributes" : { }
  },
  "includeChildrenResources" : false,
  "activeOnly" : true,
  "alarmType" : [ ],
  "alarmCriticality" : [ "CRITICAL", "IMMEDIATE", "WARNING", "INFORMATION" ],
  "symptomDefinitionId" : [ "SymptomDefinition-VMWARE-VMWriteLatencyWarning", "SymptomDefinition-VMWARE-VMWriteLatencyCritical", "SymptomDefinition-VMWARE-VMWriteLatencyImmediate" ],
  "statKey" : [ ],
  "others" : [ ],
  "otherAttributes" : { }
}
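As a rough sketch, the query body could be built from the alert definition response instead of by hand. This assumes the flat "base-symptom-set" shape shown in the alert definition example above (nested or composite symptom sets are not handled), and it assumes the server accepts a body with the null and empty fields left out, which I have not verified. The function names are made up for illustration.

```python
def symptom_definition_ids(alert_definition):
    """Collect symptom definition IDs from every state, without duplicates."""
    ids = []
    for state in alert_definition.get("states", []):
        symptom_set = state.get("base-symptom-set", {})
        for symptom_id in symptom_set.get("symptomDefinitionIds", []):
            if symptom_id not in ids:
                ids.append(symptom_id)
    return ids


def build_symptom_query(alert_definition, resource_id, active_only=True):
    """Build a trimmed-down body for POST /suite-api/api/symptoms/query."""
    return {
        "resource-query": {"resourceId": [resource_id]},
        "activeOnly": active_only,
        "symptomDefinitionId": symptom_definition_ids(alert_definition),
    }
```

If the server rejects the trimmed body, the full body shown above can be used as the template and only resourceId and symptomDefinitionId swapped in.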

That call returns (example is shown with a pageSize=1 so it only returned one symptom for brevity's sake):

{"pageInfo":{"totalCount":208,"page":0,"pageSize":1},"links":[{"href":"/suite-api/api/symptoms/query?page=0&pageSize=1","rel":"SELF","name":"current"},{"href":"/suite-api/api/symptoms/query?page=1&pageSize=1","rel":"NEXT","name":"next"},{"href":"/suite-api/api/symptoms/query?page=0&pageSize=1","rel":"RELATED","name":"first"},{"href":"/suite-api/api/symptoms/query?page=207&pageSize=1","rel":"RELATED","name":"last"}],"symptom":[{"id":"06ea4d59-5f73-4417-bd61-8458e1d74712","resourceId":"e8021413-bcfa-443b-8bd2-caec892af2ca","startTimeUTC":1511983552939,"updateTimeUTC":1511983552939,"cancelTimeUTC":1511984152936,"kpi":false,"symptomCriticality":"WARNING","symptomDefinitionId":"SymptomDefinition-VMWARE-VMWriteLatencyWarning","message":"HT above 27.266666666666666 > 15","links":[{"href":"/suite-api/api/symptoms/06ea4d59-5f73-4417-bd61-8458e1d74712","rel":"SELF","name":"linkToSelf"},{"href":"/suite-api/api/resources/e8021413-bcfa-443b-8bd2-caec892af2ca","rel":"RELATED","name":"symptomOnResource"},{"href":"/suite-api/api/symptomdefinitions/SymptomDefinition-VMWARE-VMWriteLatencyWarning","rel":"RELATED","name":"symptomDefinitionForSymptom"}]}]}

Within that block, the thing you probably care about is the message = "HT above 27.266666666666666 > 15" and maybe the time.  If you filter for active only as in the example above then you will only get currently active symptoms.
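Pulling the message and time out of that response could look something like the sketch below. It assumes startTimeUTC is epoch milliseconds, which the 13-digit values in the examples suggest but the thread does not confirm. The function name and output keys are made up for illustration.

```python
from datetime import datetime, timezone


def summarize_symptoms(response):
    """Reduce a symptoms/query response to criticality, message and start time."""
    rows = []
    for symptom in response.get("symptom", []):
        # Assumption: startTimeUTC is milliseconds since the Unix epoch.
        started = datetime.fromtimestamp(
            symptom["startTimeUTC"] / 1000, tz=timezone.utc
        )
        rows.append({
            "criticality": symptom.get("symptomCriticality"),
            "message": symptom.get("message"),
            "started": started.isoformat(),
        })
    return rows
```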

It's pretty drawn out (although the code would execute pretty quickly once written), but this is what you'll have to do to get that information if you need it.  As far as I know there isn't a way to force vROPS to send different information with the notification out of the box, and I don't know if you can modify the Rest notification plug-in, but you may be able to write your own if you're so inclined.
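The whole chain could be sketched roughly as follows. The HTTP layer is deliberately left out: get_json and post_json are hypothetical callables that you would supply yourself (they take a path, handle authentication and TLS against your vROps server, and return the decoded JSON). Only the first page of the symptom query result is read here, and only the flat "base-symptom-set" shape from the example above is handled.

```python
def symptom_messages_for_alert(alert_id, get_json, post_json):
    """Walk alert -> alert definition -> symptom query and return the messages.

    get_json(path) and post_json(path, body) are caller-supplied helpers
    (hypothetical here) that talk to the vROps suite-api.
    """
    alert = get_json(f"/suite-api/api/alerts/{alert_id}")
    definition = get_json(
        f"/suite-api/api/alertdefinitions/{alert['alertDefinitionId']}"
    )

    # Gather the symptom definition IDs from every state of the definition.
    definition_ids = []
    for state in definition.get("states", []):
        symptom_set = state.get("base-symptom-set", {})
        definition_ids.extend(symptom_set.get("symptomDefinitionIds", []))

    body = {
        "resource-query": {"resourceId": [alert["resourceId"]]},
        "activeOnly": True,
        "symptomDefinitionId": definition_ids,
    }
    result = post_json("/suite-api/api/symptoms/query", body)
    return [symptom["message"] for symptom in result.get("symptom", [])]
```

Passing the HTTP helpers in keeps the lookup logic independent of whichever client library and authentication scheme you use, and makes it easy to exercise with canned responses.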

sxnxr
Commander

Thanks for the reply, but to me that makes no sense. Natively, the REST notification does not send the symptom information, which, let's be honest, is probably the most important bit.

Here is one of my scenarios

We audit the movement of VMs in our CMDB so if there is a problem our NOC looks at the CMDB to see what host the VM was moved from and where to.

This is what Tivoli currently logs 

pastedImage_2.png

I need to get vrops to add the same level of detail by sending a REST payload to the CMDB.

This is the event in vrops (Sent from log insight)

pastedImage_3.png

All the info is there. I just don't get why vrops would not send the symptom message. It kind of makes the REST notifications useless.

jasnyder
Hot Shot

I completely agree with your assessment.  They should just send the symptoms as a member of the notification REST POST body which would be an array of all the symptoms.  Even that is a bit unwieldy, but given that an alert can have somewhat complex definitions, and may need to meet all symptom conditions before being triggered, the resultant message or symptom digest could potentially be fairly large.  But even if they just collected the symptoms for you that triggered the alert, that would save a boatload of effort.

carvaled
Enthusiast

Unfortunately this is just the way the REST outbound plugin is designed to work. I had a similar issue about 9 months ago, but in my case I did not want to use a shim server; I didn't want another man in the middle.

In the end I built my own custom REST outbound plugin for vRops where I could structure the message as I required (XML in my case) and populate it with the vRops alert, criticality, etc., which was then used to generate tickets in our incident management system.

I am getting ready to write a blog post and share my outbound plugin, but I need to find the time to write the post.

If you are interested I could try to speed it up. Just be aware that it's not supported by VMware, and at each vRops upgrade you need to make sure it's still compatible, as it sometimes depends on libs which have been updated in new releases. (I am keeping it up to date as I need it for my client.)

The outbound solution works like this: vRops will replace the items between the #'s and POST or PUT the XML or JSON to the destination server.

#alertstate#, state
#alertadapterkind#, alertBase.getAdapterKind()
#alertdefinitionid#, alertBase.getAlertDefinitionId()
#alertname#, alertDefName
#alertdescription#, alertDefDesc
#alertcriticality#, alertCriticality
#alertefficiency#, AlertUtil.convertHREvalue(alertBase.getEfficiency())
#alerthealth#, AlertUtil.convertHREvalue(alertBase.getHealth())
#alertimpact#, alertBase.getImpact()
#alertvropservername#, AlertUtil.getHostName(alertBase.getAlertInfo())
#alertvropserver#, alertBase.getAlertInfo().getHostAddress()
#alertresourcekind#, resourceKind
#alertresourcename#, resourceName
#alertresourceuuid#, alertBase.getResourceUUID()
#alertrisk#, AlertUtil.convertHREvalue(alertBase.getRisk())
#alertstatus#, Integer.toString(alertBase.getStatus())
#alerttype#, alertType
#alertsubtype#, alertSubType
#alertid#, alertBase.getAlertID().getUUID()
#alerturl#, alertBase.getAlertInfo().getAlertDetailURL()
#startdate#, starttime
#updatedate#, updatedate
#canceldate#, canceldate

So you build your structure and place the #value# tokens in the nodes or structure you require, and vRops will populate them with the actual alert info:

<Incident>
     <IncidentRequest>
          <DateTime>#startdate#</DateTime>
          <SourceCallBackAddress>#alerturl#</SourceCallBackAddress>
     </IncidentRequest>
     <IncidentMessage>
          <ExternalIncidentID>#alertid#</ExternalIncidentID>
          <Impact>#alertimpact#</Impact>
          <IncidentDescription>#alertstate# - #alertresourcekind# - #alertresourcename# - #alertname#</IncidentDescription>
          <Urgency>#alertcriticality#</Urgency>
     </IncidentMessage>
</Incident>

And this is what the example output looks like:

<Incident>
     <IncidentRequest>
          <DateTime>2017-11-17T23:41:02Z</DateTime>
          <SourceCallBackAddress>https://192.168.YYY.XXX/ui/index.action#/object/862ef610-ce08-40f6-8cb0-9dd0b501ee1d/alertsAndSympto...</SourceCallBackAddress>
     </IncidentRequest>
     <IncidentMessage>
          <ExternalIncidentID>8c81bd0f-f342-48f1-a410-43c17e547ce7</ExternalIncidentID>
          <Impact>health</Impact>
          <IncidentDescription>add - VirtualMachine - sexigraf - Object is down</IncidentDescription>
          <Urgency>critical</Urgency>
     </IncidentMessage>
</Incident>
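For anyone who wants to try the same #token# idea outside the plugin (in a shim, say), the substitution step can be sketched in a few lines of Python. This is not the plugin's actual code (the plugin is Java, going by the accessors listed above); the function name and the choice to XML-escape values are my own assumptions.

```python
import re
from xml.sax.saxutils import escape

# Tokens look like #alertname#: lowercase letters between two hash signs.
TOKEN = re.compile(r"#([a-z]+)#")


def render(template, values, escape_xml=True):
    """Replace #token# placeholders in template with entries from values."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            return match.group(0)  # leave unknown tokens untouched
        value = str(values[name])
        # Escape &, < and > so alert text cannot break the XML structure.
        return escape(value) if escape_xml else value

    return TOKEN.sub(substitute, template)
```

For a JSON template you would pass escape_xml=False and apply JSON string escaping instead.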

Cheers

vMan

carvaled
Enthusiast

Finally got around to blogging about it... http://vman.ch/vrops-custom-rest-outbound-plugin/
