VMware Horizon Community
CraigTompkins1
Contributor

Notification on Provisioning error

We know it happens from time to time: you look at View Manager and see a provisioning error on a pool, and because the pool is set to stop provisioning on errors, no new machines are being created. Hopefully it's not often, but it does happen.

My question is: why is there no setting to alert on those? Am I missing something? Is there a Fling I can use? I've got to find a way to be alerted rather than waiting for the helpdesk to tell me users can't log in because we are out of desktops.

BenFB
Virtuoso

We rarely run into provisioning errors outside of a known bug that applied to 7.4.0. Have you looked into the error to see if it can be avoided?

CraigTompkins1
Contributor

I've opened a ticket with support. They gathered a bunch of logs, and the only thing they could tell me was that the provisioning process either could not contact the domain controller or could not create the account. They couldn't tell me which of the two it was, or which DC it tried to connect to. As you can tell, it was not very helpful.

We do have one read-only DC (RODC) in our DMZ. There are three DCs in the same AD site as our Horizon View environment, so I would not expect Horizon to try to create the machine against the RODC, but I guess it's possible. So we created an ACL on the switch blocking the View environment from connecting to the RODC, just in case that was the issue.

Other than that we don't have any idea of what caused the issue in the first place.  Of course we'd like to prevent it from happening, but in case it does I'd still like to receive an alert somehow.

Anobix67
Enthusiast

I realize this thread is a handful of months old, but I'm currently looking for the same thing: some way to be notified if provisioning fails or gets disabled on a pool. We had a similar issue to the one you mentioned a month or so ago and didn't come out of it looking great. I'm trying to think whether SCOM could regularly watch specific logs or something similar.

CraigTompkins1
Contributor

Since we write events to the SQL database, I had our SQL admin write a job that scans the database for keywords. The job runs every 10 minutes, looks back 11 minutes, and sends an email if it finds a match.

There is a possibility that it could notify on one scan and then send the same alert on the next scan if the event falls into that one-minute overlap, but I thought that was better than using exactly 10 minutes and finding out that it only really covered 9:55 or so and we missed one.
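The overlap trade-off is easy to see in a small sketch. This is a hypothetical Python illustration, not the SQL job itself: it applies the same 11-minute lookback as the job's `Time > DATEADD(minute,-11, getdate())` filter, and additionally remembers which EventIDs it has already alerted on, so an event that lands in the one-minute overlap is reported only once. Where that memory would live (a table, a file) is an open assumption, and in practice you would prune IDs older than the lookback so the set doesn't grow forever.

```python
from datetime import datetime, timedelta

# The job runs every 10 minutes (the scheduler enforces this) but looks back 11,
# so consecutive windows overlap by one minute.
SCAN_INTERVAL = timedelta(minutes=10)
LOOKBACK = timedelta(minutes=11)


def events_to_alert(events, now: datetime, already_alerted: set):
    """Return events inside the lookback window that have not been alerted yet.

    events: iterable of (event_id, timestamp, text) tuples.
    already_alerted: set of event IDs; updated in place with the IDs returned.
    """
    window_start = now - LOOKBACK
    fresh = []
    for event_id, ts, text in events:
        # Same "newer than now - 11 minutes" test as the SQL WHERE clause,
        # plus a de-duplication check the original job does not have.
        if ts > window_start and event_id not in already_alerted:
            fresh.append((event_id, ts, text))
            already_alerted.add(event_id)
    return fresh
```

Without the `already_alerted` set, an event 30 seconds before one scan would also match the next scan, which is exactly the duplicate alert described above.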

I've emailed him and asked for a copy of the job, I'll post it here when I get it from him.

I did complain about this exact thing to a couple of VMware's Horizon guys at the Experts Bar here at VMworld this week, and they agreed with me that this should be added. Not sure if anything will come of it, but I did at least complain about it.

cbaptiste
Hot Shot

You can get notified if you have one of the many monitoring tools VMware offers, like vRealize Operations (vROps) or vRealize Log Insight. You simply monitor the pool and create triggers based on whatever criteria you need. As you can see, Horizon View itself does not have any type of notifications; those are offloaded to these other tools.

CraigTompkins1
Contributor

Here is our code

-----------------

DECLARE @query NVARCHAR(MAX), @Count INT

-- Global (##) temp table: sp_send_dbmail runs @query in a separate session,
-- so a local #temp table would not be visible to it
CREATE TABLE ##T ([EventID] INT, [Module] NVARCHAR(MAX), [EventType] NVARCHAR(MAX), [ModuleAndEventText] NVARCHAR(MAX), [Time] DATETIME, [Source] NVARCHAR(MAX), [Severity] NVARCHAR(MAX), [Node] NVARCHAR(MAX), [Acknowledged] INT, [DesktopId] NVARCHAR(MAX), [StrValue] VARCHAR(MAX))

-- "Provisioning disabled" events from the last 11 minutes, plus the
-- SVIFaultText stored on the preceding event (EventID - 1), if there is one
SET @query =
N'SELECT a.[EventID]
      ,a.[Module]
      ,a.[EventType]
      ,a.[ModuleAndEventText]
      ,a.[Time]
      ,a.[Source]
      ,a.[Severity]
      ,a.[Node]
      ,a.[Acknowledged]
      ,a.[DesktopId]
      ,b.[StrValue]
  FROM [dbo].[VDI_event] a
  LEFT OUTER JOIN [dbo].[VDI_event_data] b ON (a.EventID - 1) = b.EventID AND b.Name = ''SVIFaultText''
  WHERE a.ModuleAndEventText LIKE ''%provisioning disabled for pool%'' AND a.Time > DATEADD(minute, -11, GETDATE())'

INSERT INTO ##T
EXECUTE sp_executesql @query

SET @Count = @@ROWCOUNT

-- Only send mail when at least one matching event was found
IF @Count > 0
EXEC msdb.dbo.sp_send_dbmail
@profile_name = '<Profile>',
@recipients = '<Email>',
@subject = '<What you want>',
@query = 'SELECT * FROM ##T',
@query_result_header = 0;

DROP TABLE ##T
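The `EventID - 1` join is the least obvious part of that query. The following is a self-contained sketch that runs the same query shape against an in-memory SQLite database through Python's built-in `sqlite3` module, for experimenting with the logic rather than replacing the SQL Agent job. The tables only carry the columns the query touches, and it assumes, as the query does, that the SVIFaultText for a "provisioning disabled" event is stored on the event immediately before it. Two SQLite differences to keep in mind: there is no `DATEADD`, so the cutoff is computed in Python, and `LIKE` is case-insensitive for ASCII (SQL Server's behaviour depends on collation).

```python
import sqlite3
from datetime import datetime, timedelta

QUERY = """
SELECT a.EventID, a.ModuleAndEventText, b.StrValue
FROM VDI_event a
LEFT OUTER JOIN VDI_event_data b
  ON (a.EventID - 1) = b.EventID AND b.Name = 'SVIFaultText'
WHERE a.ModuleAndEventText LIKE '%provisioning disabled for pool%'
  AND a.Time > ?
ORDER BY a.EventID
"""


def provisioning_disabled_events(conn: sqlite3.Connection, now: datetime, minutes: int = 11):
    """Return (EventID, text, fault_text) rows for recent 'provisioning disabled'
    events; fault_text is None when the preceding event has no SVIFaultText."""
    # Times are kept as ISO-8601 text ("YYYY-MM-DD HH:MM:SS") in this sketch,
    # which compares correctly as strings.
    cutoff = (now - timedelta(minutes=minutes)).isoformat(sep=" ")
    return conn.execute(QUERY, (cutoff,)).fetchall()
```

Load a few made-up rows into `VDI_event (EventID, ModuleAndEventText, Time)` and `VDI_event_data (EventID, Name, StrValue)` and you can see which events pick up fault text and which come back with `None`.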

Anobix67
Enthusiast

I hadn't played much with vROps in the past, but I did find out how to create the alert there. A few are built in, such as provisioning disabled, out of VMs in a pool, and unable to join a pool because of a provisioning error.

Thanks for the heads up about it.

sjesse
Leadership

I did something similar, except I send an email for every error. I decided a while ago not to disable pools on errors, which has saved some support time. This runs every 15 minutes and catches all kinds of errors.

Declare @view_error_count INT
DECLARE @Body varchar(max)
declare @TableHead varchar(max)
declare @TableTail varchar(max)
declare @mailitem_id as int
declare @statusMsg as varchar(max)
declare @Error as varchar(max)
declare @Note as varchar(max)

Set NoCount On;
set @mailitem_id = null
set @statusMsg = null
set @Error = null
set @Note = null
Set @TableTail = '</table></body></html>';

--HTML layout--
Set @TableHead = '<html><head>' +
'<H1 style="color: #000000">Horizon View Customization Alert</H1>' +
'<style>' +
'td {border: solid black 1px;padding-left:5px;padding-right:5px;padding-top:1px;padding-bottom:1px;font-size:9pt;color:Black;} ' +
'</style>' +
'</head>' +
'<body><table cellpadding=0 cellspacing=0 border=0>' +
'<tr bgcolor=#F6AC5D>'+
'<td align=center><b>Time</b></td>' +
'<td align=center><b>Error Message</b></td></tr>';

select @view_error_count= COUNT(*)   FROM  view_events7.dbo.event as ev where ev.EventType like  '%ERROR%' AND ev.Time > DATEADD(minute,-15,GETdate())

select @Body=(Select
ev.Time AS [TD] ,
ev.ModuleAndEventText AS [TD]
FROM  view_events7.dbo.event as ev where ev.EventType like  '%ERROR%'
AND ev.Time > DATEADD(minute,-15,GETdate())
For XML raw('tr'),Elements)

-- Replace the entity codes and row numbers
Set @Body = Replace(@Body, '_x0020_', space(1))
Set @Body = Replace(@Body, '_x003D_', '=')
Set @Body = Replace(@Body, '<tr><TRRow>1</TRRow>', '<tr bgcolor=#C6CFFF>')
Set @Body = Replace(@Body, '<TRRow>0</TRRow>', '')

Set @Body = @TableHead + @Body + @TableTail

-- return output--
Select @Body

 

if @view_error_count > 0
  BEGIN
    EXEC msdb.dbo.SP_SEND_DBMAIL
@profile_name ='eucadmins',
@mailitem_id = @mailitem_id out,
@recipients='eucadmins@',
@subject = 'Horizon View Error Report',
@body = @Body,
@body_format = 'HTML';
END
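If you'd rather build the same report outside SQL Server (say, from a scheduled script), here is a hypothetical Python sketch of the HTML table that the query above assembles with `FOR XML RAW('tr'), ELEMENTS`. `html.escape` from the standard library does the entity encoding that FOR XML does in the T-SQL version, so characters such as `<` and `&` in event text don't break the markup. Fetching the rows and sending the mail are deliberately left out.

```python
from html import escape

# Header row and styling adapted from the T-SQL report.
TABLE_HEAD = (
    "<html><head><style>"
    "td {border: solid black 1px;padding-left:5px;padding-right:5px;"
    "padding-top:1px;padding-bottom:1px;font-size:9pt;color:Black;}"
    "</style></head><body>"
    '<h1 style="color: #000000">Horizon View Customization Alert</h1>'
    "<table cellpadding=0 cellspacing=0 border=0>"
    "<tr bgcolor=#F6AC5D><td align=center><b>Time</b></td>"
    "<td align=center><b>Error Message</b></td></tr>"
)
TABLE_TAIL = "</table></body></html>"


def build_error_report(rows):
    """Build the HTML mail body from (time, message) rows.

    Returns None for an empty list, mirroring the job's rule of only
    sending mail when the error count is greater than zero.
    """
    if not rows:
        return None
    body = "".join(
        f"<tr><td>{escape(str(ts))}</td><td>{escape(str(msg))}</td></tr>"
        for ts, msg in rows
    )
    return TABLE_HEAD + body + TABLE_TAIL
```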

CraigTompkins1
Contributor

For those who don't stop provisioning on error: have you ever had a problem where, say, Horizon could not join a machine to AD, so it would have stopped provisioning but didn't, and you ended up with hundreds or thousands of bad desktops that eat up CPU, memory, and storage but can't actually be used, and then had to clean them all up? That's my concern with not stopping provisioning.
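One middle ground between stopping provisioning and letting it run unchecked is to alert when the number of bad machines in a pool crosses a threshold, so a domain-join failure shows up after a handful of broken desktops rather than hundreds. Below is a minimal Python sketch of just the counting logic. It assumes you already have each machine's pool and state from somewhere (for example, the View API that the PowerShell script in this thread uses); the input shape, the state subset, and the threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical subset of Horizon machine states treated as "bad";
# extend it to match what you consider a problem.
PROBLEM_STATES = {"PROVISIONING_ERROR", "ERROR", "AGENT_UNREACHABLE",
                  "AGENT_ERR_DOMAIN_FAILURE", "AGENT_CONFIG_ERROR"}


def pools_over_threshold(machines, threshold):
    """Return {pool_name: bad_count} for every pool whose number of machines
    in a problem state is at or above threshold.

    machines: iterable of dicts with hypothetical "pool" and "state" keys.
    """
    bad_per_pool = Counter(
        m["pool"] for m in machines if m["state"] in PROBLEM_STATES
    )
    return {pool: n for pool, n in bad_per_pool.items() if n >= threshold}
```

An empty result means no pool is over the limit; anything else is what you would put in the alert.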

sjesse
Leadership

I've never seen it happen. That doesn't mean it's impossible, but I think it's improbable. I would think AD would have to break for that to happen, or something would have to happen to the user account Horizon uses to connect to AD; outside of that, I don't see how Horizon would break in a way that causes that problem. That's what I like about the alert I have: I see every error that occurred in the last 15 minutes, and any time I've had problems with bad VMs I just run the viewdbchk script to clean them up. We also run mainly instant clones now, so I have the script below running as well to delete any failed instant-clone VMs; the bulk of the code came from https://blogs.vmware.com/euc/2017/01/vmware-horizon-7-powercli-6-5.html. Between the two of these, I rarely have any broken VDI clones that I need to worry about.

A warning to anyone who would use this: there is a new feature in 7.9 called longer-lived instant clones, and I'm not sure whether this script would delete them or not.

####################################################################
# Get list of desktops that have Horizon Agent in problem states.
# Recover each one that belongs to an instant-clone pool.
####################################################################

#region variables
###################################################################
#                    Variables                                   #
###################################################################
$cs = '' #Horizon Connection Server
$csUser= '' #User account to connect to Connection Server
$csPassword = '' #Password for user to connect to Connection Server
$csDomain = '' #Domain for user to connect to Connection Server

$vc = '' #vCenter Server
$vcUser = '' #User account to connect to vCenter Server
$vcPassword = '' #Password for user to connect to vCenter Server

$baseStates = @('PROVISIONING_ERROR',
                'ERROR',
                'AGENT_UNREACHABLE',
                'ALREADY_USED',
                'AGENT_ERR_STARTUP_IN_PROGRESS',
                'AGENT_ERR_DISABLED',
                'AGENT_ERR_INVALID_IP',
                'AGENT_ERR_NEED_REBOOT',
                'AGENT_ERR_PROTOCOL_FAILURE',
                'AGENT_ERR_DOMAIN_FAILURE',
                'AGENT_CONFIG_ERROR',
                'UNKNOWN')
#endregion variables

#region initialize
###################################################################
#                    Initialize                                  #
###################################################################

# --- Connect to Horizon Connection Server API Service ---
$hvServer1 = Connect-HVServer -Server $cs -User $csUser -Password $csPassword -Domain $csDomain

# --- Get Services for interacting with the View API Service ---
$Services1= $hvServer1.ExtensionData

# --- Connect to the vCenter Server ---
Connect-VIServer -Server $vc -User $vcUser -Password $vcPassword

#endregion initialize

#region main
###################################################################
#                    Main                                        #
###################################################################
$badMachines=@()
Write-Output ""
if ($Services1) {
     foreach ($baseState in $baseStates) {
           Write-Host "Checking $baseState"
           # --- Get a list of VMs in this state ---
           $ProblemVMs = Get-HVMachineSummary -State $baseState

           foreach ($ProblemVM in $ProblemVMs) {

                $VM = Get-VM -Name $ProblemVM.Base.Name
               
               
                $machineDeleteSpec=new-object VMware.Hv.MachineDeleteSpec
                $poolid=$Services1.Machine.Machine_GetSummaryView($ProblemVM.Id).Base.Desktop
                $pool=$Services1.Desktop.Desktop_GetSummaryView($poolid)
                if($pool.DesktopSummaryData.Source -eq 'INSTANT_CLONE_ENGINE')
                {
                    Write-Host "Recovering $($VM.Name)"
                    $badMachines +=$VM.Name
                    $Services1.Machine.Machine_Recover($ProblemVM.Id)

                }
                # Machine_Recover is a View API method call, so -WhatIf does not apply; comment it out for a dry run.
           }
     }
     Write-Output "", "Disconnect from Connection Server."
     Disconnect-HVServer -Server $cs -Confirm:$false
} else {
     Write-Output "", "Failed to login in to Connection Server."
     pause
     }
# --- Disconnect from the vCenter Server ---
Write-Output "", "Disconnect from vCenter Server."
Disconnect-VIServer -Server $vc -Confirm:$false
$body=""
foreach($machine in $badMachines)
{
    $body+="$machine has been deleted because it was in a bad state `n"
}
$badMachines.Count
if($badMachines.Count -gt 0)
{
    Write-Host "Sending Email"
    Send-MailMessage -From '' -To 'eucadmins@'  -SmtpServer '' -Port '25' -Body $body -Subject "Deleted Problem Vms"
}
#endregion main
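The decision logic in the script boils down to two filters: the machine's state must be in the problem list, and its pool's source must be INSTANT_CLONE_ENGINE, so linked-clone and full-clone pools are never touched. Here is a hypothetical Python sketch of just that logic, with plain dictionaries standing in for the View API objects (the key names are made up for illustration, and nothing here talks to Horizon).

```python
# The problem states the PowerShell script loops over ($baseStates).
PROBLEM_STATES = {
    "PROVISIONING_ERROR", "ERROR", "AGENT_UNREACHABLE", "ALREADY_USED",
    "AGENT_ERR_STARTUP_IN_PROGRESS", "AGENT_ERR_DISABLED", "AGENT_ERR_INVALID_IP",
    "AGENT_ERR_NEED_REBOOT", "AGENT_ERR_PROTOCOL_FAILURE",
    "AGENT_ERR_DOMAIN_FAILURE", "AGENT_CONFIG_ERROR", "UNKNOWN",
}


def machines_to_recover(machines):
    """Names of machines that are in a problem state AND belong to a pool whose
    source is INSTANT_CLONE_ENGINE; every other machine is left alone.

    machines: iterable of dicts with hypothetical "name", "state" and
    "pool_source" keys standing in for the View API summary objects.
    """
    return [
        m["name"]
        for m in machines
        if m["state"] in PROBLEM_STATES and m["pool_source"] == "INSTANT_CLONE_ENGINE"
    ]


def report_body(names):
    """One line per machine, using the wording of the script's e-mail body."""
    return "".join(f"{name} has been deleted because it was in a bad state\n" for name in names)
```

As in the script, an empty list means no e-mail needs to be sent.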

cbaptiste
Hot Shot

You may want to look into the permissions of either the service account you are using to provision these computer objects or the OU in AD. Clearly something is not working on that end. I have never had a case where a computer failed to be created in or joined to Active Directory properly. The last time I had an issue with provisioning, it was because the ctkEnabled parameter was not set.
