So, I have a software component that installs SQL and all the tools, and then sets the TCP port to a random number. The code runs fine, and everything works as expected, but when the port change code runs in the configure phase of the software component, it always exits with exit code 1. I have looked through all the logs and I can't find anything that explains why it's exiting with an error.
I have attached a screenshot. It's also worth noting that this only happens on SQL 2017 installs, but works fine on SQL 2016 installs. I'm using the same vCenter template (Windows 2016 Standard). The only difference is the version of SQL. I can run the code locally under the same user account, and it runs perfectly fine.
Any help is greatly appreciated.
HA!! Far from it. I'm still tinkering with it. Maybe I'll learn something new in the process....
OK. I have narrowed my issue down to the following ps code:
if (($sc -ne "SQL_2016") -and ($sc -ne "SQL_2014") -and ($sc -ne "SQL_2017"))
{
    Write-Output "Not an SQL Server. Skipping Task."
    exit 0
}
#region generate and set random port number
Write-Output "Generating Random SQL Port Number....."
$epoch_date = Get-Date("01/01/1970")
$now = Get-Date
$seed = [math]::Floor((New-TimeSpan -Start $epoch_date -End $now).TotalSeconds)
$random_port = Get-Random -Minimum 10000 -Maximum 50000 -SetSeed $seed
Write-Output "Randomly generated port number: $($random_port)"
Write-Output "Setting Random SQL Port....."
# script derived from https://blog.dbi-services.com/sql-server-2012-configuring-your-tcp-port-via-powershell/
#Import-Module "SQLPS" -ErrorAction SilentlyContinue
# -ErrorVariable takes a variable name without the leading $; read it afterwards as $stdout_module
Import-Module "E:\Program Files (x86)\Microsoft SQL Server\140\Tools\PowerShell\Modules\sqlps\sqlps.psd1" -ErrorAction SilentlyContinue -ErrorVariable stdout_module
#$smo = 'Microsoft.SqlServer.Management.Smo.'
#$wmi = new-object ($smo + 'Wmi.ManagedComputer')
$wmi = new-object ('Microsoft.SqlServer.Management.Smo.Wmi.ManagedComputer')
$uri = "ManagedComputer[@Name='" + $env:ComputerName + "']/ServerInstance[@Name='MSSQLSERVER']/ServerProtocol[@Name='Tcp']"
$tcp = $wmi.GetSmoObject($uri)
if ($tcp.IsEnabled -ne $true)
{
    $tcp.IsEnabled = $true
    $tcp.Alter()
}
$currentTcpPort = $wmi.GetSmoObject($uri + "/IPAddress[@Name='IPAll']").IPAddressProperties[1].value
Write-Output "Current TCP Listening Port: $($currentTcpPort)"
$wmi.GetSmoObject($uri + "/IPAddress[@Name='IPAll']").IPAddressProperties[1].Value=$random_port.ToString()
Write-Output "Finalizing New TCP Port....."
$tcp.Alter()
$newTcpPort = $wmi.GetSmoObject($uri + "/IPAddress[@Name='IPAll']").IPAddressProperties[1].value
Write-Output "New TCP Listening Port: $($newTcpPort)"
Write-Output "Restarting Services MSSQLFDLauncher, MSSQLSERVER, SQLBrowser, SQLSERVERAGENT, and SQLWriter....."
Get-Service MSSQLFDLauncher, MSSQLSERVER, SQLBrowser, SQLSERVERAGENT, SQLWriter | Restart-Service -Force -Confirm:$false
#endregion
I've run this code manually with the same user account that software components uses, and it doesn't throw any errors. The code also completes successfully in software components, even though it exits with an exit code of 1. As I mentioned before, this exact same code runs without an error exit on SQL 2016 and SQL 2014 installs. I use the same blueprint for all three installs.
Did you not try using the SQL cmdlet mentioned by Luc? That's what I would do rather than writing WMI queries. Makes troubleshooting easier as well.
I have been trying to get it to work, but haven't done so successfully yet. I'm working both paths right now.
Something that might explain the 1 exit code.
WMI methods do use an exit code 1 if there is an Informational message.
PowerShell doesn't know this and just propagates the exit code as the exit code of the script.
Can you check the eventlog of the server to verify that there are no Informational events for the WBEM service at the time you call the method?
Blog: lucd.info Twitter: @LucD22 Co-author PowerCLI Reference
DCOM error 10016 is a known "feature", and something you can ignore.
So, I think I have narrowed this down to the Import-Module SQLPS line of code. I can comment out everything before and after it, and it still returns the exit code of 1. If I comment out that line and keep the code before it and the rest of the script, it does not return an error or exit code 1. I'm totally baffled. I also tried calling the .psd1 file directly with the full path and got the same result.
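For anyone trying to pin down a statement like this, here is a minimal sketch of one way to check whether a command quietly records an error even when nothing shows on the console. It assumes (this thread has not confirmed it) that the calling wrapper reacts to entries in the automatic $Error list. Measure-NewError is a hypothetical helper name, not part of any module.

```powershell
# Hypothetical helper: runs a script block and returns the error records
# that were added to the automatic $Error list while it ran.
function Measure-NewError {
    param(
        [Parameter(Mandatory = $true)]
        [scriptblock]$ScriptBlock
    )

    $before = $global:Error.Count
    try {
        # Output of the block is discarded; only its effect on $Error matters.
        & $ScriptBlock | Out-Null
    }
    catch {
        # A terminating error has already been recorded in $Error at this point.
    }
    # Note: if $Error is already at its maximum size the count will not grow.
    $added = $global:Error.Count - $before

    if ($added -gt 0) {
        # New records are inserted at the front of $Error.
        return @($global:Error[0..($added - 1)])
    }
    return @()
}

# Example use (commented out so the sketch has no side effects):
# $new = @(Measure-NewError { Import-Module SQLPS -ErrorAction SilentlyContinue })
# Write-Output "Errors recorded by Import-Module: $($new.Count)"
# $new | ForEach-Object { Write-Output $_.Exception.Message }
```

Errors suppressed with -ErrorAction SilentlyContinue are still added to $Error, which is why this kind of check can surface something the console and the transcript never show.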
Did you already try adding the -Verbose switch on the Import-Module cmdlet?
Does the Import-Module work when you do it from a PS prompt on the target server?
Are you sure the Execution Policy is set to RemoteSigned on that server before attempting the Import-Module?
I have tried the -Verbose switch, but it doesn't output to the software components console.
It works from a PS prompt on the target server with no errors or warnings.
I haven't checked the execution policy, but I will. I will also output it during the install to make sure it's set to RemoteSigned.
I keep coming back to the fact that this works fine on the same vCenter template and blueprint when installing SQL 2014 or 2016, but for some reason it craps out when installing 2017.
I have opened a case with SDK support in hopes that they can help shed some light on what is going on as well. If I get anything useful from that, I'll post it here.
You might also want to include a Start-Transcript (at the start) and a Stop-Transcript at the end of your script.
That should normally capture what appears on the console.
I tried that when I first started troubleshooting this, and all the transcript contained was the part where the transcription started and the part where it ended. There was no output for any of the other commands in the file. I also verified that when the scripts run in Software Components the PS execution policy is RemoteSigned.
NOTE: This is just a workaround that fixed the symptoms of the main issue. It still does not solve the problem I was getting with the exit code of 1. There is continued discussion around that and how to troubleshoot it effectively.
So, it would seem the fix for this is easier than expected.
Set-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQLServer\SuperSocketNetLib\Tcp\' -Name "Enabled" -Value 1
Set-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQLServer\SuperSocketNetLib\Tcp\IPAll\' -Name "TcpPort" -Value "random_port_number"
(for versions other than SQL 2017, replace the MSSQL14.MSSQLSERVER with MSSQL##.MSSQLSERVER, where the ## corresponds with the version of SQL you are installing.)
Restart the services, and presto!! I had the SQL guys verify that this works the way they expect it to, so if it's wrong, I blame them..... 🙂
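To avoid hand-editing the MSSQL## part for each version, the two paths could be built from a version number, roughly as sketched below. Get-SqlTcpRegistryPath is a hypothetical helper name, and the version numbers in the comment (12 for SQL 2014, 13 for SQL 2016, 14 for SQL 2017) are my addition rather than something stated above, so check them against the keys that actually exist on your server.

```powershell
# Hypothetical helper: builds the two registry paths used above for a given
# SQL Server major version number and instance name.
function Get-SqlTcpRegistryPath {
    param(
        [Parameter(Mandatory = $true)]
        [ValidateRange(10, 99)]
        [int]$MajorVersion,    # assumed mapping: 12 = SQL 2014, 13 = SQL 2016, 14 = SQL 2017

        [string]$InstanceName = 'MSSQLSERVER'
    )

    $base = "HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL${MajorVersion}.${InstanceName}\MSSQLServer\SuperSocketNetLib\Tcp"

    [pscustomobject]@{
        Tcp   = $base            # holds the "Enabled" value
        IPAll = "$base\IPAll"    # holds the "TcpPort" value
    }
}

# Example use (commented out because it writes to the registry):
# $paths = Get-SqlTcpRegistryPath -MajorVersion 14
# Set-ItemProperty -Path $paths.Tcp   -Name 'Enabled' -Value 1
# Set-ItemProperty -Path $paths.IPAll -Name 'TcpPort' -Value "$random_port"
```

Here $random_port stands for whatever port number the script generated earlier; the services still need a restart afterwards.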
I am actually running into this more frequently LucD. In this latest case, for example, I've done a Start-Transcript and it shows no errors and have written out both $? and $LASTEXITCODE after almost every line and all those return true, yet at the end of the transcript I'm getting a $global:false and I cannot figure out how to determine where this is coming from. So two questions:
if($?) {
exit 0
} else {
exit 0
}
I would need more details on how these scripts are started.
Is it calling powershell.exe with the code as an argument? Or the code in a file?
Also under which account do these scripts then run?
The exit 0 will indeed return the code 0 to the caller.
But again, it depends on how the script was called.
It might be that return codes from a binary called in the script are propagated.
It's a combination of native PowerShell in a script and it calling another script. All of this is "wrapped" and called by another script which gets called by the vRA software agent installed on a template. When the template comes up as a new VM, the agent (a Windows service) downloads the work item locally (which comes down as a script) and uses another script to call that script. So script =calls=> script (has native PS plus a separate .ps1 script). And even if I try that code I pasted above, I cannot get it to reset the exit code back to zero. Here's what that user script looks like
Start-Transcript -Path C:\DevCitrixDDCUtils.txt -Append
Write-Output "Mounting PS Drive"
IF ($File_share_username)
{
$SecurePassword = ConvertTo-SecureString "$File_share_password" -AsPlainText -Force
$FileShareCredential = New-Object System.Management.Automation.PSCredential ("$File_share_username", $SecurePassword)
New-PSDrive -Name Z -PSProvider FileSystem -Root "\\$File_server\$File_share" -Persist -Credential $FileShareCredential
}
Else
{
New-PSDrive -Name Z -PSProvider FileSystem -Root "\\$File_server\$File_share"
}
Write-Output "Exit state: $?"
Write-Output $LASTEXITCODE
$certs_location = 'Z:\wildcard'
$citrix_storefront_psdir = 'C:\Program Files\Citrix\Receiver StoreFront\scripts'
$admindomain = $Script_domain
$fqdn = (Get-WmiObject win32_computersystem).DNSHostName+"."+(Get-WmiObject win32_computersystem).Domain
$certpwd = ConvertTo-SecureString -String 'my_password' -AsPlainText -Force
$cert = Import-PfxCertificate -FilePath "$certs_location\${admindomain}wildcard.pfx" -CertStoreLocation Cert:\LocalMachine\My -Password $certpwd
Write-Output "Exit state from Import-PfxCertificate: $?"
Write-Output $LASTEXITCODE
Write-Output "Setting web binding."
New-WebBinding -Name "Default Web Site" -IP "*" -Port 443 -Protocol https
$cert | New-Item -path "IIS:\SslBindings\0.0.0.0!443"
Write-Output "Exit state from New-WebBinding and New-Item: $?"
Write-Output $LASTEXITCODE
Write-Output "Executing SetHostBaseUrl script."
& $citrix_storefront_psdir\SetHostBaseUrl.ps1 https://$fqdn
Write-Output "Exit state from script: $?"
Write-Output $LASTEXITCODE
Write-Output "Cleaning up PSDrive."
Remove-PSDrive -Name Z -Force
Write-Output "Exit state from remove psdrive: $?"
Write-Output $LASTEXITCODE
Write-Output "Resetting exit code"
#?global:true
$LASTEXITCODE = 0
exit 0
if($?) {
exit 0
} else {
exit 0
}
The variables you don't see which are obvious are being set through the software component and they are expanding, so that's not a problem. The wrapper script which is invoked by vRA software components (which you can't change) is here: https://pastebin.com/zGaHwBdj
And here is the output being returned to vRA after the execution of the user code shown above. As you can see, all the return codes show true/success.
Transcript started, output file is C:\DevCitrixDDCUtils.txt
Mounting PS Drive
Name Used (GB) Free (GB) Provider Root
---- --------- --------- -------- ----
Z 108.57 2766.23 FileSystem \\fileshare.domain.c...
Exit state: True
Exit state from Import-PfxCertificate: True
Setting web binding.
IPAddress : 0.0.0.0
Port : 443
Host :
Store : My
Sites : Microsoft.IIs.PowerShell.Framework.ConfigurationAttribute
Exit state from New-WebBinding and New-Item: True
Executing SetHostBaseUrl script.
Exit state from script: True
Cleaning up PSDrive.
Exit state from remove psdrive: True
Resetting exit code
ABORT. Encountered error in Powershell.
Error while executing script: Process exited with an error: 1 (Exit value: 1)
I *think* that the non-zero exit code is being returned by the PS1 script towards the bottom.
I think I'll need to install vRA in my lab 😁
It probably wouldn't be a bad thing, but that's a lot of work just to do some troubleshooting for someone else.
Update: In looking more carefully at the wrapper script and the output, it seems clear that an error is being logged even though the transcript has nothing. I'm trying with $error.clear() at the bottom of the user script to see if I can clear them from within.
Ok, it seems that after calling $error.clear() I was actually able to squash that error code. Also for good measure (even though it shouldn't be needed), I set the ExecutionPolicy to bypass and unblocked the script that was being called. That looks like it took care of it. Still, though, I wish I knew how to troubleshoot these types of failures better. Any time vRA executes a software component, any non-zero exit code signifies failure (regardless of whether anything is sent to stderr or not), and tracking down exactly where those are coming from is what I'd like to really understand.
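Since clearing $Error hides whatever was recorded, it may be worth printing the entries first so the output sent back to vRA shows where they came from. A minimal sketch, assuming the wrapper really is reacting to leftover entries in $Error (which the behaviour above suggests but does not prove); Format-ErrorHistory is a hypothetical helper name.

```powershell
# Hypothetical helper: turns the records in $Error into readable lines
# (oldest first) so they can be written to the console or a transcript.
function Format-ErrorHistory {
    param(
        [System.Collections.IList]$ErrorList = $global:Error
    )

    for ($i = $ErrorList.Count - 1; $i -ge 0; $i--) {
        $record = $ErrorList[$i]
        if ($record -is [System.Management.Automation.ErrorRecord]) {
            $info = $record.InvocationInfo
            '{0}:{1} [{2}] {3}' -f $info.ScriptName, $info.ScriptLineNumber,
                $record.FullyQualifiedErrorId, $record.Exception.Message
        }
        else {
            # $Error can also hold plain exception objects.
            "[exception] $($record.Message)"
        }
    }
}

# Possible use at the very end of the user script (commented out here):
# Format-ErrorHistory | ForEach-Object { Write-Output "Recorded error: $_" }
# $Error.Clear()
# exit 0
```

The script name and line number in each line point at the statement that recorded the error, including ones raised inside a script called from the user script.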
Is the mechanism how vRA runs scripts via the agent documented somewhere?
There are many ways one can start a PowerShell script, just wondering how it is done in practice.