This is the Release Candidate for the vSphere 5.1 Hardening Guide.
All links to documentation and KB articles have been updated. All API and CLI fields have also been updated. Guidance for VCSA and SSO has been added.
Attached is a separate change log document.
We welcome your comments on this draft. It is expected that this draft will be public for two weeks. At that time, all changes will be incorporated and the final release will be published.
A minor observation... the change-default-password guidance appears to have been duplicated in the vCenter Server worksheet (where it appears with Component = vSphere, Subcomponent = VCSA).
I'm trying to wrap my head around the new distinctions in guidance introduced in this revision (for SSO, web client and VCSA and - to a lesser extent - VUM). Is vCenter Server guidance intended to apply only to vCenter Server as deployed on a Windows operating system, and not to the appliance?
Another minor note: I suggest that it may assist automation if the worksheet labels did not contain whitespace characters (spaces). Avoiding whitespace (and e.g. punctuation) characters could help, for example, in the use of canonicalized identifiers following the scheme proffered earlier (and which I'm using below).
Regarding web client certificate checking, Web Client.verify-ssl-certificates seems to be the same as vCenter Server.verify-ssl-certificates. Does this guidance intend to relate to certificate checking performed by the web client service?
I haven't recently checked and don't recall what I was prompted with when using the web client (where I deployed the appliance to a lab environment with automatically generated certificates). I would suggest that mitigation of possible MitM attacks should include verification of SSL certificates presented by the web client service to the browser-based clients used by administrators and other vSphere users; I think, therefore, that the guidance could be improved by more explicitly describing the types of certificate warnings a user may see in a browser-based client. The PowerCLI assessment command could be extended to check certificates presented by web client services that do not use an assessed vCenter Server's certificates (for example, standalone web client deployments).
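To illustrate the kind of check meant here, a minimal sketch in Python (not PowerCLI, and not part of the guide) that reports whether the certificate presented by a service verifies against the local trust store. The function names, host name and port are hypothetical placeholders; the sketch checks chain and hostname only and does not consult CRLs or OCSP.

```python
import socket
import ssl


def build_verifying_context():
    """Return a TLS context that verifies the chain and the hostname."""
    # create_default_context() requires a valid certificate and checks the
    # hostname against the system trust store.
    return ssl.create_default_context()


def check_presented_certificate(host, port, timeout=5.0):
    """Return (True, "ok") if the certificate verifies, else (False, reason)."""
    context = build_verifying_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True, "ok"
    except ssl.SSLCertVerificationError as exc:
        # For example: self-signed certificate, expired, or hostname mismatch.
        return False, exc.verify_message
    except OSError as exc:
        # Covers refused connections, timeouts and other TLS failures.
        return False, "connection failed: %s" % exc


if __name__ == "__main__":
    # Hypothetical host and port; substitute the web client service address.
    print(check_presented_certificate("webclient.example.com", 9443))
```

A failure reason such as a self-signed certificate is roughly what a browser would surface to the user as a certificate warning.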
While I'm thinking on certificates, it also occurs to me that I don't know how certificate revocation is handled with the appliance. The hardening reality becomes somewhat self-apparent when considering the PowerCLI assessment script being run on a Windows client or Windows O/S hosting vCenter server. I don't recall discussion of online CRL updates or OCSP in VCSA networking descriptions.
Unfortunately I don't have access to a correctly licensed VCSA at this time to check... should it be appropriate to check certificate validation between the web client service and the registered / managed vSphere infrastructure servers? And if so, is it reasonable to rely upon the status of any trusted 3rd party certificates included by VMware?
Apart from CRLs, there also appears to be no guidance on certificate store confidentiality and integrity with the appliance. I would suggest that good hardening guidance would deal with the particular threats that arise with virtual machine deployments (precisely because deployment on physical or non-VMware virtual machines is not intended). The guidance for vCenter Server on Windows includes auditing of filesystem structures where certificates and encryption keys are stored. A vSphere user with somewhat limited privileges related to the appliance VM, or the datastore on which its virtual disk resides, could conceivably recover the private half of the asymmetric SSL key pair with relatively little effort. As there is already significant guidance in relation to vCenter Server on Windows in the guide, it would seem appropriate to provide more guidance in respect of the appliance.
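As a rough sketch of what an equivalent filesystem audit might look like inside a Linux-based appliance, the Python below flags a private key file that is accessible to group or others, or owned by an unexpected user. The function name and the key path are hypothetical. An in-guest check like this covers only the guest filesystem; it does nothing about access through the datastore or the virtual disk.

```python
import os
import stat


def key_file_findings(path, expected_uid=0):
    """Return a list of permission problems found on a private key file."""
    st = os.stat(path)
    findings = []
    if st.st_mode & (stat.S_IRGRP | stat.S_IWGRP):
        findings.append("group can read or write")
    if st.st_mode & (stat.S_IROTH | stat.S_IWOTH):
        findings.append("others can read or write")
    if st.st_uid != expected_uid:
        findings.append("unexpected owner uid %d" % st.st_uid)
    return findings


if __name__ == "__main__":
    # Hypothetical location; the real key path on the appliance is not
    # given in this thread.
    key_path = "/path/to/private.key"
    if os.path.exists(key_path):
        print(key_file_findings(key_path) or "no findings")
```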
I think there is certainly more discussion that can (and will) be had around SSL and X.509 certificates in vSphere... so I think I'll stop there for now.
On a more general note relating to the appliance (whichever components are selected for deployment)... I suspect I'm not the only one asking: Is it possible to get some certainty on patching the appliance through concrete hardening guidance?
While I think most people understand that the appliance is underpinned by what may be termed a "general purpose operating system", there's a black-box mentality that is being somewhat promoted and implicitly expected with appliances. Given that the language used in vCenter Server.apply-os-patches and VUM.patch-vum-os suggests ongoing activity post-deployment, it would seem consistent to give guidance along one of two distinct lines: 1) apply VCSA updates published by VMware to address both product and O/S vulnerabilities or 2) stay up-to-date with SLES patches through an internal repository or process to address O/S vulnerabilities (or possibly subscribe to updates to "SLES for VMware" through a qualifying product?), and apply VCSA updates to address vulnerabilities in VMware products.
@mikefoley, I'll try to send an email to you separately regarding my comments on CIM interaction privileges and the guidance given in relation to it.
Scratch the comment about identifiers. I re-read the Intro section and noted that the version independent identifier should use the component identifier and not the worksheet name. Apologies.
Thanks Mike and Charu for continuing to provide this guidance and seeking feedback from the community.
The inclusion of scripted assessment and remediation information is in my view particularly helpful, and I'm only sorry that I haven't had enough opportunity to look at this more closely during the time drafts have been available.
I don't have any further serious feedback at this time, but wanted to note three things before final publication.
1. A certain amount of the vCenter guidance is still marked as version 5.0:
2. Some of the assessment scripts will rely on access other than vSphere credentials. This has been noted e.g. in vCenter.patch-vum-os but not in the following:
3. Some guidance for non-default settings that restricts the way a function operates does not describe functional impacts:
Awesome suggestions. I've incorporated many of them in the final product. RE: step 3 and "Negative Functional Impact". As called out in the Intro page of the worksheet, "Negative Functional Impact indicates if this guideline has any side effects that reduce or prevent normal functionality". I'm not sure, for example, that limiting log sizes or numbers would have a negative functional impact. The Vulnerability Discussion goes into pretty good detail on how the changes would have a more positive impact.
If you have more input, I'd love to hear it asap. And your suggestion of discussing these in the community is most welcome. Any updates or changes between releases can be released as a KB article and can be incorporated going forward into later releases.
Thanks again for your very helpful work. MUCH appreciated!
Not a problem Mike, happy to contribute what I can.
The vulnerability discussion covers most of the important points. Importantly, it identifies how applying the guidance mitigates against a potential DoS attack mounted (on or) from a hosted VM.
In the more general case, files that are unconstrained in either size or number and which may log attacker-controlled data can be associated with resource consumption risks. In the VM log file case these two controls protect mostly against free space exhaustion, and log throttling (automatic) helps protect against monopolisation of host I/O time. In each case, log evidence of malicious activity may be deliberately hidden or obscured by an attacker relying on such protections.
However the limits may also hide or obscure details about normal operational concerns; snapshot operations for example. A VM log file rotation also occurs following a vMotion, and the limit in number of log files may reduce the amount of history available in a busy DRS cluster.
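To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. It assumes the two controls map to a per-file size limit and a count of retained old files (I understand the VMX options to be log.rotateSize and log.keepOld), that the active log is kept in addition to the retained ones, and that every rotation is triggered by an event such as a vMotion. The function names and figures are illustrative only.

```python
def worst_case_log_bytes(rotate_size_bytes, keep_old):
    """Approximate upper bound on one VM's log space.

    Counts the active log plus keep_old retained logs, each assumed to be
    at most rotate_size_bytes.
    """
    return rotate_size_bytes * (keep_old + 1)


def retained_history_days(keep_old, rotations_per_day):
    """Approximate days of history kept when rotations are event-driven."""
    return keep_old / rotations_per_day


if __name__ == "__main__":
    # Illustrative values, not taken from the guide.
    print(worst_case_log_bytes(1000000, 10))
    print(retained_history_days(10, 5))
```

With these illustrative values, a VM that is migrated five times a day would keep about two days of rotated logs, which is the history loss described above.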
For VM.limit-setinfo-size, I understand the explicit setting replicates the implicit one, and I take the point that there shouldn't therefore be a functional impact.
For ESXi.config-firewall-access, there is perhaps an inconsistency as negative functional impacts have been described for vCenter.restrict-network-access and VCSA.restrict-network-access.
For the shell timeouts, there may be some functional impact to long-running administrative operations. There may also be some negative functional impacts to external interfacing applications that use SSH - although I can't imagine that such application interfaces would be a recommended practice!
For vNetwork.enable-bpdu-filter the vulnerability discussion includes the case of certain products that may generate legitimate, desirable BPDU packets. Employing the filter in these cases would likely have a significant negative impact, and not employing it introduces risk if an errant BPDU packet were sent by another hosted VM.
I updated the ESXi.config-firewall-access control to be consistent. You get that one because I love consistency.
I get your point on the log history. There's always a trade-off when playing with stuff like this. I'd argue that if your environment is that sensitive, it might be worth using an external logging solution to capture everything via SYSLOG/vCenter logging.
RE: Shell timeouts. There's nothing new there. It's always been a tradeoff since mainframe timesharing days. At what point do we document industry practices vs. vSphere specifics? It's a tough line to draw, I'll grant you that. In this case, I'm going to punt and see if anyone else concurs with you.
RE: bpdu. I'd have to consult with the guys who own that piece as it's outside of my comfort zone at the moment. I don't think it'll make it into this edition if they agree with you. If so, it would come out as an addendum in a KB article and would get rolled into the next HG.
Again, thanks so much for your very valuable input on this guide. The help has been awesome. Find me at VMworld or VMUG so I can buy you a beer.