Yeah yeah yeah...I know this subject seems to have been beaten to death, but apparently, at least at my organization, there is still a healthy amount of ignorance/disbelief/incredulousness (is that a real word?). Here's my current back story:
Although I'm the only certified person in my organization who eats/breathes/lives this VMware stuff, I'm still only a lowly sysadmin, beholden to the spread-thin resources of a far too multifaceted architecture group that makes most design decisions. These folks plopped in my lap one day a design for a two-host cluster that was to host a very high-profile legal retention and forensics application for our company. The design called for 2 Dell PE710s with 48GB of RAM and dual Intel E5530s with Hyperthreading enabled. This application is very dependent on the ability to move large amounts of often small files from one VM to another (sometimes to a VM on the other host as well). After a proof of concept was run using very underpowered hardware (desktops and Buffalo NAS storage), we proceeded to build and deploy this system.

Three of the VMs in this design were pure file servers. They each had several large VMDKs (around 1.7 to 1.8 TB), each a single volume on a single RAID 5 LUN hosted on a CX3-80. None of these LUNs had VMDKs on them belonging to any other machine. Well...when application performance testing began, it was immediately noted that file copy times were extremely slow...much slower than even the POC had been.

Now, we had previously deployed another cluster for a specific application (Documentum) that utilized MSCS and used RDMs for supposed performance reasons. So the gist from some folks was that we should try the RDM thing here as well to see if it improved performance any. Of course, I instantly piped up and stated that it is VMware's position that RDM should not be utilized solely on the basis of performance, that any improvement according to their testing and documentation would amount to around 6-8% at best, and that such an improvement wouldn't offset the increased management burden and other caveats involved with using RDMs in a virtualized environment. (Yes...I've read all of the white papers and all of the big-boy blog posts on the topic...) In any case, I was overruled, and we modified one of the file servers to utilize an RDM instead of a VMDK hosted on a VMFS store. Well...lo and behold, performance increased (or rather, file copy time decreased) somewhere around 40-45%.
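In case it helps anyone answering, here's roughly how I plan to compare the two disks from the service console, rather than just timing copies. This is only a sketch of my approach on 4.x, and the world ID below is a placeholder, not our actual one:

# esxtop: 'u' = disk device view, 'v' = virtual disk view.
# DAVG/cmd = latency from the array, KAVG/cmd = time spent in the VMkernel,
# GAVG/cmd = DAVG + KAVG = what the guest actually sees. High KAVG on the
# VMFS-backed disk but not on the RDM would point at something host-side.
esxtop

# vscsiStats: per-virtual-disk I/O histograms, good for an apples-to-apples
# comparison of the VMFS-backed disk vs. the RDM on the same VM.
vscsiStats -l                       # list world group IDs and their virtual disks
vscsiStats -s -w 118653             # start collection for that VM's world (placeholder ID)
vscsiStats -p ioLength -w 118653    # I/O size histogram (how small is "small"?)
vscsiStats -p latency -w 118653     # latency histogram
vscsiStats -x                       # stop collection

If anyone has seen the KAVG/DAVG split tell a different story between VMFS and RDM, that's exactly the kind of first-hand detail I'm after.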
My thoughts are...well...this must mean we've got something wrong in our current VMFS design if there's that kind of performance delta. But what I'm really looking for right now is some real-world information. I don't need pointers to whitepapers on someone else's testing; I'm looking for first-hand experience. Should I be seeing such a difference in file copy performance merely by moving from pure VMFS to RDM? If not, where should I look first in our current VMFS design to improve performance? (We already make sure partitions are properly aligned, we only utilize block sizes that accommodate the size of VMDK we expect to use, and we also usually utilize Diskeeper to pre-expand the MFT to its recommended size before placing VMs into production...see below for exactly what we check.) Also, if you've noted such a performance increase yourself, could you elaborate and explain what led you to that point? Thanks in advance!
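For reference, since "properly aligned" seems to mean different things to different people, here's what we actually verify. These are just the checks as I run them; the datastore name and device node are placeholders for ours:

# On the ESX host:
vmkfstools -Ph /vmfs/volumes/LEGAL_FS01   # reports capacity and file block size
                                          # (a 1.8TB VMDK needs the 8MB block size on VMFS-3)
fdisk -lu /dev/sdb                        # VMFS partition should start at sector 128 (64KB-aligned)

Inside the Windows guests:

wmic partition get Name,StartingOffset

Every StartingOffset should be evenly divisible by 65536, i.e. the 64KB stripe element size on the CLARiiON. If any of you check something beyond this and it turned up the culprit in a similar situation, I'd love to hear it.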