We have a vSphere ESXi cluster with 10+ hosts that are all diskless (Dell R610s that boot off CF). These hosts are connected to a single shared datastore (Tintri) over 10Gb NFS.
We need to pilot a Hadoop project that I believe can easily fit within the resources of our ESXi cluster; we have the CPU, memory, and storage available.
The only hitch is that the documentation requires(?) the use of local disks for the Hadoop nodes.
Given the performance of the Tintri (low latency and very high IOPS and throughput), I am wondering why we can't build these VMs with their VMDKs residing on the shared datastore. Is direct-attached storage a must for virtualizing a Hadoop cluster?
Does anyone foresee any issues? I'd hate to purchase physical servers for this when we have enough capacity on the ESXi cluster to handle it.
thanks
Hi Lance,
You can certainly create Hadoop clusters and have their VMDKs reside on shared storage instead of local storage.
The recommendation to put your VMDK files on local storage exists to ensure good overall performance. However, if your shared storage environment is well architected, you should see good performance there as well.
When you configure your resources in BDE, create your shared resources and add all of your shared storage volumes (VMFS/NFS) there. Then, when you deploy your cluster, make sure each node group is configured to reside on shared storage; you can do this by choosing the "custom" option for size and sizing the nodes as you need.
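As a sketch of what that "custom" configuration can look like, BDE (Serengeti) also accepts a cluster specification file in which each node group declares its storage type. Setting `"type": "SHARED"` on a node group tells BDE to place that group's VMDKs on the shared datastores you added to the resource pool. The group names, role lists, and sizes below are illustrative assumptions, not values from this thread:

```json
{
  "nodeGroups": [
    {
      "name": "master",
      "roles": ["hadoop_namenode", "hadoop_jobtracker"],
      "instanceNum": 1,
      "cpuNum": 2,
      "memCapacityMB": 7500,
      "storage": {
        "type": "SHARED",
        "sizeGB": 50
      }
    },
    {
      "name": "worker",
      "roles": ["hadoop_datanode", "hadoop_tasktracker"],
      "instanceNum": 3,
      "cpuNum": 2,
      "memCapacityMB": 7500,
      "storage": {
        "type": "SHARED",
        "sizeGB": 100
      }
    }
  ]
}
```

The alternative value for `"type"` is `"LOCAL"`, which is what the documentation's local-disk recommendation maps to; with an all-NFS Tintri backend, `"SHARED"` on every node group is the option that matches your setup.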
I hope this helps.
Faisal