VMware Cloud Community
gopinathp
Contributor

Can we build a fully virtual Hadoop cluster whose nodes use VMDKs on a shared datastore instead of local disks on the ESXi hosts?

We have a vSphere ESXi cluster with 10+ hosts that are all diskless (Dell R610s that boot off CF). These hosts are connected to a single shared datastore (Tintri) over 10Gb NFS.

We need to pilot a Hadoop project that I believe can easily fit into the resources on our ESXi cluster. We have all the CPU, memory and storage available.

The only hitch is that the documentation requires(?) the use of local disks for the Hadoop nodes:

Direct Attached Storage

Attach and configure direct attached storage on the physical controller to present each disk separately to the operating system. This configuration is commonly described as Just A Bunch Of Disks (JBOD). Create VMFS datastores on direct attached storage using the following disk drive recommendations.

8-12 disk drives per host. The more disk drives per host, the better the performance.

1-1.5 disk drives per processor core.

7,200 RPM Serial ATA disk drives.
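To make the drive-count guidance above concrete, here is a rough Python sketch of the arithmetic. Combining the two guidelines by clamping the per-core figure to the 8-12 per-host range is my own reading, not something the documentation states, and the function name is made up for illustration.

```python
import math
from typing import Tuple


def recommended_drive_range(cores_per_host: int) -> Tuple[int, int]:
    """Return (low, high) drive counts for one host.

    Applies the 1-1.5 drives-per-core guideline, then clamps the result
    to the 8-12 drives-per-host guideline. The clamping is an assumption
    about how the two recommendations fit together.
    """
    low = math.ceil(cores_per_host * 1.0)
    high = math.floor(cores_per_host * 1.5)
    # Keep both ends inside the 8-12 drives-per-host window.
    low = max(8, min(12, low))
    high = max(8, min(12, high))
    return low, high


if __name__ == "__main__":
    for cores in (4, 8, 12):
        print(cores, "cores ->", recommended_drive_range(cores), "drives")
```

For an 8-core host this gives a range of 8 to 12 local drives, which is the kind of spindle count the shared datastore would have to stand in for on each host.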

Given the performance of the Tintri (low latency and very high IOPS and throughput), I am wondering why we can't build these VMs with their VMDKs residing on the shared datastore. Is direct attached storage a must for virtualizing a Hadoop cluster?

Does anyone foresee any issues? I'd hate to purchase physical servers for this when we have enough capacity on the ESXi cluster to handle it.

thanks

1 Reply
fakber
VMware Employee

Hi Lance,

You can certainly create Hadoop clusters and have their VMDKs reside on shared storage instead of local storage.

The recommendation to put your VMDK files on local storage is there to ensure good overall performance. However, if your shared storage environment is architected well enough, then in theory you should see good performance there as well.

When you configure your resources in BDE, create your shared resources and add all of your shared storage volumes (VMFS/NFS) there. Once you have done that, when you deploy your cluster, make sure that each node group is configured to reside on shared storage. You can do this by choosing the "custom" option for size and sizing the nodes as you need.
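If you would rather describe the cluster in a spec file than click through the "custom" sizing option, the same idea can be sketched as below. This is only an illustration: the field names (nodeGroups, instanceNum, cpuNum, memCapacityMB, storage with type and sizeGB) follow the BDE/Serengeti cluster specification format as I recall it, so please check them against the documentation for your BDE version. The group names, roles and sizes are placeholders, and the node_group helper is made up for this example.

```python
import json


def node_group(name, roles, instance_num, cpu_num, mem_mb, disk_gb):
    """Build one node group entry that places its disks on shared storage.

    Field names are assumed from the BDE/Serengeti cluster spec format;
    verify them against your version before using the output.
    """
    return {
        "name": name,
        "roles": roles,
        "instanceNum": instance_num,
        "cpuNum": cpu_num,
        "memCapacityMB": mem_mb,
        # The key point: every group asks for SHARED rather than LOCAL storage.
        "storage": {"type": "SHARED", "sizeGB": disk_gb},
    }


# Placeholder sizing for a small pilot; adjust to your own needs.
spec = {
    "nodeGroups": [
        node_group("master", ["hadoop_namenode", "hadoop_jobtracker"], 1, 2, 7500, 50),
        node_group("worker", ["hadoop_datanode", "hadoop_tasktracker"], 3, 2, 7500, 100),
    ]
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The printed JSON can be saved to a file and supplied as the cluster specification when you create the cluster, provided the field names match what your BDE release expects.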

I hope this helps.

Faisal