VMware Cloud Community
QProf
Contributor

Need help with datastore storage, local array, and network connections

Need help with my ESXi 4.1 set-up

My Hardware:

I built a whitebox server with an Asus P6T, an i7 920, 12 GB of RAM, an Intel PRO/1000 PT quad-port Ethernet card, and a 3ware 9650SE-12ML with eight 1.5 TB SATA Green drives in a RAID 6 array giving me about 8+ TB, with one hot spare, all housed in a NORCO RPC-4220 4U rackmount server chassis. I also have a 500 GB SATA drive which holds ESXi and the virtual machines.

My network includes a Netgear ProSafe FVS336G firewall and a Netgear GS724Tv ProSafe 24-port gigabit managed switch, behind a cable modem Internet connection with a DHCP-assigned address from the ISP.

I also have 2 old NetGear SC101T NAS drives (4 TB) that I want to connect to the system somehow at a later date; they hold data I want to transfer to the new storage array. I am still looking into whether they will work with ESXi 4.1, or whether I will just have to access them through Windows XP.

My Situation:

I have already installed ESXi 4.1 and the vSphere Client with no problems, and the host is connected to the DHCP cable Internet service. I have set up a host name through a Dynamic DNS service, giving me a static host name on the Internet. I have successfully installed three virtual machines so far, and I now want to start by creating a multimedia storage server which will use some of this new 8 TB array, then separate datastores for a small web server, general storage, and some backup. This is a home set-up.

Help with datastore and network:

I have been doing some reading, as I am new to this, and it looks like I probably want to present my array to ESXi as an NFS datastore. Usually the datastore lives in a separate physical box, from what I understand, but I have put my drives and ESXi all in the same box. I am not sure how best to set this up with teamed network cards, but I would like to make this work.

I understand that in ESXi 4.1, iSCSI LUNs must be just under 2 TB, but with NFS I should be able to add a partition larger than 2 TB (for my multimedia), right? Or do I still have to add separate 2 TB drives and then extend them to get the larger space?

Any suggestions and/or direct resources showing examples of how to actually add portions of the array as separate NFS datastores would be appreciated. I know to go to the Configuration tab, then select Add Storage, then select NFS. I do see my array, but it is here that I am not sure what to do next: since the ESXi 4.1 system already has an address, do I put the same address in for the new datastore array (will that even work?), and what do I use for the folder and datastore name... do I just make something up? I was thinking of later installing Openfiler (as a multimedia storage server using this array) as a virtual machine, and using the array with ESXi so that I can access the same storage space from both Windows- and Linux-based systems.
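From what I have gathered so far, the Add Storage > Network File System dialog wants the NFS server's address (which would be the Openfiler VM's IP once it exists, not the ESXi host's own address), the exported folder path, and any label I choose as the datastore name. The command-line equivalent would presumably look something like this (the address, export path, and label below are made-up placeholders):

esxcfg-nas -a -o 192.168.1.50 -s /mnt/media MediaStore   # NFS host, export path, datastore label
esxcfg-nas -l                                            # list the NFS datastores currently mounted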

I also know that I am going to have to figure out how best to use my quad NIC card... setting up virtual switches, teaming, etc. Help?

Any direction, help, sample similar set-ups, suggestions or resources would be great. I have done a lot of hunting, but I am still a little confused about how best to set this up.

Q-Prof, "Quality of life is too valuable to take short cuts. By constantly improving our habits and striving to do our best, we will dramatically improve the quality of our lives"

idle-jam
Immortal

I'm not sure where to start, but I'll post something to keep the question rolling.

The maximum size per VMDK is only 2 TB, and of course your VMFS LUN has to be formatted with a block size that supports such a disk. One suggestion would be to assign several 2 TB VMDKs to the virtual machine and have Linux LVM stripe them into a single volume, or to use different virtual disks for different purposes/usages.
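To make that concrete, a rough sketch of what the LVM side could look like inside a Linux guest (the device names and stripe count are just examples; adjust them to however many 2 TB VMDKs you actually attach):

# four 2 TB virtual disks presented to the guest as /dev/sdb../dev/sde
pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde
vgcreate vg_media /dev/sdb /dev/sdc /dev/sdd /dev/sde
# -i/-I stripe across the four disks; a plain linear LV (drop -i/-I) also works
# and is simpler, since all the VMDKs sit on the same physical array anyway
lvcreate -n lv_media -l 100%FREE -i 4 -I 256 vg_media
mkfs.ext4 /dev/vg_media/lv_media
mount /dev/vg_media/lv_media /srv/media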

If it's just a small lab setup, a single virtual switch with the 4 NICs teamed as active/passive would be fine. Of course you can read more on the networking here: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100408...
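A rough sketch of the uplink side from the ESXi Tech Support Mode shell (the vmnic numbering is an assumption, so check your actual names first); the active/standby order and load-balancing policy are then set per vSwitch or per port group in the vSphere Client under Configuration > Networking:

esxcfg-nics -l                      # list the physical NICs and their vmnic names
esxcfg-vswitch -l                   # list vSwitches and their current uplinks
esxcfg-vswitch -L vmnic1 vSwitch0   # add the quad-port NICs as extra uplinks
esxcfg-vswitch -L vmnic2 vSwitch0
esxcfg-vswitch -L vmnic3 vSwitch0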

iDLE-jAM | VCP 2, VCP 3 & VCP 4

If you found this or any other answer useful, please consider using the Helpful or Correct buttons to award points.

J1mbo
Virtuoso

I would ditch the RAID-6 + hot spare idea and run RAID-10 - 3x the random write performance with only one more disk of overhead, and of course a much quicker rebuild. An NFS datastore can be >2 TB, but the maximum individual VMDK size is 2 TB. There is an extensive thread on here about NFS performance at the moment. If you need to manage multi-TB VMDKs on NFS, the only real choice for the underlying file system is XFS mounted with nobarrier (otherwise you won't be able to delete them without generating errors), which really means having battery-backed write cache (BBWC) and a UPS connected.
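A rough sketch of what that might look like, assuming the array shows up as /dev/sdb with a 256k chunk and six data disks (adjust su/sw to your real geometry, and only use nobarrier if you really do have BBWC plus a UPS):

mkfs.xfs -d su=256k,sw=6 /dev/sdb1
# /etc/fstab entry for the VM store
/dev/sdb1  /srv/vmstore  xfs  noatime,nobarrier,inode64  0 0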

Also bear in mind that ESXi can be run from a USB flash drive.

HTH

http://blog.peacon.co.uk

Please award points to any useful answer.

Unofficial List of USB Passthrough Working Devices

QProf
Contributor

Idle-Jam, thanks for your response.

I understand that you can use extents, or extend the datastore size beyond 2 TB in ESXi, by joining multiple sub-2 TB LUNs together. I also read on another forum that if the separate LUNs are on the same spindles (i.e. the same array), the RAID should still be able to rebuild if a disk died.

On the other hand, I have read that some people have not had great success adding LUNs with the ESXi extent/extend option, and it was again suggested to keep each datastore under the 2 TB allotment so as not to run into more problems.

As far as the NIC teaming goes, that is a great little support video. I actually have 5 NICs: one was the original, separate card, and then I purchased the quad card and added it to the system after reading about the need for them.

Q-Prof, "Quality of life is too valuable to take short cuts. By constantly improving our habits and striving to do our best, we will dramatically improve the quality of our lives"
idle-jam
Immortal

You can extend your VMFS, but each VMDK will still max out at 2 TB (extents do not make much sense in your case; they just add more risk). You would still need to do software RAID or LVM inside the guest OS to stitch it up to 8 TB, or use a different NFS share for each VMDK. With the latter option you do not have to worry about a corrupted software RAID causing total data loss.


iDLE-jAM | VCP 2, VCP 3 & VCP 4

If you found this or any other answer useful, please consider using the Helpful or Correct buttons to award points.

QProf
Contributor

J1mbo,

I have seen the recommendation before to use RAID 10 over RAID 5 or 6. I had read that performance for my particular 3ware card was relatively good, and gets better as you add more disks, and I liked the fact that RAID 6 gives you two disks of breathing room. RAID 10 does not have as much of a fail-safe, but I did not realize it has 3 times the write performance... that is a lot faster, and yes, it makes sense that the rebuild would also be quicker. I guess if I can add a hot spare to a RAID 10 that could work too; I will look into this again to make sure. I was not too concerned with speed, as the array is mostly for multimedia, but a web server would need decent speed.

XFS.... interesting file system, it sounds great. I was looking at FreeNAS, but I am leaning toward Openfiler as the storage server system. I guess once I have the storage server up, I should be able to use the same storage space from Ubuntu and Win7 together, right (with Openfiler on XFS)?

As far as battery backup/UPS goes, I remember when I was setting up the RAID that one of the final questions was about a cache option, with a warning to have a UPS if I enabled it, which I do, so no problem there.

Thanks for the input

Q-Prof, "Quality of life is too valuable to take short cuts. By constantly improving our habits and striving to do our best, we will dramatically improve the quality of our lives"
ehall
Enthusiast

You need to think of VMDK files as large databases with random-size records. The guest will read some data (a DLL or an INI file), maybe write some data back out, then go read some other data. Some files are tiny but some DLLs are several megabytes. It is all random I/O and heavy on seek times. OS I/O is made up of small random operations that are usually issued one after another (read data, write data, read some other data, ...), so turnaround times are crucial to overall performance. This is why people say to benchmark the IOPS and forget the MB/s throughput. The only time you do bulk transfers is when you are reading media (ISO files).

Okay, now forget all of that. Actually the disk activity will depend on the specific applications (database? mail server? compiler machines?), but the above is true for boots and whenever the applications are idle. You need to profile to know.
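One way to profile it, for example with fio run against the datastore filesystem (the tool choice and numbers are just an illustration, mimicking a small-block mixed read/write guest workload):

fio --name=vmstore-test --filename=/srv/vmstore/testfile --size=30G \
    --rw=randrw --rwmixread=70 --bs=8k --direct=1 \
    --ioengine=libaio --iodepth=32 --runtime=120 --time_based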

RAID-10 is faster (and often more reliable) than RAID-5 or RAID-6 except in some specific cases. Generally speaking RAID-10 is best for lots of random writes, since calculating parity for RAID-5 and -6 adds to the overall latency between command and reply--latency is cumulative so a little slow here and a little slow there adds up to a lot of slow overall especially with synchronous I/O over a network. OTOH RAID-5 and -6 can produce faster reads due to the number of heads, so you may want to use it for VMs that do bulk transfers. Test. You may find that you need multiple sub-arrays of different types to get the best results.

You said 3ware; they have some good tuning notes on their site, but don't believe all of them. With my 9650 I ended up keeping only a couple of their recommendations--I set the (single) array for a 256k stripe size, set nr_requests to 2x the queue_depth, and used the deadline scheduler. For the filesystem I used ext4 formatted with stride and stripe-width matched to the array, and used the largefile option for fewer inodes (do not enable the huge_file feature unless you plan to have single VMDK files in the terabyte range). Use a large readahead cache.
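For illustration, that tuning expressed as commands (the /dev/sdb device name and the 256k-chunk, 5-data-disk geometry are assumptions; substitute your own array's values):

echo deadline > /sys/block/sdb/queue/scheduler
QD=$(cat /sys/block/sdb/device/queue_depth)
echo $((QD * 2)) > /sys/block/sdb/queue/nr_requests
blockdev --setra 16384 /dev/sdb      # large readahead (value is in 512-byte sectors)
# stride = chunk / 4K block = 256/4 = 64; stripe-width = stride x data disks
mkfs.ext4 -T largefile -E stride=64,stripe-width=320 /dev/sdb1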

The VMs use VMDK files in all cases except raw iSCSI LUNs, which they treat as native disks. VMDKs are easier to manage--you can make a backup by copying the file, you can move it to a PC and load it in another flavor of VMware, etc. Your SAN might offer iSCSI features like transparent migration, but there was nothing there that I needed. NFS has less protocol chatter, so lower latency to complete an operation. NFS is good at reading and writing a block of data, which is all this stuff comes down to.

A UPS is good, but it won't help if something inside the machine blows up (a UPS does nothing if the PC power supply fails). If the RAID card has an option for a battery backup module, it can hold pending writes in memory and finish the disk I/O after you replace the power supply. 3ware also limits the kinds of caching available if the BBU is not installed, so you only get the good numbers with the module.
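If you have LSI/3ware's tw_cli utility installed, you can check the BBU and cache state from the command line (the controller and unit numbers below are examples):

tw_cli /c0/bbu show all      # battery status
tw_cli /c0/u0 show           # unit status, including the cache setting
tw_cli /c0/u0 set cache=on   # only sensible with a healthy BBU (or a very good UPS)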

J1mbo
Virtuoso

ehall, I'm interested in the comment on huge_file vs large_file - could you elaborate?

To clarify my comment on RAID-6: as said above, for sequential workloads RAID-5 and -6 (RAID card CPU permitting) can be faster than RAID-10, because n-1 or n-2 disks can work together (vs n/2 for RAID-10, although RAID-10 should similarly be able to sustain two concurrent sequential read streams). But for random writes smaller than the stride size, the array must perform a read-update-write cycle, which in the best case costs a full spindle latency and requires 2 disks (RAID-5) or 3 disks (RAID-6) to service, hence the respective 4 or 6 IO 'write penalty'. Compare this to RAID-10: presuming the write is aligned to a hardware sector, it is simply a write to two spindles.
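As a rough worked example (assuming roughly 75 random IOPS per 7,200 rpm SATA spindle, six drives, 100% small random writes, and no cache):

RAID-10: 6 x 75 / 2 = ~225 write IOPS
RAID-5:  6 x 75 / 4 = ~112 write IOPS
RAID-6:  6 x 75 / 6 = ~75  write IOPS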

Of course in practice the controller write cache will pick up the write (if the array is not too busy) and complete it with near-zero latency. But those writes need to be physically committed at some point! In other words, burst throughput will always reflect the controller cache latency, whilst sustained throughput will reflect the underlying drive performance and IO penalties.

Raid-10 also holds up reasonably well when degraded and during a rebuild - some numbers from a 6-drive array (30GB test file, 70:30 read:write ratio, 8K IOs 4K aligned):

Metric         Healthy    Degraded    Rebuild
IOPS           1376       1068        920
Latency (ms)   23.2       29.8        34.7

However, regarding the potential sequential-throughput win of RAID-5/6: if I've understood correctly, this is for an NFS server? Then the bottleneck will be the network if you are running GbE (about 110 MB/s) - a single SATA drive can get close to that.

For an NFS server, the only 'supported' (and free) option is Fedora 8 (personally I prefer Ubuntu 10.10, though). Running NFS from a command-line install of Linux is very easy - this might help.
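For example, on an Ubuntu-style install the whole thing boils down to roughly this (the path and subnet are placeholders; ESX wants no_root_squash on the export):

apt-get install nfs-kernel-server
echo '/srv/vmstore 192.168.1.0/24(rw,sync,no_root_squash)' >> /etc/exports
exportfs -ra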

http://blog.peacon.co.uk

Please award points to any useful answer.

Unofficial List of USB Passthrough Working Devices

QProf
Contributor

eHall,

Thanks for your detailed feedback.

I especially take the UPS/battery backup point very seriously, so I am going to look into getting a battery backup unit for the RAID storage card, as it is possible the power supply could blow... it wouldn't be the first time, although most of the newer quality power supplies with multiple rails and plenty of amps are just about bulletproof.

Q-Prof, "Quality of life is too valuable to take short cuts. By constantly improving our habits and striving to do our best, we will dramatically improve the quality of our lives"