2 Replies Latest reply on Oct 14, 2019 6:10 AM by TomasOU

    Multiple drives/vmdk's or large vmdks for Oracle Database on VSAN?

    TomasOU Lurker

      A little background before the actual questions:

      We are standing up a two-node vSAN cluster (version 6.7 U2) that will host an Oracle Database 12cR2 instance for reporting purposes. The vSAN hosts are all NVMe based (1.6TB drives in the cache tier, 6.4TB drives in the capacity tier). We will not be using Oracle ASM (Automatic Storage Management) or Oracle RAC. We will probably run XFS on RHEL (Red Hat Enterprise Linux) 7.x and feed the reporting instance via Oracle Active Data Guard from our production Oracle database, which runs on separate physical hardware over 10Gbps Ethernet.

       

      I have reviewed Oracle Database VM and Database Storage Configuration | Oracle Database on VMware vSAN 6.7 | VMware, but it assumes the use of Oracle ASM, which we cannot use (we aren't licensed for it).

       

      Now for the questions:

      What are the tradeoffs between using a single large VMDK (or just a few, 3-6TB each) and breaking the storage up into many smaller (1TB) VMDKs?

       

      I think it makes sense to keep the operating system, the database files, and the Oracle install on separate VMDKs for backup and recovery purposes, but I'm curious whether there are any known performance or reliability issues. For example, with 3x 1TB VMDKs versus 1x 3TB VMDK, will there be any difference in how long a snapshot takes to create or how long a Storage vMotion takes?
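For illustration, the kind of separation described above might look like this in the guest. This is only a sketch; the device names, volume group names, and mount points are assumptions, not anything from our actual build:

```
# /etc/fstab (illustrative layout; devices and paths are hypothetical)
/dev/sda2                       /             xfs  defaults  0 0   # OS VMDK
/dev/mapper/vg_app-lv_oracle    /u01          xfs  defaults  0 0   # Oracle install VMDK
/dev/mapper/vg_data-lv_oradata  /u02/oradata  xfs  noatime   0 0   # database file VMDK(s)
```

Each mount point sits on its own VMDK (or set of VMDKs behind LVM), so they can be snapshotted, backed up, or excluded independently.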

       

      Are there other bottlenecks that might be encountered, such as doing parallel I/O to multiple devices instead of a single device?
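To help frame that question, here is one way the single-device vs. multi-device case could be measured empirically with fio. This is a sketch under assumptions: fio would need to be installed in the guest, the device paths are placeholders for the actual VMDK-backed devices, and the workload parameters are only roughly Oracle-like (8k random reads):

```
; parallel.fio -- compare one large device vs. several smaller ones
; WARNING: run against scratch devices only; paths below are placeholders
[global]
ioengine=libaio
direct=1
rw=randread
bs=8k           ; matches the Oracle default block size
iodepth=8
runtime=60
time_based

[single-vmdk]
filename=/dev/sdb
stonewall       ; finish this job before starting the next

[multi-vmdk]
; fio spreads I/O across colon-separated files/devices
filename=/dev/sdc:/dev/sdd:/dev/sde
stonewall
```

Comparing the reported IOPS and latency between the two jobs would show whether fanning out across devices buys anything on this particular cluster.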

       

      Thank you, and I appreciate any feedback.

        • 1. Re: Multiple drives/vmdk's or large vmdks for Oracle Database on VSAN?
          depping Champion
          User Moderators, VMware Employees

          I have never seen any performance tests of Storage vMotion for something like this, to be honest. I doubt it will matter much either way. I would probably recommend keeping it simple and going with a larger VMDK; vSAN will slice it up during placement as required.

          • 2. Re: Multiple drives/vmdk's or large vmdks for Oracle Database on VSAN?
            TomasOU Lurker

            Hi Duncan,

               I appreciate the response. One test I just tried was timing snapshot creation on two VMs with a similar OS configuration (256GB RAM, 4 vCPU) but a differing number of VMDKs: one with a single 4TB VMDK, and another with four VMDKs totaling about 4TB (0.4TB, 1.1TB, 1.2TB, 1.3TB). The snapshot times were about the same (Execution Time: 1h 44m 54s for the single-VMDK system and 1h 43m 51s for the four-VMDK system). That looks nicely consistent, which is good, but it is only one test case.

              This raised one concern: why would it take nearly 2 hours to create a snapshot? Each VM was essentially idle, and they were the only two VMs running on the two-node cluster. From what I could tell (I'd consider myself a VMware/vSAN noob), the bottleneck appeared to be networking: during a snapshot, the network utilization (combined transmit and receive rates) per host was about 43,000-53,000 KBps with one snapshot running and around 90,000 KBps with two. Is this a tunable somewhere?
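As a back-of-the-envelope check on those figures (a sketch using only the numbers quoted above, and reading KBps as kilobytes per second), the observed rate and duration imply roughly how much data moved during the snapshot window:

```python
# Rough sanity check of the observed snapshot network throughput.
# Inputs are the figures quoted above; KBps is assumed to mean KB/s.
kbps = 50_000                        # ~observed combined tx+rx rate, KB/s
seconds = 1 * 3600 + 44 * 60 + 54    # snapshot execution time: 1h 44m 54s

total_gb = kbps * seconds / 1_000_000  # KB -> GB (decimal units)
print(f"~{kbps / 1000:.0f} MB/s sustained, ~{total_gb:.0f} GB over the snapshot window")
```

At ~50 MB/s sustained, that works out to roughly 300 GB crossing the network over the snapshot window, which is well short of the 4TB of provisioned capacity but a lot of traffic for an "idle" VM, and consistent with the network being the limiting factor.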

              I also came across a post by Cormac (https://cormachogan.com/2016/02/19/vsan-6-2-part-4-iops-limit/) that gives an example of one of the things I was looking for: there are potential IOPS limits per VMDK, which may mean multiple VMDKs are preferable because each can be tuned individually.

              Another possibility I am still looking into is tuning the Linux VM itself for specific I/O patterns (minimally by choosing a different scheduler, e.g. noop vs. deadline, though I'd like to know what else there is). As I understand it, those tunables are set per block device, which might also favor multiple VMDKs. As one example, we typically use LVM (Logical Volume Management) to create separate volumes for the Oracle data files and their resulting RMAN backup files. The data-file volume sees mixed read/write, mostly random I/O, while the RMAN backup volume is mostly sequential writes. My thought was that with separate VMDKs for these, I could tune each differently in the guest OS (even though the underlying block devices are all handled by vSAN). Am I off the mark here?
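For the record, the per-device scheduler tuning described above can be made persistent on RHEL 7 with a udev rule. A minimal sketch, assuming /dev/sdb backs the data-file volume and /dev/sdc the RMAN volume (the file name and KERNEL matches are hypothetical and would need adjusting to the actual devices):

```
# /etc/udev/rules.d/60-oracle-io.rules (hypothetical file name)
# noop for the random-I/O data-file device, deadline for the
# sequential RMAN backup device; adjust KERNEL== to your devices.
ACTION=="add|change", KERNEL=="sdb", ATTR{queue/scheduler}="noop"
ACTION=="add|change", KERNEL=="sdc", ATTR{queue/scheduler}="deadline"
```

The current scheduler for a device can be checked at any time by reading /sys/block/&lt;dev&gt;/queue/scheduler, where the active choice appears in brackets.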

              I like the idea of keeping it simple, but if there is a way to avoid future pitfalls before the system goes to production, I'd like to identify them now and understand the trade-offs we might be making.

            Thanks again for any insight into this.

            -Tomas