VMware Cloud Community
EricRS96
Contributor

Storing ESXi coredump and scratch partitions in vSAN

Hi, we have had some problems lately with ESXi servers freezing, and it seems to be related to the logs being redirected to the vSAN datastore (KB2147541).

There's no more space on the SD Card, so the solution was to move the logs and scratch to another drive.

So I installed some USB keys and was able to set everything up there.

Syslog.global.logDir is set to "[USB-Datastore] log", and ScratchConfig.ConfiguredScratchLocation is set to "/vmfs/volumes/5cc0c7e0-f10c3b07-2462-000af794fe74/.locker", which is on the same USB key.
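To make the two values above easier to sanity-check, here is a minimal Python sketch that splits the "[datastore] directory" form used by Syslog.global.logDir and pulls the volume component out of a /vmfs/volumes/ path. The function names are hypothetical helpers, not part of any VMware tool, and the sketch only inspects strings; it assumes that a scratch path under /vmfs/volumes/ points at a datastore volume rather than a RAM disk.

```python
import re


def parse_datastore_path(value):
    """Split a '[datastore] directory' string into (datastore, directory)."""
    match = re.fullmatch(r"\[([^\]]+)\]\s*(.*)", value.strip())
    if match is None:
        raise ValueError(f"not a datastore path: {value!r}")
    return match.group(1), match.group(2)


def scratch_volume_id(path):
    """Return the volume component of a /vmfs/volumes/<volume>/... path, else None."""
    parts = path.strip("/").split("/")
    if len(parts) >= 3 and parts[0] == "vmfs" and parts[1] == "volumes":
        return parts[2]
    return None  # e.g. a RAM-disk location, which would not be persistent


# Values taken from the settings quoted above.
print(parse_datastore_path("[USB-Datastore] log"))
print(scratch_volume_id("/vmfs/volumes/5cc0c7e0-f10c3b07-2462-000af794fe74/.locker"))
```

A None result from scratch_volume_id would suggest the scratch location is not on a datastore volume, which is the case worth double-checking on the host itself.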

Now comes KB2074026:

Details

vSAN provides the following storage solutions for ESXi coredump and scratch partitions:

    Assign one disk for each host. When you install vSAN on a local disk, a disk is automatically assigned to the host. Use this approach for hosts with more than 512GB of memory.

    Use a USB or SD card and do not set scratch partitions as non-persistent. vSAN tracefiles take up space in a coredump, so a 4GB SD or USB card is sufficient to support coredumps for hosts with up to 512GB of memory rather than 1TB for hosts without vSAN.

Note: vSAN does not support having scratch log on the vSAN Datastore. For more information, see Redirecting system logs to a vSAN object causes an ESXi host lock up (2147541).

Solution

Using a USB or SD card for ESXi installations, coredump partitions, and a non-persistent scratch partitions has the following drawbacks:

    vSAN tracefiles are stored in a virtual RAM drive that is persisted only in case of host failure. All other log files are stored in a non-persistent virtual RAM drive.

    Use a 4GB SD card or USB drive for coredumps on hosts with 512GB of memory and where vSAN is enabled.

    vSAN cannot recover tracefiles or any other log files, in case of a power loss.

In the first part, it says not to set scratch as non-persistent (in other words, set the scratch to persistent) on a USB key for hosts with up to 512GB.

My servers have 320GB of memory. And from what I understand, my setup uses persistent scratch now, right?

And the second part talks about non-persistent scratch on a USB key, but for hosts with 512GB of memory.

So is my setup OK?

After the changes, I began receiving warnings from my Veeam One server about low latency on these drives, and I want to be sure that it's "normal".

It's really bad when you start the day with 2 servers frozen in a 3-server vSAN cluster!

thank you

Eric.

TheBobkin
Champion

Hello Eric,

"In the first part, it say do not set scratch as non-persistent (in other word set the scratch to persistent)"

I think that double negative is a mistake, and I will get this KB reviewed and updated internally (it still says vSAN 5.5).

SD cards are not durable and thus are not suited for constant read/write purposes such as logging - they are fine for boot or dumps because these are not constant and won't burn out the device.

This article explains it better (never mind the publication date; this aspect is unchanged, since the same laws of physics still apply to non-durable devices):

https://cormachogan.com/2015/02/24/vsan-considerations-when-booting-from-usbsd/

"My servers have 320GB of memory."

The KB only references the physical memory of the host because of how much space a coredump may require above 512GB, and to ensure that a complete dump is possible:

VMware Knowledge Base
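As a rough illustration of that memory threshold, the sketch below encodes only the 512GB figure quoted from KB2074026 earlier in this thread. The function name and return strings are hypothetical, and this is a reading of the quoted KB text rather than an official sizing tool.

```python
KB_MEMORY_LIMIT_GB = 512  # threshold quoted from KB2074026 in this thread


def coredump_storage_hint(host_memory_gb):
    """Return the storage option the quoted KB text suggests for a vSAN host."""
    if host_memory_gb <= 0:
        raise ValueError("host memory must be positive")
    if host_memory_gb > KB_MEMORY_LIMIT_GB:
        # Above the threshold the KB points to a disk assigned to the host.
        return "dedicated local disk"
    # At or below the threshold the KB treats a 4GB card as sufficient.
    return "4GB USB or SD card"


print(coredump_storage_hint(320))  # the 320GB hosts discussed here
```

This only covers where the coredump goes; it says nothing about where constant logging should be written.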

"After the changes, I began receiving warning from my Veeam One server about low latency on these drives, and want to be sure that its "normal"."

It's "normal" in the sense that an SD card is expected to have low performance capabilities. Don't use them for this; use a syslog server or, if you can swing a more future-proof approach, something durable and also internal such as a BOSS setup.
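For the syslog server route, the remote target is normally given through the Syslog.global.logHost advanced setting as a protocol://host:port string. The Python sketch below only builds and validates such a string; the helper name and the host name syslog.example.com are hypothetical, and the accepted protocol list is an assumption to verify against the documentation for your ESXi version.

```python
ALLOWED_PROTOCOLS = {"udp", "tcp", "ssl"}  # assumed set; check your ESXi docs


def build_loghost(host, protocol="udp", port=514):
    """Build a 'protocol://host:port' value for a remote syslog target."""
    if protocol not in ALLOWED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    if not host or any(ch.isspace() for ch in host):
        raise ValueError("host must be a non-empty name without whitespace")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"{protocol}://{host}:{port}"


# Hypothetical syslog server name, for illustration only.
print(build_loghost("syslog.example.com"))
```

Sending logs off the host this way keeps the constant writes away from the SD card or USB key entirely.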

Bob
