VMware Cloud Community
ChrisNN
Contributor

SERIOUS BUG with VSAN? :: Potential DATA LOSS - Willing to PAY $$$$ for freelancer to assist in rescuing :: URGENT !!!

Hello all,

I will try to describe the case in as few words as possible, but please read through all the details before contacting me to undertake this assignment.

We are willing to pay an hourly rate or a fixed fee for an expert to assist in rescuing our data.

(contact me via Private Message or post reply below with preferable method of contact)

Recently, we set up VSAN with 6 Dell servers, each server contributing 1 SSD and 1 SATA drive.

All seemed to go well, so after a month or so of reliable operation, we decided to deploy some critical data / production virtual machines on VSAN.

Then, due to a networking outage (which was later fixed), VSAN lost the drives and does not seem to be able to resync 2 of them!

The huge problem is that we don't have a backup of this critical data.

(Yes, we know this is bad news and a bad setup with no backup. We used to have a Synology that stored our corporate data, but as we wanted to attach it to vCenter, we decided to move the data into VSAN (a Windows VM) for a few days, until we could reformat the Synology and attach it to vCenter as a datastore. Murphy's law found a perfect match here: something could go wrong, and it did!)

The problem seems to be VSAN's inability to initialize the Disk Group, which fails with the message: Out of memory

(Not sure if this is a VSAN bug or something solvable without official support.)

Please find below all related logs and screenshots, which should help you analyze whether this is something you can take on.
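(The vmkernel excerpts below can be reproduced on each host with a simple grep of the standard ESXi 5.5 log location, e.g.:

# grep -iE "out of memory|LSOM|PLOG" /var/log/vmkernel.log )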

HARDWARE INFORMATION:

PowerEdge 1950

BIOS Version 2.7.0

Service Tag ****

Host Name esxi01.****

Operating System Name VMware ESXi 5.5.0 build-1474528

4 NIC (2 on board and 2 on PCIe)

RAC Information

Name DRAC 5

Product Information Dell Remote Access Controller 5

Hardware Version A00

Firmware Version 1.65 (12.08.16)

Firmware Updated Thu Feb 27 09:47:26 2014

RAC Time Wed Oct 15 17:18:12 2014

Dell PERC 6/i Integrated Controller

HDD LIST:

naa.6001e4f01f123d00ff000054053ef0b3

   Device: naa.6001e4f01f123d00ff000054053ef0b3

   Display Name: naa.6001e4f01f123d00ff000054053ef0b3

   Is SSD: false

   VSAN UUID: 527eb553-84f7-dcfd-8d86-d7fac441ae69

   VSAN Disk Group UUID: 5284a2b8-286f-06a1-446c-0859b15c8c48

   VSAN Disk Group Name: naa.6001e4f01f123d001aa3557a0c1ba75d

   Used by this host: true

   In CMMDS: true

   Checksum: 3053228329873442935

   Checksum OK: true

naa.6001e4f01f123d001aa3557a0c1ba75d

   Device: naa.6001e4f01f123d001aa3557a0c1ba75d

   Display Name: naa.6001e4f01f123d001aa3557a0c1ba75d

   Is SSD: true

   VSAN UUID: 5284a2b8-286f-06a1-446c-0859b15c8c48

   VSAN Disk Group UUID: 5284a2b8-286f-06a1-446c-0859b15c8c48

   VSAN Disk Group Name: naa.6001e4f01f123d001aa3557a0c1ba75d

   Used by this host: true

   In CMMDS: true

   Checksum: 10629673102587722323

   Checksum OK: true

naa.6001e4f01f123d001bbd4d1006c2faa8

   Device: naa.6001e4f01f123d001bbd4d1006c2faa8

   Display Name: naa.6001e4f01f123d001bbd4d1006c2faa8

   Is SSD: false

   VSAN UUID: 529f41b2-6c21-d365-a3ea-7b7fd2af4a0c

   VSAN Disk Group UUID: 5284a2b8-286f-06a1-446c-0859b15c8c48

   VSAN Disk Group Name: naa.6001e4f01f123d001aa3557a0c1ba75d

   Used by this host: true

   In CMMDS: true

   Checksum: 1032555466466759381

   Checksum OK: true

...

IP Addresses:

Esxi01: 192.*.*.50 --- Good Condition

Esxi02: 192.*.*.51 --- Good Condition

Esxi03: 192.*.*.52 --- Good Condition

Esxi04: 192.*.*.53 --- Good Condition

Esxi05: 192.*.*.54 --- Bad Condition

Esxi06: 192.*.*.55 --- Bad Condition

vsan.disks_stats cluster

vsan_1.png

vsan_2.png

vsan_3.png

vsan_4.png
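(Note: the stats above come from the Ruby vSphere Console. A rough sketch of how to get there, with placeholder names for the vCenter and cluster:

# rvc administrator@vcenter.local
> cd /localhost/MyDatacenter/computers/MyCluster
> vsan.disks_stats .

)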

192.168.240.54


# esxcli vsan storage list


naa.6001e4f01f124b001aa364c60cfb6035

   Device: naa.6001e4f01f124b001aa364c60cfb6035

   Display Name: naa.6001e4f01f124b001aa364c60cfb6035

   Is SSD: true

   VSAN UUID: 52448891-b3bb-3f6a-1d63-59a069d745ce

   VSAN Disk Group UUID: 52448891-b3bb-3f6a-1d63-59a069d745ce

   VSAN Disk Group Name: naa.6001e4f01f124b001aa364c60cfb6035

   Used by this host: true

   In CMMDS: true

   Checksum: 14125957398847925782

   Checksum OK: true

naa.6001e4f01f124b001aa364e00e7a60bf

   Device: naa.6001e4f01f124b001aa364e00e7a60bf

   Display Name: naa.6001e4f01f124b001aa364e00e7a60bf

   Is SSD: false

   VSAN UUID: 527334fb-2f62-41ef-f067-f99caee21be8

   VSAN Disk Group UUID: 52448891-b3bb-3f6a-1d63-59a069d745ce

   VSAN Disk Group Name: naa.6001e4f01f124b001aa364c60cfb6035

   Used by this host: true

   In CMMDS: false

   Checksum: 17995978848703998982

   Checksum OK: true

naa.6001e4f01f124b001bbd549905ee5d3f

   Device: naa.6001e4f01f124b001bbd549905ee5d3f

   Display Name: naa.6001e4f01f124b001bbd549905ee5d3f

   Is SSD: false

   VSAN UUID: 52852189-90e2-d9dd-e7a6-3b63a8510db6

   VSAN Disk Group UUID: 52448891-b3bb-3f6a-1d63-59a069d745ce

   VSAN Disk Group Name: naa.6001e4f01f124b001aa364c60cfb6035

   Used by this host: true

   In CMMDS: false

   Checksum: 3794219025321980740

   Checksum OK: true


LOGS 192.168.240.54

2014-10-15T07:10:45.078Z cpu3:33550)WARNING: Created slab RcSsdParentsSlab_0 (prealloc 0), 50000 entities of size 224, total 10 MB, numheaps 1

2014-10-15T07:10:45.079Z cpu3:33550)WARNING: Created slab RcSsdIoSlab_1 (prealloc 0), 50000 entities of size 65552, total 3125 MB, numheaps 2

2014-10-15T07:10:45.079Z cpu3:33550)WARNING: Created slab RcSsdMdBElemSlab_2 (prealloc 0), 4096 entities of size 52, total 0 MB, numheaps 1

2014-10-15T07:10:45.079Z cpu3:33550)WARNING: Created slab RCInvBmapSlab_3 (prealloc 0), 200000 entities of size 64, total 12 MB, numheaps 1

2014-10-15T07:10:45.079Z cpu1:33181)WARNING: LSOM: LSOMAddDiskGroupDispatch:4923: Created disk for 527334fb-2f62-41ef-f067-f99caee21be8

2014-10-15T07:10:45.079Z cpu1:33181)WARNING: LSOM: LSOMAddDiskGroupDispatch:4923: Created disk for 52852189-90e2-d9dd-e7a6-3b63a8510db6

2014-10-15T07:10:45.079Z cpu1:33181)LSOMCommon: LSOM_DiskGroupCreate:958: Creating disk group heap UUID: 52448891-b3bb-3f6a-1d63-59a069d745ce mdCnt 6 -- ssdQueueLen 20000 -- mdQueueLen 100 --ssdCap 76099203072 -- mdCap 0

2014-10-15T07:10:45.096Z cpu2:33444)WARNING: Created heap LSOMDiskGroup_001 (prealloc 1), maxsize 128 MB

2014-10-15T07:10:45.100Z cpu2:33444)WARNING: Created slab PLOG_TaskSlab_DG_001 (prealloc 1), 20000 entities of size 944, total 18 MB, numheaps 1

2014-10-15T07:10:45.105Z cpu2:33444)WARNING: Created slab LSOM_TaskSlab_DG_001 (prealloc 1), 20000 entities of size 824, total 15 MB, numheaps 1

2014-10-15T07:10:45.107Z cpu2:33444)WARNING: Created slab PLOG_RDTBuffer_DG_001 (prealloc 1), 20000 entities of size 184, total 3 MB, numheaps 1

2014-10-15T07:10:45.107Z cpu2:33444)WARNING: Created slab PLOG_RDTSGArrayRef_DG_001 (prealloc 1), 20000 entities of size 48, total 0 MB, numheaps 1

2014-10-15T07:10:45.125Z cpu2:33444)WARNING: Created slab LSOM_LsnEntrySlab_DG_001 (prealloc 1), 160000 entities of size 200, total 30 MB, numheaps 1

2014-10-15T07:10:45.126Z cpu2:33444)WARNING: Created slab SSDLOG_AllocMapSlab_DG_001 (prealloc 1), 8192 entities of size 34, total 0 MB, numheaps 1

2014-10-15T07:10:45.131Z cpu2:33444)WARNING: Created slab SSDLOG_LogBlkDescSlab_DG_001 (prealloc 1), 8192 entities of size 4570, total 35 MB, numheaps 1

2014-10-15T07:10:45.132Z cpu2:33444)WARNING: Created slab SSDLOG_CBContextSlab_DG_001 (prealloc 1), 8192 entities of size 90, total 0 MB, numheaps 1

2014-10-15T07:10:45.135Z cpu2:33444)WARNING: Created slab BL_NodeSlab_DG_001 (prealloc 1), 28400 entities of size 312, total 8 MB, numheaps 1

2014-10-15T07:10:45.175Z cpu2:33444)WARNING: Created slab BL_CBSlab_DG_001 (prealloc 1), 28400 entities of size 10248, total 277 MB, numheaps 1

2014-10-15T07:10:45.206Z cpu2:33444)WARNING: Created slab BL_NodeKeysSlab_DG_001 (prealloc 1), 5990 entities of size 40971, total 234 MB, numheaps 1

2014-10-15T07:10:45.231Z cpu1:33181)WARNING: LSOMCommon: SSDLOGEnumLogCB:828: Estimated time for recovering 1437639 log blks is 351536 ms device: naa.6001e4f01f124b001aa364c60cfb6035:2

2014-10-15T07:15:00.834Z cpu1:32786)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412e80899f80, 0) to dev "mpx.vmhba34:C0:T0:L0" on path "vmhba34:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE

2014-10-15T07:15:00.834Z cpu1:32786)ScsiDeviceIO: 2337: Cmd(0x412e80899f80) 0x1a, CmdSN 0x1bb from world 0 to dev "mpx.vmhba34:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.

2014-10-15T07:15:00.856Z cpu1:32780)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412e80899f80, 0) to dev "t10.DP______BACKPLANE000000" on path "vmhba1:C0:T32:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2014-10-15T07:15:00.856Z cpu1:32780)ScsiDeviceIO: 2337: Cmd(0x412e80899f80) 0x1a, CmdSN 0x1bc from world 0 to dev "t10.DP______BACKPLANE000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2014-10-15T07:16:00.929Z cpu1:33546)WARNING: Heap: 3622: Heap LSOM (1073738600/1073746792): Maximum allowed growth (8192) too small for size (53248)

2014-10-15T07:16:00.929Z cpu1:33546)WARNING: Heap: 4089: Heap_Align(LSOM, 49176/49176 bytes, 8 align) failed.  caller: 0x418011c41756

2014-10-15T07:16:00.929Z cpu1:33546)WARNING: LSOM: LSOM_InitComponent:198: Cannot init commit flusher: Out of memory

2014-10-15T07:16:00.929Z cpu1:33546)LSOM: LSOMSSDEnumCb:210: Finished reading SSD Log: Out of memory

2014-10-15T07:16:01.601Z cpu0:32779)LSOM: LSOMRecoveryDispatch:2326: LLOG recovery complete 52448891-b3bb-3f6a-1d63-59a069d745ce:Recovered 1585904 entries, Processed 0 entries, Took 316394 ms

2014-10-15T07:16:01.625Z cpu0:32779)WARNING: LSOM: LSOMAddDiskGroupDispatch:5090: Failed to add disk group. SSD 52448891-b3bb-3f6a-1d63-59a069d745ce: Out of memory

2014-10-15T07:16:01.625Z cpu3:33174)WARNING: PLOG: PLOGNotifyDisks:2854: Notify disk group failed for SSD UUID 52448891-b3bb-3f6a-1d63-59a069d745ce :Out of memory was recovery complete ? No

2014-10-15T07:16:01.625Z cpu3:33174)PLOG: PLOG_Recover:518: Recovery on SSD naa.6001e4f01f124b001aa364c60cfb6035:2 had failed with Out of memory

2014-10-15T07:16:01.625Z cpu3:33174)WARNING: PLOG: PLOGRecoverDevice:4251: Recovery failed for disk group with SSD naa.6001e4f01f124b001aa364c60cfb6035

2014-10-15T07:16:01.625Z cpu3:33174)WARNING: PLOG: PLOGInitAndAnnounceMD:4167: Recovery failed for the disk group.. deferring publishing of magnetic disk naa.6001e4f01f124b001aa364e00e7a60bf

2014-10-15T07:16:01.625Z cpu3:33174)WARNING: PLOG: PLOGInitAndAnnounceMD:4167: Recovery failed for the disk group.. deferring publishing of magnetic disk naa.6001e4f01f124b001bbd549905ee5d3f

2014-10-15T07:16:13.965Z cpu2:34036)PLOG: PLOGAnnounceSSD:4071: Successfully added VSAN SSD (naa.6001e4f01f124b001aa364c60cfb6035:2) with UUID 52448891-b3bb-3f6a-1d63-59a069d745ce

2014-10-15T07:16:13.965Z cpu2:34036)PLOG: PLOGNotifyDisks:2805: MD 0 with UUID 527334fb-2f62-41ef-f067-f99caee21be8 with state 0 backing SSD 52448891-b3bb-3f6a-1d63-59a069d745ce notified

2014-10-15T07:16:13.965Z cpu2:34036)PLOG: PLOGNotifyDisks:2805: MD 1 with UUID 52852189-90e2-d9dd-e7a6-3b63a8510db6 with state 0 backing SSD 52448891-b3bb-3f6a-1d63-59a069d745ce notified

2014-10-15T07:16:13.965Z cpu2:34036)WARNING: PLOG: PLOGNotifyDisks:2831: Recovery on SSD 52448891-b3bb-3f6a-1d63-59a069d745ce had failed earlier, SSD not published

2014-10-15T07:16:13.965Z cpu2:34036)PLOG: PLOG_Recover:518: Recovery on SSD naa.6001e4f01f124b001aa364c60cfb6035:2 had failed with Out of memory

2014-10-15T07:16:13.965Z cpu2:34036)WARNING: PLOG: PLOGRecoverDevice:4251: Recovery failed for disk group with SSD naa.6001e4f01f124b001aa364c60cfb6035

2014-10-15T07:16:13.965Z cpu2:34036)WARNING: PLOG: PLOGInitAndAnnounceMD:4167: Recovery failed for the disk group.. deferring publishing of magnetic disk naa.6001e4f01f124b001aa364e00e7a60bf

2014-10-15T07:16:13.965Z cpu2:34036)WARNING: PLOG: PLOGInitAndAnnounceMD:4167: Recovery failed for the disk group.. deferring publishing of magnetic disk naa.6001e4f01f124b001bbd549905ee5d3f

2014-10-15T07:16:14.007Z cpu2:34036)Vol3: 714: Couldn't read volume header from naa.6001e4f01f124b001aa364e00e7a60bf:1: I/O error

2014-10-15T07:16:14.016Z cpu2:34036)Vol3: 714: Couldn't read volume header from naa.6001e4f01f124b001aa364e00e7a60bf:1: I/O error

2014-10-15T07:16:14.029Z cpu2:34036)FSS: 5051: No FS driver claimed device 'naa.6001e4f01f124b001aa364e00e7a60bf:1': Not supported

192.168.240.55

# esxcli vsan storage list

naa.6001e4f02024b1001aa354fc1821ac19

   Device: naa.6001e4f02024b1001aa354fc1821ac19

   Display Name: naa.6001e4f02024b1001aa354fc1821ac19

   Is SSD: false

   VSAN UUID: 52c0faf1-b242-78ab-6fbc-bdcb1d6c0b96

   VSAN Disk Group UUID: 52f669f0-21c7-096f-2858-9142fc2ef315

   VSAN Disk Group Name: naa.6001e4f02024b1001aa354b8140ff5be

   Used by this host: true

   In CMMDS: false

   Checksum: 10210539749589974934

   Checksum OK: true

naa.6001e4f02024b1001aa354b8140ff5be

   Device: naa.6001e4f02024b1001aa354b8140ff5be

   Display Name: naa.6001e4f02024b1001aa354b8140ff5be

   Is SSD: true

   VSAN UUID: 52f669f0-21c7-096f-2858-9142fc2ef315

   VSAN Disk Group UUID: 52f669f0-21c7-096f-2858-9142fc2ef315

   VSAN Disk Group Name: naa.6001e4f02024b1001aa354b8140ff5be

   Used by this host: true

   In CMMDS: true

   Checksum: 3807632316314793651

   Checksum OK: true

As you can see from all the logs and screenshots above, VSAN fails to initialize the Disk Group with the error: Out of memory
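One more observation that may help: the Heap: 3622 warning above shows the LSOM heap at its ~1 GB limit (1073738600/1073746792 bytes) at the moment the allocation fails. I have read that newer VSAN builds expose the LSOM heap size as an advanced option; I cannot confirm it exists on this beta build, so treat the command below purely as something to verify:

# esxcli system settings advanced list -o /LSOM/heapSize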

vsan_5.png

VMKERNEL

2014-10-15T07:16:09.189Z cpu2:33885)WARNING: PLOG: PLOGNotifyDisks:2831: Recovery on SSD 52448891-b3bb-3f6a-1d63-59a069d745ce had failed earlier, SSD not published

2014-10-15T07:16:09.189Z cpu2:33885)PLOG: PLOG_Recover:518: Recovery on SSD naa.6001e4f01f124b001aa364c60cfb6035:2 had failed with Out of memory

2014-10-15T07:16:09.189Z cpu2:33885)WARNING: PLOG: PLOGRecoverDevice:4251: Recovery failed for disk group with SSD naa.6001e4f01f124b001aa364c60cfb6035

2014-10-15T07:16:13.965Z cpu2:34036)LSOMCommon: SSDLOG_AddDisk:559: Existing ssd found naa.6001e4f01f124b001aa364c60cfb6035:2

2014-10-15T07:16:13.965Z cpu2:34036)PLOG: PLOGAnnounceSSD:4071: Successfully added VSAN SSD (naa.6001e4f01f124b001aa364c60cfb6035:2) with UUID 52448891-b3bb-3f6a-1d63-59a069d745ce

2014-10-15T07:16:13.965Z cpu2:34036)PLOG: PLOGNotifyDisks:2805: MD 0 with UUID 527334fb-2f62-41ef-f067-f99caee21be8 with state 0 backing SSD 52448891-b3bb-3f6a-1d63-59a069d745ce notified

2014-10-15T07:16:13.965Z cpu2:34036)PLOG: PLOGNotifyDisks:2805: MD 1 with UUID 52852189-90e2-d9dd-e7a6-3b63a8510db6 with state 0 backing SSD 52448891-b3bb-3f6a-1d63-59a069d745ce notified

2014-10-15T07:16:13.965Z cpu2:34036)WARNING: PLOG: PLOGNotifyDisks:2831: Recovery on SSD 52448891-b3bb-3f6a-1d63-59a069d745ce had failed earlier, SSD not published

2014-10-15T07:16:13.965Z cpu2:34036)PLOG: PLOG_Recover:518: Recovery on SSD naa.6001e4f01f124b001aa364c60cfb6035:2 had failed with Out of memory

2014-10-15T07:16:13.965Z cpu2:34036)WARNING: PLOG: PLOGRecoverDevice:4251: Recovery failed for disk group with SSD naa.6001e4f01f124b001aa364c60cfb6035

VSAN DISK GROUP FAILED WITH THE FOLLOWING ERROR FROM VMKERNEL:

PLOG_Recover:518: Recovery on SSD naa.6001e4f01f124b001aa364c60cfb6035:2 had failed with Out of memory

37 Replies
a_p_
Leadership

With critical production data on the VSAN, you should really consider contacting VMware Support immediately, and not try to fix things by yourself!

Maybe the issue is related to the ESXi version/build you are using (ESXi 5.5.0 build-1474528). The first version/build with VSAN support is ESXi 5.5 Update 1 (1623387). In any case, I think VMware Support will be able to find the root cause of this issue.
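For reference, a quick way to confirm the exact version and update level from each host's shell:

# vmware -vl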

André

admin
Immortal

As André stated, the build you are using is pre-vSphere 5.5 U1 (1623387), which is the GA build for VSAN. Older builds use VSAN beta code.

Please see KB 2086656 (http://kb.vmware.com/kb/2086656) for details.

Please submit an SR with VMware GSS for assistance with getting your disk connectivity back, then migrating the data off the beta vsanDatastore and recreating it using a supported configuration.

We will need more details like:

Make, model and firmware version of the SSDs and magnetic disks (HDDs)

Make, model and firmware version of the SAS HBA to which the magnetic disks are attached

Collecting vm-support dumps should provide us with the above details and more.
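If you would like a head start on gathering those details yourself, the standard esxcli inventory commands should show most of them (a sketch only; field names can vary by build):

# esxcli storage core adapter list
# esxcli storage core device list | grep -iE "Display Name|Model|Revision"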

Please update this post with the SR number once submitted so that I can follow up with GSS team.

ChrisNN
Contributor

Hello,

Please find the SR number below:

Support Request Confirmation Number:

14543375810

admin
Immortal

Thanks Chris!

You should have received an email from customer service.

I have instructed them to get the details from this post. However, we will need vm-support bundles from your environment.

Can you use VC to collect diagnostic data from the cluster? This includes:

VC support dump

vm-support dump from each node in the cluster.

When you collect them, please upload them to the inbound FTP server as outlined in KB 2070100.

This will help reduce the turn-around time for GSS to begin the analysis of your issue and help you move to a supported configuration.

admin
Immortal

In addition to the above, would you please respond to the email you should have received from VMware Customer Service and include your VMware support account information as we are unable to locate it.

Also, please contact your account manager to add you to the list of authorized callers so that in the future you may file SRs on behalf of your company.

ChrisNN
Contributor

Hello mkhalil,

My mistake, I forgot to mention earlier that this incident occurred during my evaluation period of the VSAN software and, unfortunately, I do not have a support contract at the moment.

Additionally, as you have already seen from my OP, I was using the beta version for evaluation and have tried everything I could to rescue my data, for over a three-week period, with no luck.

The new version of VSAN (recently released as GA) does not seem to be backward compatible, and an update would mean data loss, so this is not an option either.

At this point, my experience with VSAN does not leave me comfortable enough to trust my data to it, so I am not sure whether I will take the step of buying a license, as I see there is zero community support or public know-how available; but I might revise my opinion if, even under these circumstances, I end up with no data loss in the end.

At the moment, the most crucial thing for me and my employer is to be able to export the 2 important virtual disks of the Windows virtual machine we had in VSAN, as they include important patient data (our EMR was in there).

Please let me know what to do in order to save my data, and as soon as this is over, I would like a sales rep to reach out so I can get a proper consultation on how to make this happen in a reliable way.

Thanks in advance for all your help!

admin
Immortal

Chris,

I will grant you an exception for support with this issue.

Please upload the requested diagnostics info and let me know when done.

What is your availability for someone to call you, or do you prefer email?

ChrisNN
Contributor

Thank you for this,

Can you please explain to me how to properly export all the VC dumps for this?

I see in vCenter the ability to export system logs, but I am not sure what exactly you need.

I know that core dumps would be generated if the system had failures or crashed, but I am not sure what the support dump is.

Finally, as this is the first time I am doing this, I would appreciate it if you could also point me to the right KB on how to submit these exports for analysis.

Thanks again for all your help!

admin
Immortal

What I meant by "dumps" was just diagnostics bundles.

When you collect them via VC, you should have a checkbox to collect VC logs as well.

Please refer to the KB that I mentioned earlier.

Here are the steps for your convenience:

VC-logs1.png

Then select all cluster nodes and check the box at the bottom of the dialog:

VC-logs2.png

Follow the prompts to save the bundle (a zip file with .tgz files inside).

You may be prompted to "Generate log bundle"; when done, click "Download log bundle".

When done, upload the file to our inbound FTP server as I indicated earlier.
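As an alternative sketch, if the VC workflow gives you trouble, each ESXi host can also generate its bundle locally; the datastore path below is only an example, pick one with enough free space:

# vm-support -w /vmfs/volumes/datastore1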

admin
Immortal

As for uploading the files, the process is detailed in KB 2070100.
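A minimal sketch of the upload itself from your workstation, assuming plain FTP as in the KB; the host and target directory below are placeholders, and the actual values come from KB 2070100 and your SR email:

# curl -T vc-support-bundle.zip ftp://<vmware-inbound-ftp-host>/<your-SR-number>/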

admin
Immortal

Hi Chris,

Were you able to collect the diagnostics info and upload them to our inbound FTP server?

I did not see any new activity on the SR record from you.

admin
Immortal

I tried to call you at the phone number listed in the SR but got a message that it is out of service.

Would you please respond to the SR email and provide your correct phone number?

ChrisNN
Contributor

Hello,

I was out of the office on Sunday. I have started the log export process and will upload to the FTP shortly.

I would prefer communication through email and the SR, as I am not proficient in spoken English, thanks!

I will update the SR and this post as soon as the file has been transferred to the FTP.

Thanks again for all your help so far!

ChrisNN
Contributor

Hello mkhalil,

The requested files have been uploaded to the FTP as a single ZIP file, per the KB instructions.

I have also updated the SR about this; looking forward to your news!

Thanks, I really appreciate your help.

admin
Immortal

Chris,

A new SR 14544291010 with severity 2 was created and assigned to a support engineer in your time zone. You should be hearing from him shortly.

SimonTodd
Enthusiast

I notice from your specification that you are using a Dell PowerEdge 1950 server with a Dell PERC 6/i controller. I just want to draw your attention to the controller: the PERC 6/i is not certified for VSAN, and in fact I suspect Dell does not have any controllers supported in the PE1950 server that will work with Virtual SAN.

VMware Technical Support cannot provide support for uncertified hardware, as per the following KB article:

VMware KB: Requirements and considerations for the deployment of VMware Virtual SAN (VSAN)

Regards

Simon

ChrisNN
Contributor

Hello Simon,

I have replied to the SR; please let me know about this.

About VSAN and the PERC controller: if I remove this controller and replace it with a compatible part, will this help with getting access to my data, or is it unrelated to my case?

Thanks!

admin
Immortal

Chris,

please DO NOT change any hardware until we are done with trying to recover your data.

Once we are done with that, you will need to recreate your VSAN cluster using supported hardware. Only then should you replace your HBAs.

Also, you will need to make sure that the SSDs and magnetic disks (HDDs) are on the VSAN HCL.

In addition, the SSD-to-HDD capacity ratio should not be less than 10%. In your current configuration, it is about 5%.
admin
Immortal

Chris,

As Simon stated in his last post here, your data will not be affected by the upgrade process.

The reference to "Upgrade" in the release notes pertains to the VSAN Datastore itself. What we are trying to do now is upgrade the software so that you do not receive the "out of memory" error and, in turn, your disks are accessible by VSAN again. Please go ahead and upgrade VC and ESXi.
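For the ESXi side, one common path is a profile update from the offline depot once each host is in maintenance mode. A sketch only; the depot file name and profile below are examples to replace with the actual 5.5 U1 (or later) bundle you download:

# esxcli software profile update -d /vmfs/volumes/datastore1/update-from-esxi5.5-5.5_update01.zip -p ESXi-5.5.0-20140302001-standard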

0 Kudos