Dear Team,
I am facing a very strange issue: one of the ESX local datastores suddenly disappeared. The complete details of what we have faced/noticed are below.
** One of the local datastores disappeared from the datastore window, but it is still visible in the Add Storage wizard, which offers to format it.
** If we open a PuTTY session to the host, we are able to see and browse that datastore with no issue.
** VMs running on that datastore are also working fine (all filesystems are accessible and the VMs are reachable on the network).
** We are not able to take an image backup; it fails with the error "The object has already been deleted or has not been completely created".
** We are not able to take a clone; it fails with "Could not complete network copy for file ................."
We are getting the following entries in the vmkernel log:
Dec 14 17:11:39 localhost vmkernel: 0:01:55:28.677 cpu1:4097)ScsiDeviceIO: 747: Command 0x28 to device "mpx.vmhba1:C0:T1:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0.
Dec 14 17:11:39 localhost vmkernel: 0:01:55:28.677 cpu1:4097)ScsiDeviceToken: 293: Sync IO 0x28 to device "mpx.vmhba1:C0:T1:L0" failed: I/O error H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0.
Dec 14 17:11:39 localhost vmkernel: 0:01:55:28.677 cpu6:4110)Fil3: 5354: Sync READ error ('.fbb.sf') (ioFlags: 8) : I/O error
We need your urgent assistance to resolve this.
regards
Mr VMware
Dear All,
We have logged a case with VMware; please find their findings below.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Following the webex session we just had, I've traced the root cause of the reported issue to an underlying problem on the block device (either the logical disk, or a problem on the array) being presented to host the datastore in question.
In a nutshell, whenever we try to do raw reads from the disk (from sector 0 onwards), they always fail with an I/O error once we reach the 30932992-byte (31 MB) mark (this is consistent; it is always in this region of the disk that read operations fail, no more, no less). The same outcome can be seen even when no partition is on the disk (using if=/dev/sdb instead of /dev/sdb1 with dd), and even after zeroing out all the sectors (dd if=/dev/zero of=/dev/sdb). Strangely, write operations work fine (be it writing zeros or random data) along the whole disk. Bear in mind that the tests I did were with non-VMware tools (I pretty much used only dd for these operations), which definitely rules out a VMware problem (in fact, if you were to boot the server with a Linux live CD and run the same tests as I did, you would see the same behaviour).
I know that there are no hardware reports of any wrong behaviour on the array, but the data collected in our tests today completely contradicts that. The next step is for you to take this to the server vendor to check for problems on the array or the disks, as they are definitely there and they are the reason for the problem you originally reported.
Please let me know if you have any further questions on this.
Thank you
-
David Meireles
Technical Support Engineer
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
We have now logged a case with the hardware vendor; let's see what the next action will be.
regards
Mr VMware
Assuming you are using ESX 4.1 (that's what I found on your other posts), run fdisk -lu from the command line (putty) to see whether the VMFS datastore's partition type is still "FB" (VMFS). If this is not the case and the partition type is e.g. "07" (NTFS), but the VMFS datastore is still accessible from the command line, it may be possible to use fdisk to change the partition type back to "FB".
André
Dear Andre,
Thanks for the reply. Please find the output below.
According to the output, the partitions are OK. What happened prior to the issue? Did you try to rename a datastore? Maybe two of the datastores have the same label for whatever reason. You may try to rename the visible datastore to see whether this helps.
André
Dear Andre,
We are getting the following errors in the vmkernel log:
Dec 14 17:12:42 localhost vmkernel: 0:01:56:31.974 cpu1:4097)ScsiDeviceIO: 747: Command 0x28 to device "mpx.vmhba1:C0:T1:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0.
Dec 14 17:12:42 localhost vmkernel: 0:01:56:31.974 cpu1:4097)ScsiDeviceToken: 293: Sync IO 0x28 to device "mpx.vmhba1:C0:T1:L0" failed: I/O error H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0.
Dec 14 17:12:42 localhost vmkernel: 0:01:56:31.974 cpu7:4110)Fil3: 5354: Sync READ error ('.fbb.sf') (ioFlags: 8) : I/O error
Dec 14 17:12:43 localhost vmkernel: 0:01:56:33.301 cpu6:4109)Vol3: 1488: Could not open device '47bed85e-2e2f6820-b713-001a6466' for probing: No such target on adapter
Dec 14 17:12:43 localhost vmkernel: 0:01:56:33.301 cpu6:4109)Vol3: 608: Could not open device '47bed85e-2e2f6820-b713-001a6466' for volume open: No such target on adapter
Dec 14 17:12:43 localhost vmkernel: 0:01:56:33.301 cpu6:4109)FSS: 3702: No FS driver claimed device '47bed85e-2e2f6820-b713-001a6466': Not supported
Dec 14 17:12:43 localhost vmkernel: 0:01:56:33.446 cpu1:4097)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x410005105840) to NMP device "mpx.vmhba1:C0:T1:L0" failed on physical path "vmhba1:C0:T1:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0.
regards
Mr VMware
With the I/O error messages, I'd suggest you open a support case with VMware. Maybe there's something corrupted in the file system which could become even more serious if you wait too long!?
André
We have copied the data from the inaccessible datastore, and all VMs are running fine on another datastore.
If the issue is related to the filesystem, will it be okay to create a new VMFS datastore on the inaccessible storage?
regards
Mr VMware
I think deleting and re-creating the VMFS partition/datastore should work if it is just a logical issue. However, you should keep an eye on this and/or just run some test VMs on this datastore for some time, to see whether or not the issue re-occurs.
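If it comes to re-creating it, a sketch for ESX 4.1 (the datastore label and the partition suffix are placeholders; the device name is taken from the vmkernel log above — double-check it first, since this destroys everything on the partition):

```shell
# After writing a fresh type-"fb" (VMFS) partition with fdisk, format it.
# vmkfstools -C creates a new VMFS3 filesystem; -b 1m sets a 1 MB block
# size and -S assigns the datastore label.
vmkfstools -C vmfs3 -b 1m -S LocalDS2 /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0:1
```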
André
Dear Team,
We finally found that the RAID configuration is corrupted on the RAID 5 array where we are facing this problem. Please find the update below.
Short description:
1) Shared DSA logs; the RAID information / HDD details were not found because the ESX server's firmware is not updated.
2) Ran diagnostics to check for any hardware issue; all tests passed successfully, no issue found.
3) Fetched the 8k controller support logs.
4) In those logs the team found bad stripes; the RAID 5 array needs to be re-created, and they requested that we update the firmware code.
-------------------------------------------------------------
As per the IBM RSC POA:
Update all firmware/code to the latest level (after a data backup).
As a bad stripe was found on logical drive 2, we would suggest re-creating the array (after a complete, restorable data backup).
Please schedule downtime for the above activity, after a complete, restorable data backup.
----------------------------------------------------------------
As per the support logs, logical drive 2 has bad stripes.
Logical drive...................1
Logical drive name..............Operating System
RAID level......................1
Data space......................278.99 GB
Parity space....................279 GB
Date created....................01/08/2008
Interface type..................Serial attached SCSI
State...........................Okay
Additional details..............Quick initialized
Read-cache mode ...............Enabled
Write-cache mode................Enabled (write-back)
Write-cache setting.............Enabled (write-back) when protected by battery
Partitioned.....................Yes
Protected by hot spare..........No
Bad stripes.....................No
Logical drive...................2
Logical drive name..............Data
RAID level......................5
Data space......................557.99 GB
Parity space....................279 GB
Date created....................01/08/2008
Stripe-unit size................256K
Interface type..................Serial attached SCSI
State...........................Okay
Read-cache mode ...............Enabled
Write-cache mode................Enabled (write-back)
Write-cache setting.............Enabled (write-back) when protected by battery
Partitioned.....................No
Protected by hot spare..........No
Bad stripes.....................Yes
December 28, 2013 10:20:23 AM EST WRN 215:A01C-S--L-- localhost One or more logical drives contain a bad stripe: controller 1.
Update all firmware/code to the latest level (after a data backup).
As a bad stripe was found on logical drive 2, we would suggest re-creating the array (after a complete, restorable data backup).
Refer to the link below for reference:
https://www-947.ibm.com/support/entry/myportal/docdisplay?brand=5000008&lndocid=MIGR-5090091
-------------------------------------------------------------------
Kindly find attached the 8k controller support logs.
Please suggest the next POA.
--------------------------------------------------------------------
The shared logs do not reflect any RAID/HDD details.
Kindly share the DSA logs from 9.30 and the RAID support logs.
-----------------------------------------------------------------------
First confirm the device (i.e. sdb or sdc):
fdisk -l
Command to read raw data from the disk and capture it in a file:
## This will try to read 35 MB of raw data from /dev/sdb and create the file test-read.out.
[root@localhost tmp]# dd if=/dev/sdb of=test-read.out bs=1M count=35
dd: reading `/dev/sdb': Input/output error
29+1 records in
29+1 records out
30932992 bytes (31 MB) copied, 0.528385 seconds, 58.5 MB/s
## Run this only on a blank disk/LUN before creating a datastore. Do NOT run it if a datastore has already been created; it writes zeros over the first 35 MB of /dev/sdb.
[root@localhost tmp]# dd if=/dev/zero of=/dev/sdb bs=1M count=35
35+0 records in
35+0 records out
36700160 bytes (37 MB) copied, 0.334465 seconds, 110 MB/s
[root@localhost tmp]#
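To narrow down exactly where the reads start failing, the same dd test can be repeated with a skip offset (a sketch; /dev/sdb is the device from the output above):

```shell
# Probe 1 MiB chunks around the point where reads began to fail.
# "29+1 records in" with bs=1M means 29 full 1 MiB chunks were read plus a
# partial one, so the error sits inside the chunk at skip=29; probing the
# offsets on either side confirms how wide the bad region is.
for off in 27 28 29 30 31; do
  if dd if=/dev/sdb of=/dev/null bs=1M skip=$off count=1 2>/dev/null; then
    echo "offset ${off} MiB: OK"
  else
    echo "offset ${off} MiB: READ ERROR"
  fi
done
```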