Solved: Storage questions..please help!!

wgardiner · ‎05-12-2008

Hey folks,

Slightly strange one here. We've got 3 ESX servers talking to 2 separate SANs at 2 separate sites. Site A has ESX2 & SAN2; Site B has ESX1/3 & SAN1. All LUNs are working fine except for one.

The LUN in question is vmhba1:0:2:1 and is visible from the service console, i.e.:

root@xxx:esxcfg-vmhbadevs | grep -i vmhba1:0:2

vmhba1:0:2 /dev/sdd

&

Disk vmhba1:0:2 /dev/sdd (25600MB) has 4 paths and policy of Most Recently Used

FC 2:1.0 210000e08b8b28fb<->200400a0b818a4e3 vmhba1:0:2 Standby preferred

FC 2:1.0 210000e08b8b28fb<->200500a0b818a4e3 vmhba1:1:2 On active

FC 6:1.0 210000e08b934c6f<->200400a0b818a4e3 vmhba2:0:2 Standby

FC 6:1.0 210000e08b934c6f<->200500a0b818a4e3 vmhba2:1:2 On

But I can't see the datastore in VirtualCenter. I've restarted the hostd and vpxa services on the ESX servers, and restarted the VirtualCenter service on my VC box.

What can I do to get VirtualCenter to see this LUN?
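
For anyone reading along, here is a quick way to tally the path states from a listing like the one pasted above. It is only a sketch: it assumes the listing came from `esxcfg-mpath -l`, that every path line starts with `FC`, and that the state is the fifth whitespace-separated field, as in the lines shown in this post. `count_path_states` is a made-up helper name, not a VMware tool.

```shell
# Hypothetical helper: tally FC path states from a path listing on stdin.
# Assumes each path line starts with "FC" and the state (On, Standby, ...)
# is the fifth whitespace-separated field, as in the output pasted above.
count_path_states() {
    awk '$1 == "FC" { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Assumed usage on the service console:
#   esxcfg-mpath -l | count_path_states
```

For the four paths listed in this post, that would report two paths in each state, which matches a healthy active/passive layout as far as pathing goes.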

wgardiner · ‎05-12-2008

Hrmm this is in my hostd log as well for the datastore in question:

RefreshVMFSVolumes: ProcessVmfs threw HostCtlException Unable to get FS Attrs for /vmfs/volumes/45a8f531-2cd8c4fd-3541-000d60981fc6

And if I try and check the volumes on the disk I get the following:

root@xxx:/sbin/partedUtil get /dev/sdd

25600 64 32 52428800

1 32 52428799 251 128

And for a working LUN:

root@xxx:/sbin/partedUtil get /dev/sde

Geometry Known: 0

31311 255 63 503013376

1 128 503011214 251 128
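
To make the two outputs easier to compare side by side, something like the following could be used. It is a rough sketch and assumes the partition lines have five numeric fields in the order number, start sector, end sector, type, attribute, which fits both outputs above (for example 25600 x 64 x 32 = 52428800 sectors on /dev/sdd). `partition_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: print start sector, end sector and type for each
# partition found in partedUtil "get" output read from stdin.
# Assumes partition lines have exactly five numeric fields:
#   number start-sector end-sector type attribute
# Geometry lines (four fields) and "Geometry Known: 0" are skipped.
partition_summary() {
    awk 'NF == 5 && $1 ~ /^[0-9]+$/ {
        printf "partition=%s start=%s end=%s type=%s\n", $1, $2, $3, $4
    }'
}

# Assumed usage:
#   /sbin/partedUtil get /dev/sdd | partition_summary
#   /sbin/partedUtil get /dev/sde | partition_summary
```

Both LUNs report type 251 (0xfb, the VMFS partition type). The start sectors differ (32 on the problem LUN, 128 on the working one), though nothing in this thread establishes whether that matters.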

wgardiner · ‎05-12-2008

Actually, even more strangely, it appears I've got duplicate device IDs going on; see the attached 2 images. Number 2 is from a working server that can see the datastore, number 1 is from one of the servers that can't see the datastore.

kjb007 · ‎05-12-2008

Check your storage presentation. You seem to have the same LUN id presented twice.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB

wgardiner · ‎05-12-2008

The 2 stores showing there with the same LUN ID are from different SANs...how can they be clashing?! I'm going to unpresent one of the datastores and see if that will stop them fighting.

wgardiner · ‎05-12-2008

OK, I've changed the LUN ID of the conflicting LUN and performed a rescan, but the problematic LUN still isn't visible.

Any more ideas? Anything more that I can post here that will help with troubleshooting?

kjb007 · ‎05-12-2008

After you rescan, check the vmkernel and vmkwarning logs. See whether the trouble hosts are detecting a snapshot LUN and/or asking for a resignature.
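
As a rough sketch of that check, the logs can be searched for both terms in one pass. The helper name `find_snapshot_msgs` is made up, and the log paths in the usage comment are an assumption about a typical ESX 3.x service console; adjust them if your logs live elsewhere.

```shell
# Hypothetical helper: show log lines that mention snapshot LUNs or
# resignaturing. Takes one or more log file paths; case-insensitive.
find_snapshot_msgs() {
    grep -i -E 'snapshot|resignatur' "$@"
}

# Assumed usage after a rescan (log paths are an assumption):
#   find_snapshot_msgs /var/log/vmkernel /var/log/vmkwarning
```

No output means neither word appears in the files given, which is not proof that everything is fine, only that the hosts did not log the LUN as a snapshot.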

-KjB

wgardiner · ‎05-13-2008

Hi,

I did see a snapshot LUN created for the conflicting LUN once I assigned a new LUN ID to it, but not for the LUN I'm trying to get working.

I've checked the vmkwarning and vmkernel logs, and there's nothing in there indicating an issue when rescanning the fibre HBAs. I do see some issues on the iSCSI adapter but don't think it's related:

ie:

May 13 10:14:31 esx1 vmkernel: 21:17:34:43.259 cpu4:1037)iSCSI: queuecommand 0x3dd025d8 failed to find a session for HBA 0x1ff61d8, (3 0 3 0)

These errors in hostd.log after a rescan look suspect, though:

Error Stream from partedUtil while getting partitions: Geometry Known: 0

RefreshVMFSVolumes: ProcessVmfs threw HostCtlException Unable to get FS Attrs for /vmfs/volumes/45a8f531-2cd8c4fd-3541-000d60981fc6

Thanks for your help so far... do you think it's reboot time? Or is there anything else we can try?

kjb007 · ‎05-13-2008

Using resignature to fix "snapshot lun issues" can cause problems, and should not be done lightly, so I would say try the reboot.

-KjB

wgardiner · ‎05-13-2008

Yeah, I already ran that against a small LUN we have and it wiped the contents... lovely. Thankfully it was only some templates, but still quite intrusive!

Will update once I've rebooted.

wgardiner · ‎05-14-2008

FYI, a reboot resolved these issues. Still no idea what the real issue was, though; I could see the LUN from the service console all right... maybe it just needed a clean read of all its disk data... no idea.

Thanks for your help on this one mate.

kjb007 · ‎05-14-2008

No problem, glad it all worked out well.

-KjB