fontyyy
Contributor

Software iSCSI with built-in network cards etc.: how much to expect?


Hi guys,

My setup is (thus far).

4 x Dell 1850, twin Xeons around the 3 GHz mark, 6-8 GB RAM each

1 x EMC CX3-10c SAN (iSCSI), currently running a 4-spindle RAID 10 LUN for VMs

2 x cheapo Dell gigabit Ethernet switches

That's about it: no TOEs, no HBAs, nothing, just the built-in NICs.

I'm having real difficulty getting more than two hosts to see a LUN at the same time.

Two is fine: vMotion works and relatively unstressed servers tick over nicely. With three or more hosts connected to a LUN at once, everything hangs: VirtualCenter, a host itself (which sometimes loses its link not only to the SAN but to its own hard disk), and even the SAN front end seems to have a bit of a go-slow.

The message on the host screens says something like "the device that would have been attached has not been attached as the path is in use or passive" (sorry, that's not the exact wording; I'm at home and don't fancy recreating the error). Also, when it does sort of work, the VMs are very, very slow and the event viewer (on the VM) is full of "SCSI device not ready" errors.

The two hosts that are currently connected and working are using SP A0 and SP A1; any attempt to link another host in via SP B hangs that host, and any attempt to link via SP A produces the error above.

So it's clear what's happening: the link just can't deal with the traffic. Is this to be expected when running software iSCSI?

There is an option, with a bit of hardware jiggling, whereby I could run ESX on two servers linked to the SAN, and one or two others with a hefty amount of local (SCSI) drive space.

I've been running ESX with local storage for months now and have been very impressed with it, so I'm leaning towards that, maybe running the more disk-usage-intensive servers as local VMs. OK, we lose the HA stuff, but hey, we never had it anyway.

Or maybe I'm just doing something wrong?


Accepted Solutions
christianZ
Champion

The config seems to be ok.

The EMC CX series works as active-passive, so if I remember right you should present each LUN over only one (the same) SP.

Are you doing it this way?

6 Replies
Jae_Ellers
Virtuoso

Four spindles isn't much. What kind of disks are these (FC, SATA)?

You didn't say how many VMs you're trying to run over iSCSI, or how much I/O you're trying to do.

Either way, just connecting to iSCSI shouldn't cause the system to fall over.

Many people are using the SW initiators with decent results.

Accurate error messages will help; they'll also help folks who hit the same issues down the road. Look in the /var/log/messages, vmkernel and vmkwarning files.
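For example, something like this from the service console will pull out the storage-related lines (the grep patterns are just a starting point, not the exact strings ESX logs):

```shell
# Recent iSCSI-related lines from the service console log
grep -i iscsi /var/log/messages | tail -20

# Errors and warnings from the VMkernel logs
grep -iE "scsi|error|warn" /var/log/vmkernel /var/log/vmkwarning | tail -40
```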

-=-=-=-=-=-=-=-=-=-=-=-=-=-=- http://blog.mr-vm.com http://www.vmprofessional.com -=-=-=-=-=-=-=-=-=-=-=-=-=-=-
christianZ
Champion

Can you post here:

esxcfg-nics -l

esxcfg-vswitch -l

esxcfg-vmknic -l

esxcfg-vswif -l

> 2*cheapo Dell ethernet gig switches

This could be an issue here (maybe)

fontyyy
Contributor

christianZ, here you go; thanks for any help:

Using username "root".

root@10.2.0.2's password:

Last login: Tue Aug 14 11:09:10 2007 from radium.lou.ac.uk

[root@ESX-02 root]# esxcfg-nics -l

Name PCI Driver Link Speed Duplex Description

vmnic0 06:07.00 e1000 Up 1000Mbps Full Intel Corporation 8254NXX Gigabit Ethernet Controller

vmnic1 07:08.00 e1000 Up 1000Mbps Full Intel Corporation 8254NXX Gigabit Ethernet Controller

[root@ESX-02 root]# esxcfg-vswitch -l

Switch Name Num Ports Used Ports Configured Ports Uplinks

vSwitch0 32 8 32 vmnic0

PortGroup Name Internal ID VLAN ID Used Ports Uplinks

Vlan110 portgroup8 110 0 vmnic0

Vlan2 portgroup6 2 0 vmnic0

Vlan3 portgroup2 3 4 vmnic0

Service Console portgroup0 664 1 vmnic0

VMotion portgroup3 665 1 vmnic0

Switch Name Num Ports Used Ports Configured Ports Uplinks

vSwitch1 64 4 64 vmnic1

PortGroup Name Internal ID VLAN ID Used Ports Uplinks

Service Console 2 portgroup15 0 1 vmnic1

iSCSI portgroup14 0 1 vmnic1

[root@ESX-02 root]# esxcfg-vmknic -l

Port Group IP Address Netmask Broadcast MAC Address MTU Enabled

iSCSI 10.0.1.2 255.255.0.0 10.0.255.255 00:50:56:64:46:5f 1514 true

VMotion 10.1.0.2 255.255.0.0 10.1.255.255 00:50:56:65:91:47 1514 true

[root@ESX-02 root]# esxcfg-vswif -l

Name Port Group IP Address Netmask Broadcast Enabled DHCP

vswif0 Service Console 10.2.0.2 255.255.0.0 10.2.255.255 true false

vswif1 Service Console 2 10.0.2.2 255.255.0.0 10.0.255.255 true false

Basically our network uses 192.168.x.x. I've used 10.2.x.x (255.255.0.0) for the ESX service console, 10.1.x.x (255.255.0.0) for vMotion and 10.0.x.x (255.255.0.0) for iSCSI; the iSCSI network is not routed onto the normal network at all.
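As a quick sanity check of that layout, the broadcast addresses in the esxcfg listings follow directly from the IP/netmask pairs. A small sketch using Python's standard ipaddress module, with the addresses copied from the listings above:

```python
import ipaddress

# IP/netmask pairs from the esxcfg-vmknic / esxcfg-vswif listings above
interfaces = {
    "iSCSI (vmk)":     ("10.0.1.2", "255.255.0.0"),
    "VMotion (vmk)":   ("10.1.0.2", "255.255.0.0"),
    "Service Console": ("10.2.0.2", "255.255.0.0"),
}

for name, (ip, mask) in interfaces.items():
    # strict=False lets us pass a host address rather than a network address
    net = ipaddress.IPv4Network(f"{ip}/{mask}", strict=False)
    print(f"{name}: network {net.network_address}/{net.prefixlen}, "
          f"broadcast {net.broadcast_address}")
```

Running this reproduces the 10.0.255.255 / 10.1.255.255 / 10.2.255.255 broadcasts shown in the output above, so the three /16 networks are cleanly separated.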

Jae_Ellers:

Nine spindles in RAID 5 was no different.

The SAN has 15 300 GB FC drives (I posted about it here; Dell info page here). It doesn't seem to matter what load you put through it: the moment three boxes hit the SAN, one of the paths fails.

We're only talking about a dozen VMs in total right now, nothing working hard. It's all been fine on ESX with local storage: I've run all bar two of them (which are held locally on one of the hosts that can't see the SAN) on one host this morning with no issues, and vMotion takes maybe 15 seconds to move a machine. If it were possible to fit all the RAM in one server, I'm sure we could run everything on one box.

fontyyy
Contributor

Right then, bit of a report:

I think I've cracked this. It was (if I'm right) a combination of: me not reading and trying to use both SPs when the SAN is active/passive; me not reading and changing the path policy from "Most Recently Used" to "Fixed" (specifically recommended against for EMC SANs in the VMware SAN config guide); and me panicking when the ESX box hung rescanning for new storage and pulling the hung box's link to the SAN before I restarted it.
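For anyone hitting the same thing: on ESX 3.x you can see which SP each path goes through, and check or set the failover policy per LUN, from the service console. A rough sketch; the flags below are from memory of the 3.x esxcfg-mpath tool and vmhba40:0:0 is only a placeholder LUN name, so check esxcfg-mpath -h on your own host:

```shell
# List all paths and the current policy for each LUN
# (look for policy=mru and which SP the active path goes through)
esxcfg-mpath -l

# Set the policy for one LUN back to Most Recently Used, which is
# what the SAN config guide recommends for active/passive arrays
# like the CX series (LUN name here is a placeholder)
esxcfg-mpath --policy=mru --lun=vmhba40:0:0
```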

As I said, two of them were working fine, linked on SP A0 and SP A1, and the ESX LUN is owned by SP A. So I registered the unlinked box to the LUN etc. and rescanned; in the SAN front end the box logs in (on SP A1), but it doesn't register in the VC and eventually the box drops into a disconnected state. The local-storage VMs stay up.

I shut down the local VMs and restarted the box (it needed a hard power-down; 15 minutes after being told to restart it was still sat there responding to pings), and it came back up able to see the LUN. I then vMotioned a machine there and it seems fine. I benchmarked three machines on the SAN sat on the three ESX boxes and they are all running OK; clearly they can all see what they need to.

Any thoughts appreciated (thoughts like "you idiot, you shouldn't be near a network" will be taken in good humour), and thanks to those who tried to help.

fontyyy
Contributor

Was writing the above reply while you posted; thank you for looking.

> The config seems to be ok.
>
> The EMC CX serie works as active-passive, so you should present each lun only over one (the same) SP- when I remember right.
>
> Are you doing it this way?

No I wasn't; yes I now am; it works fine...

Live and learn, thank you again.

Message was edited by: an incompetent fool doing a poor impression of a systems administrator
