VMware Cloud Community
MaikoVA
Contributor

Communication error with the SAN

Hi! I hope somebody can help me with this issue: I'm having problems with the cold cloning process. I have to migrate three HP servers. I successfully migrated one of them, but since then, every time I try to migrate the other two, the process crashes after one to three hours and the server reboots.

Yesterday I connected to the SAN (which has an orange "!" LED), and the Guru tells me that the SAN has an error in module B.

Today I shut down the two ESX servers, the SAN, and the switches to see whether that would fix the problem (a local IBM engineer told me to do it), but nothing changed: the orange "!" LED is still on, and I'm not able to create a new virtual machine or migrate another server.

Thanks a lot.

Message was edited by: RDellimmagine

Moved from the New Site Feedback community.

beagle_jbl
Enthusiast

Also, can you try port 903? It appears port 903 is used for the console and 902 for the VirtualCenter agent.

telnet 10.119.0.150 903

Is there a firewall between you and the ESX hosts?

What version of ESX server are you running?

Can you also try both telnets against the other ESX host and send screenshots?
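The same 902/903 check can be scripted so both hosts are tested in one pass. This is a minimal Python 3 sketch, not a VMware tool: it assumes a successful TCP connect means the same thing as telnet getting a connection, and the 3-second timeout is an arbitrary choice.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (what a working telnet shows)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, or timed out
        return False

def probe(hosts, ports=(902, 903), timeout=3.0):
    """Map every (host, port) pair to True/False so two hosts can be compared side by side."""
    return {(h, p): port_open(h, p, timeout) for h in hosts for p in ports}
```

For example, `probe(["10.119.0.150", "10.119.0.151"])` returns a dict keyed by (host, port). A False on 903 while 902 is True on the same host points at the console listener rather than basic network connectivity.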

MaikoVA
Contributor

It looks like 903 is not working.

Regards

MaikoVA
Contributor

Sorry!

Here's the file.

Regards

beagle_jbl
Enthusiast

A few important pieces of info:

1) Is there a firewall between your PC (the one running the VI Client) and the ESX Server Service Console (10.119.0.150)?

2) Is there a firewall on your PC that might restrict outbound traffic?

3) What version of ESX Server are you running on the ESX host in question: 3.0, 3.0.1, or 3.0.2?

MaikoVA
Contributor

Hi!

Answers:

1. There's no firewall between the PC and the ESX host.

2. We use the Trend OfficeScan firewall, but it is disabled on that particular PC.

3. The version... I'm not really sure about this. The "local" PC that we use for the License Server and VMware Virtual Infrastructure Client 2.0 has no firewall (Windows or Trend).

Regards

MaikoVA
Contributor

It's 3.0.1.

Regards

beagle_jbl
Enthusiast

Weird. There should definitely be a response on port 903, just like there was on 902. Port 903 is the console listener on every ESX server. One workaround that may fix the issue is to run the following command from the ESX service console: /sbin/service network restart

That seems to have resolved the issue for some people, although it apparently crops up from time to time. Reinstalling ESX would likely fix the issue permanently, but if you are keener on finding the root cause, I am willing to assist. Anyhow, try running that command on the "bad" ESX server and then try to open the console of one of the VMs. If that works, then we know where the problem is.
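Because the listener may take a few seconds to come back after `/sbin/service network restart`, a single telnet right after the restart can be misleading. A minimal Python 3 sketch that keeps retrying the connection until a deadline; the 60-second deadline and 2-second interval are arbitrary assumptions, not ESX values.

```python
import socket
import time

def wait_for_port(host, port, timeout=60.0, interval=2.0, connect_timeout=3.0):
    """Retry a TCP connect to host:port until it succeeds or `timeout` seconds pass.

    Returns True as soon as one connection succeeds, False once the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            pass  # refused, unreachable, or timed out: try again after a pause
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_port("10.119.0.150", 903)` returns True once the console port answers, or False if it is still closed after about a minute.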

christianZ
Champion

Well, I'm not sure whether such a "quick and dirty" configuration is a good basis for future production use.

I assume you need a clean SAN configuration first (with multipathing, as I saw): your storage management should be working, and you should configure your DS4700 correctly for ESX (see the SAN Configuration Guide). I also saw VMFS volume extents in your setup; are they needed here?

Just my practical thoughts.

beagle_jbl
Enthusiast

I think they have the SAN issues worked out. The extent was added because the VMs were very large and they ran out of space on the LUN. The quick and dirty fix was just an attempt to narrow down the problem. It appears the real issue is that the console listener on port 903 is not running. The VMs are starting; the user just can't see that because they cannot connect to the console. It would be faster just to rebuild the ESX server, but I can see why some would rather identify the issue and resolve it, in case it happens again.

MaikoVA
Contributor

Hi, good morning!

I tried telnetting to the ESX host that is working (10.119.0.151) on both ports (902 and 903), and both ports responded.

I tried the same on the ESX host that is not working (10.119.0.150), and only port 902 responded; 903 did not!

So should I run the command you mentioned before, /sbin/service network restart, to see whether the ESX host responds on 903?

Regards

beagle_jbl
Enthusiast

I would try it and see if it works. Your ESX server should definitely be responding on port 903; if it doesn't, you won't be able to connect to your VM consoles. I think you are creating VMs just fine; you just can't connect to the console.

MaikoVA
Contributor

I ran the command you told me and then tried telnet to 10.119.0.150 903 again, and got the same error message.

I should mention that when I restarted the network service, I saw two files being processed: ifcfg-vswif0 and ifcfg-vswif0.old.

The ".old" file was created by an IBM engineer when he came here and tried to change the IP address of the ESX host in order to move it between VLANs, etc.

When I ran the command you gave me, I saw that both files are executed, and I don't know if that's causing the problem... should I delete this .old file?

Thanks 😃

Regards

beagle_jbl
Enthusiast

That might be a bit hasty.

Maybe just post the contents of those files. Is there no ifcfg-vswif1?
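Before deleting anything, one way to see whether the two files really configure the same interface is to group the `ifcfg-*` files by their `DEVICE=` value. A minimal Python 3 sketch, assuming the Red Hat-style `/etc/sysconfig/network-scripts` directory used by the ESX 3.x service console, and that a Python 3 interpreter is available wherever it runs (for example, on a workstation after copying the files off the host):

```python
import os
from collections import defaultdict

def parse_ifcfg(path):
    """Read KEY=value lines from an ifcfg file into a dict, stripping surrounding quotes."""
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, val = line.partition("=")
            values[key.strip()] = val.strip().strip("\"'")
    return values

def duplicate_devices(script_dir="/etc/sysconfig/network-scripts"):
    """Return {DEVICE: [file names]} for every device defined by more than one ifcfg-* file."""
    by_device = defaultdict(list)
    for name in sorted(os.listdir(script_dir)):
        if not name.startswith("ifcfg-"):
            continue
        cfg = parse_ifcfg(os.path.join(script_dir, name))
        # Fall back to the file-name suffix when the file has no DEVICE= line
        device = cfg.get("DEVICE", name[len("ifcfg-"):])
        by_device[device].append(name)
    return {dev: files for dev, files in by_device.items() if len(files) > 1}
```

If the result lists both `ifcfg-vswif0` and `ifcfg-vswif0.old` under `vswif0`, the two files are competing to configure the same service console interface, which would fit the symptoms described above.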

MaikoVA
Contributor

Hi!

I deleted the ifcfg-vswif0.old file and ran the command you gave me again. I tried telnet on port 903 again and it worked just fine! Then I went to my PC, ran the Infrastructure Client 2.0, connected to the 10.119.0.150 server, tried to create a new VM stored on the SAN, and received this error message.

After that I tried to create a new virtual machine again, but stored it on the ESX host's local storage instead of the SAN, and it worked great! I powered on the VM and was able to see it booting. But on the SAN I'm still not able to create VMs... weird!

Regards

beagle_jbl
Enthusiast

Well, that's something. At least the console issue is resolved.

Can you resend the screenshot of your network configuration (showing the virtual switches) now that you have deleted that vswif0.old? It's on the Configuration tab under Networking. If you can, please post screenshots of Networking, Storage, Network Adapters, and Storage Adapters.

I just need to confirm one more thing: you access the SAN through Fibre Channel and not iSCSI, correct?

MaikoVA
Contributor

Here are the files from both ESX servers; as you know, 10.119.0.150 is the one with problems!

Regards

MaikoVA
Contributor

Sorry... To access the SAN, I have to connect a cable to port 1 on controller A or controller B. It's a regular UTP Cat 6 cable, not fiber.

Regards

beagle_jbl
Enthusiast

You may want to reboot that ESX server since you deleted that vswif0.old. Watch the console for errors as it boots.

The UTP cables are likely just for management of the controllers. It doesn't appear that you are set up for IP storage; based on your configuration, you'll still need Fibre Channel connectivity to the SAN. Can you verify:

A) That all your cabling is correct (and similar to the server that is functioning).

B) That your zoning on the fibre channel switches is correct.

C) Permissions/presentations are setup correctly on the SAN.

Once you're sure of all that, go to the console and run the command: esxcfg-rescan vmhba2. This can be done from the GUI as well.

You should see the Valeo VM San Storage on the Storage tab after you rescan. If not, you either have a cabling problem, fibre channel switch zoning problem, or SAN config problem. Let's see what happens with the rescan and go from there.
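The rescan-then-look step can also be checked from the service console. A minimal Python 3 sketch, assuming a Python 3 interpreter is available where it runs (the ESX 3.x service console may not ship one) and that datastore labels appear as entries under `/vmfs/volumes`; the `run` parameter exists only so the command can be swapped out in a test.

```python
import os
import subprocess

def datastore_visible(label, volumes_dir="/vmfs/volumes"):
    """Return True if an entry named `label` exists under volumes_dir."""
    try:
        return label in os.listdir(volumes_dir)
    except FileNotFoundError:
        # Not running on an ESX host, or the directory is missing
        return False

def rescan_and_check(adapter, label, volumes_dir="/vmfs/volumes", run=subprocess.run):
    """Rescan one HBA with esxcfg-rescan, then report whether the datastore label is listed."""
    run(["esxcfg-rescan", adapter], check=True)
    return datastore_visible(label, volumes_dir)
```

For example, `rescan_and_check("vmhba2", "Valeo VM San Storage")` runs `esxcfg-rescan vmhba2` and returns False if the label is still missing, which points back at the cabling, zoning, or SAN presentation checks listed above. The exact datastore label is assumed from the name used in this thread.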

MaikoVA
Contributor

I checked the cables and they look good: both 10.119.0.150 and .151 are connected properly, and I don't see any error LEDs on the switches. I ran the rescan from the GUI, and the ESX host was able to see the SAN.

I tried to create a new VM on the problem ESX host (.150), and I was able to create it and it RAN! I could see in the console that the VM was booting.

Then I connected to the SAN, and at first it looked fine, but several minutes later the Guru appeared again with the same error message. I'm not sure if I already sent you a screenshot of the error message, but here is the Guru error:

I'm going to try running a "cold cloning" process again and see what happens!

Regards

beagle_jbl
Enthusiast

It looks like your preferred path is pointing to the wrong controller (B instead of A). Is it possible that your HBA Fibre Channel cables are swapped (plugged into the wrong switch)? If so, you may have to swap them and redo your zoning.

Regardless, can you rescan, then under Storage select the SAN storage volume, click Properties, click the Manage Paths button, and send me a screenshot of the Manage Paths dialog? Then get the same screenshot for the host that works. You may have to rescan there too, as sometimes the Manage Paths option doesn't come up without a recent rescan.
