VMware Cloud Community
NuggetGTR
VMware Employee

intermittent network outage

Hi all,

I currently have an issue I can't put my finger on. I have 2 Windows Server 2003 virtual machines, both running some crappy IBM deployment application which craps itself every time the network drops. Just by pinging the machines you can see the issue, with a 2 sec wait time for the ping.

Reply from : bytes=32 time=1ms TTL=125

This is constantly happening. The network is barely 0.05% utilized, but the VMs do have somewhat high CPU ready, up to 2.5 sec during 70%+ CPU utilization.
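To put the 2.5 sec CPU ready figure in context, it helps to convert it to a percentage of the sampling interval. The sketch below assumes the number is a ready summation read from a vCenter real-time chart, which samples every 20 seconds. The post does not say which interval was used, and the function name is made up for illustration.

```python
def cpu_ready_percent(ready_ms, interval_s=20):
    """Convert a CPU ready summation (ms) into a percentage of the sample interval.

    interval_s is an assumption: 20 s matches real-time charts,
    300 s matches past-day charts.
    """
    return ready_ms / (interval_s * 1000) * 100


# 2.5 sec of ready time inside a 20-second real-time sample
print(cpu_ready_percent(2500))                          # 12.5

# The same 2.5 sec inside a 300-second past-day sample
print(round(cpu_ready_percent(2500, interval_s=300), 2))  # 0.83
```

On the real-time assumption, 2.5 sec works out to 12.5% ready, which is above the 5 to 10% range often quoted as a rule of thumb for concern. If the figure came from a past-day chart it would be under 1%.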

This has apparently only been an issue since we upgraded the ESX hosts from 3.0.2 to 4 (well, that's when it was reported to have started).

Does anyone have any ideas for me? Anything I can check?

Cheers

________________________________________ Blog: http://virtualiseme.net.au VCDX #201 Author of Mastering vRealize Operations Manager
9 Replies
Lightbulb
Virtuoso

Are there any errors logged in the System logs of the VMs?

Was the virtual hardware version of the VMs updated to version 7?

Are VMware Tools installed and up to date?

You could also run a tiny bootable OS VM with an IP assigned and ping it to determine whether this is possibly a host issue rather than a VM issue. I often use the DSL Linux image for this purpose (http://www.damnsmalllinux.org/).

Download the ISO and boot a vanilla 32-bit Linux VM shell with the ISO. You will get a GUI, and configuration of the network components is pretty straightforward.
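To compare the test VM against the affected VMs, the ping output from each can be summarized the same way. This is a minimal sketch, assuming Windows-style ping output saved as text. The function name, the 1000 ms threshold and the sample lines are illustrative only (192.0.2.10 is a documentation address, not one from this thread).

```python
import re

# Matches Windows ping reply lines such as
# "Reply from 192.0.2.10: bytes=32 time=1ms TTL=125" (the address may be blank).
REPLY_RE = re.compile(
    r"Reply from\s*[^:]*:\s*bytes=\d+\s+time[=<](\d+)ms\s+TTL=\d+"
)


def summarize_ping(lines, slow_ms=1000):
    """Count normal replies, slow replies and timeouts in Windows ping output."""
    summary = {"ok": 0, "slow": 0, "timeout": 0, "max_ms": 0}
    for line in lines:
        line = line.strip()
        match = REPLY_RE.search(line)
        if match:
            rtt = int(match.group(1))
            summary["max_ms"] = max(summary["max_ms"], rtt)
            # Replies at or above the threshold are counted as slow.
            summary["slow" if rtt >= slow_ms else "ok"] += 1
        elif line.startswith("Request timed out"):
            summary["timeout"] += 1
    return summary


# Hypothetical sample output, not captured from the machines in this thread.
sample = [
    "Reply from 192.0.2.10: bytes=32 time=1ms TTL=125",
    "Reply from 192.0.2.10: bytes=32 time=2013ms TTL=125",
    "Request timed out.",
    "Reply from 192.0.2.10: bytes=32 time<1ms TTL=125",
]
print(summarize_ping(sample))
```

Running the same summary against a long `ping -t` capture from the test VM and from an affected VM on the same host makes it easier to see whether the slow replies follow the host or the guest.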

Just a thought

NuggetGTR
VMware Employee

No, nothing in the system logs, apart from the application log, where the IBM app is erroring because of the network outage.

No. I've hopefully just got permission to upgrade the virtual machine hardware and tools. (Fingers crossed, but does the hardware version make many changes on the network side? I know it gives a new NIC type, but does it make that much difference?)

No, it's not a host issue. I've been doing the same test on other servers on the same hosts on the same network and they don't have any issue, which I thought of after posting here, so it really points to an OS-specific issue.

Cheers for the help :)

jmartin819
Contributor

We had a similar issue where VMs were getting disconnected periodically, and it turned out to be a storage issue. The vmkernel logs showed a problem with ESX trying to find a LUN being presented to it, and it turned out that two LUNs were trying to use the same ID. I'd start by going through your host logs to see if there are any errors.

Texiwill
Leadership

Hello,

Also verify whether you have any snapshots in use, which is related to storage. I have seen this be the cause of such ping times. Remember, everything is related...


Best regards,
Edward L. Haletky VMware Communities User Moderator, VMware vExpert 2009

petedr
Virtuoso

I was thinking the same thing as Edward: are there any snapshots in place, or snapshot commits occurring, when the pings are lost?
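One way to check this is to line up the times of the lost pings against the times snapshots were taken or committed. The sketch below only shows the overlap test; the function name, timestamps and window are hypothetical, and the real values would have to come from your own ping log and task history.

```python
from datetime import datetime


def losses_in_windows(loss_times, windows):
    """Return the loss timestamps that fall inside any (start, end) window."""
    return [t for t in loss_times
            if any(start <= t <= end for start, end in windows)]


fmt = "%Y-%m-%d %H:%M:%S"

# Hypothetical sample data, not taken from the thread.
loss_times = [datetime.strptime(s, fmt) for s in (
    "2010-01-05 02:00:10",
    "2010-01-05 02:03:45",
    "2010-01-05 09:30:00",
)]
snapshot_windows = [(
    datetime.strptime("2010-01-05 02:00:00", fmt),
    datetime.strptime("2010-01-05 02:05:00", fmt),
)]

hits = losses_in_windows(loss_times, snapshot_windows)
print(f"{len(hits)} of {len(loss_times)} losses fell inside a snapshot window")
```

If most losses land inside snapshot windows, storage activity is a likely suspect; if they are spread evenly, it probably is not.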

NuggetGTR
VMware Employee

No, there was nothing snapshot related when this machine was losing pings. It only looked to be happening once the hosts were upgraded to 4.

But once I upgraded this machine to the new hardware version 7 the issue stopped, which still doesn't explain why... I have 3000-odd other guests still sitting on the old hardware version with no issues. One of those things, I guess.

AllBlack
Expert

Looks like an issue we are seeing too. Are you making changes to your storage (i.e. removing LUNs)?

I have an SR open

Have a look at

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=101662...

http://virtualgeek.typepad.com/virtual_geek/2009/12/an-important-vsphere-4-storage-bug-and-workaroun...

Please consider marking my answer as "helpful" or "correct"

es2000
Contributor

Thank you!!! for posting the KB article about the ESX 4 issue of "removing LUNs" inadvertently causing the All-Paths-Down state. We started having the exact same symptoms mentioned in this forum post, and were able to resolve the issue by going into each of our ESX hosts and removing the "deleted LUN" that was still showing up in our ESX hosts' datastores.

rebootuser
Enthusiast

Hi NuggetGTR. I came across an issue such as this a while back, and it was an incorrectly configured host NIC bond, i.e. one of the ESX hosts had 2 NIC ports used for the production network and one of these ports was connected to an incorrect VLAN.

If you were to VMotion the offending VM to another host, is the issue replicated?

Regards,

Owen

My Blog: http://rebootuser.com If you found this or any other post helpful please consider the use of the Helpful/Correct buttons to award points.