I recently upgraded two ESXi boxes that had been running 5.5 forever to version 6. The upgrades ran fine with no issues. However, since the upgrade the boxes go to disconnected in vSphere and are unreachable on the network. I can go to the servers and log in on the console when this happens. Restarting the agents doesn't seem to fix it, and the box eventually locks up. The only way to get it back is a hard reboot.
Hi Mad Scotsman,
Silly question, but have you upgraded your vCenter to 6 as well?
Do you have any logs from around the disconnection time?
Have you tested with a clean build?
Thanks.
- Keiran.
My Dell C6100s are locking up. Same issue. The system appears to be OK when hitting the console. I attempt to reboot it and nothing works. I have to hard reset it with the DRAC.
Note: I believe some were upgrades from 5.5 to 6.0 and some were fresh installations.
I had to fresh install vCenter Server 6.0. The upgrade did not work for me.
Yes, I upgraded everything to 6. I found another thread on this, so I removed both nodes from vCenter, uninstalled the vpxa agent, and re-added them. Waiting to see if that does anything.
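For anyone else trying this, the agent restart part can be done from the host's console or an SSH session. A minimal sketch using the standard ESXi init scripts (restart only; the vpxa removal itself happened through vCenter when I disconnected and removed the hosts):

```shell
# Sketch: restart the ESXi management agents from the host shell.
# The hostd/vpxa init scripts exist only on ESXi, so guard the calls
# for anyone pasting this somewhere else.
for svc in /etc/init.d/hostd /etc/init.d/vpxa; do
  if [ -x "$svc" ]; then
    "$svc" restart
  else
    echo "skipping $svc (not an ESXi host)"
  fi
done
```

On the host itself, `services.sh restart` restarts the whole set of management agents in one go.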
No, I haven't done a clean install. I have thought about it and will maybe see how it goes this week. Looking at edmandsj's Dell issue, it sounds like he has done both.
Which logs would you like to look at? I will try to gather them at the next lock-up.
Not sure. I gathered vobd.log, vmkernel.log, and vpxa.log.
My logs indicate total loss of network connectivity, with lots of intermittent connectivity right around the event.
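If grabbing individual logs gets hit and miss, the standard way to collect everything at once is a support bundle; a sketch (the command exists only on the ESXi host):

```shell
# Sketch: collect a full ESXi support bundle (includes vobd, vmkernel,
# vpxa and friends). vm-support is an ESXi-only command, so guard it.
if command -v vm-support >/dev/null 2>&1; then
  vm-support   # prints the path of the generated bundle when done
else
  echo "run 'vm-support' on the ESXi host to collect all logs at once"
fi
```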
Just to take a second and compare networking setups. I am using:
Intel 82576 and 82571EB NICs (2 onboard and 2 in expansion), both on the HCL for 6.0
2 x software iSCSI HBAs (1 from onboard and 1 from expansion), with iSCSI port binding of course. 1500 MTU on the vmkernel ports and the switch. All adapters active, notify switches on, failback on, link status only for failover detection, route based on originating virtual port ID, promiscuous mode rejected.
2 x VM network traffic + management network (1 from onboard and 1 from expansion). Same policy: all adapters active, notify switches on, failback on, link status only, route based on originating virtual port ID, promiscuous mode rejected.
Connected to Netgear ProSafe stackable switches.
No vMotion enabled.
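Since driver/firmware differences could matter as much as the NIC models, it may be worth comparing those too. A sketch of the usual esxcli queries (vmnic0 is just an example name; substitute your own uplinks):

```shell
# Sketch: show the NIC inventory plus driver/firmware details on ESXi.
# esxcli exists only on the host, so guard it for copy-paste safety.
if command -v esxcli >/dev/null 2>&1; then
  esxcli network nic list            # uplinks, link state, driver in use
  esxcli network nic get -n vmnic0   # per-NIC driver and firmware versions
else
  echo "run these esxcli commands on the ESXi host itself"
fi
```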
edmandsj,
So my servers aren't Dell, but the symptoms are the same: total loss of network. I have 2 onboard Intel 82574L NICs; they are shared for iSCSI, the VM network, etc.
Mine probably aren't; they are older Tyan servers that have been running for 4 years, so they have had a few upgrades (memory, disks) over time, but never a problem until 6.
I pinged a consultant friend, and he has heard that build 2809209 has a major bug in the netdev process causing issues with NICs and HBAs. A co-worker of his has stayed on the release build 2494585 without any issues.
I have wiped my servers and rolled back to that build. I will let you know if this works; it usually takes 1-2 days before I have to reboot.
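To confirm which build a host actually ends up on after a rollback, the version commands are the easiest check; a sketch:

```shell
# Sketch: print the ESXi version and build number (e.g. "build-2494585").
# Guarded so it degrades gracefully off the host.
if command -v vmware >/dev/null 2>&1; then
  vmware -vl
else
  echo "run 'vmware -vl' (or 'esxcli system version get') on the host"
fi
```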
It didn't take long at all: a couple of hours later I had a node go down. Maybe it's time to go back to VMware 5.5 or Hyper-V :smileyshocked:
Hi there,
Can you please upload vmkernel.log and vmkwarning.log from /var/log? If you could also point out the time that the host stopped responding so that we can correlate, we could gain some more insight into the issue.
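To make the correlation easier, it can help to trim the log to a window around the event before uploading. A sketch with made-up sample lines (the timestamps and messages here are hypothetical placeholders, not from the affected hosts):

```shell
# Sketch: pull vmkernel.log entries from a window around the failure.
# The lines below are made-up placeholders standing in for the real
# /var/log/vmkernel.log on the affected host.
cat > /tmp/vmkernel.log <<'EOF'
2015-04-20T14:30:12.001Z cpu2:33100)igb: vmnic0: link is up
2015-04-20T14:32:05.417Z cpu0:32786)WARNING: igb: vmnic0: link is down
2015-04-20T14:33:41.902Z cpu1:32790)iscsi_vmk: iSCSI connection lost
EOF
# Keep only the minutes around the event (14:32-14:33 in this example):
grep '2015-04-20T14:3[23]' /tmp/vmkernel.log
```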
Closing this question and giving up. Maybe by Update 2 or 3, ESXi 6 will be good.