VMware Cloud Community
capekvl
Contributor

ESXi 4.0 on Intel DQ45CB Cube Cove (iQ45/ICH10) with Adaptec RAID 2405 freezes every 24 hours

Hi,

I have a server with this configuration:

Intel DQ45CB Cube Cove, iQ45/ICH10, latest BIOS version

Adaptec RAID 2405, RAID 1 (mirror) with two SATA HDDs (Seagate ST3500320NS)

INTEL PRO/1000 PT Server Adapter

ESXi 4.0

========================================================

One virtual machine (SBS2003)

========================================================

Issue:

For 40 days the server worked fine.

Now the server hangs every day: the ESXi console still runs, but the virtual machine freezes and I cannot reboot it, so I have to hard power off the server with the power button.

In the syslog I see these errors, but I cannot find a solution:

2009-11-30 12:20:21 Local4.Info 10.10.10.21 Nov 30 11:20:06 Hostd: 2009-11-30 11:20:06.416 665A4D90 verbose 'Vmsvc' RefreshVms updated overhead for 1 VM

2009-11-30 12:20:49 Local6.Notice 10.10.10.21 Nov 30 11:20:34 vmkernel: 1:03:14:02.547 cpu1:4168)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100041300c0) to NMP device "mpx.vmhba2:C0:T0:L0" failed on physical path "vmhba2:C0:T0:L0" H:0x5 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x1.

2009-11-30 12:24:09 Local6.Warning 10.10.10.21 Nov 30 11:20:34 vmkernel: 1:03:14:02.547 cpu1:4168)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "mpx.vmhba2:C0:T0:L0" state in doubt; requested fast path state update...

2009-11-30 12:27:29 Local6.Notice 10.10.10.21 Nov 30 11:20:34 vmkernel: 1:03:14:02.547 cpu1:4168)ScsiDeviceIO: 747: Command 0x2a to device "mpx.vmhba2:C0:T0:L0" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x1.

2009-11-30 12:30:49 Local6.Notice 10.10.10.21 Nov 30 11:20:39 vmkernel: 1:03:14:07.619 cpu1:4168)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100041300c0) to NMP device "mpx.vmhba2:C0:T0:L0" failed on physical path "vmhba2:C0:T0:L0" H:0x5 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x1.

2009-11-30 12:37:13 User.Info 10.10.10.21 Nov 30 11:36:30 vmklogger: Successfully daemonized.

=====================================================

After a hardware restart the server runs for another 24 hours.

=====================================================

Any idea what is wrong? If you look at ESXi.jpg, there is no information about VMotion Enabled and Hyperthreading (I set it to enabled in the BIOS). Why?

Why did the server run fine for 40 days, and now hangs every 24 hours?

Is it possible that an HDD has bad blocks? How can I monitor that under ESXi?

Thanks a lot, everyone

Vladimir (Czech Republic)
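The H:/D:/P: fields and the "Possible sense data" triple in the vmkernel lines of the first post can be decoded by hand. Below is a minimal sketch, assuming the message layout shown in that excerpt; the lookup tables hold only a few common values (SCSI opcodes, device status, sense keys and ASC/ASCQ from the SCSI specifications, and host status names following the Linux DID_* numbering that the ESX H: values mirror). The function `decode` is a hypothetical helper, not a VMware tool.

```python
import re

# Subset of host status values ("H:0x..") seen in ESX/ESXi vmkernel logs.
# The numbering follows the Linux SCSI midlayer DID_* codes.
HOST_STATUS = {0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY", 0x3: "TIMEOUT",
               0x4: "BAD_TARGET", 0x5: "ABORT", 0x6: "PARITY", 0x7: "ERROR",
               0x8: "RESET"}

# Subset of SCSI device (target) status bytes ("D:0x..").
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY",
                 0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL"}

# Subset of SCSI sense keys and ASC/ASCQ pairs.
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
              0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION",
              0xB: "ABORTED COMMAND"}
ASC_ASCQ = {(0x11, 0x00): "UNRECOVERED READ ERROR",
            (0x3A, 0x00): "MEDIUM NOT PRESENT",
            (0x3A, 0x01): "MEDIUM NOT PRESENT - TRAY CLOSED"}

# Subset of SCSI command opcodes.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)", 0x35: "SYNCHRONIZE CACHE(10)"}

HEX = r"0x[0-9a-fA-F]+"
FAILURE_RE = re.compile(
    rf'Command (?P<opcode>{HEX}).*?"(?P<device>[^"]+)".*?'
    rf"H:(?P<host>{HEX}) D:(?P<dstatus>{HEX}) P:(?P<plugin>{HEX})"
    rf"(?: Possible sense data: (?P<key>{HEX}) (?P<asc>{HEX}) (?P<ascq>{HEX}))?"
)


def _name(table, value):
    """Look up a code, falling back to its hex form for values not listed."""
    return table.get(value, hex(value))


def decode(line):
    """Decode one vmkernel command-failure line; return None if it is not one."""
    m = FAILURE_RE.search(line)
    if m is None:
        return None
    result = {
        "device": m["device"],
        "command": _name(OPCODES, int(m["opcode"], 16)),
        "host": _name(HOST_STATUS, int(m["host"], 16)),
        "device_status": _name(DEVICE_STATUS, int(m["dstatus"], 16)),
    }
    if m["key"] is not None:
        key, asc, ascq = (int(m[g], 16) for g in ("key", "asc", "ascq"))
        result["sense"] = (_name(SENSE_KEYS, key),
                           ASC_ASCQ.get((asc, ascq),
                                        f"ASC {asc:#04x}/ASCQ {ascq:#04x}"))
    return result
```

For the ScsiDeviceIO line in the first post this yields WRITE(10), host status ABORT, device status GOOD and the possible sense NOT READY / MEDIUM NOT PRESENT - TRAY CLOSED. Sense data is only meaningful together with CHECK CONDITION, so with D:0x0 it is probably stale; the more telling part is that writes to vmhba2 were aborted on the host side (for example after a timeout) rather than rejected by the disk.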

4 Replies
J1mbo
Virtuoso

First, find out which device those messages relate to. Go to Configuration > Storage Adapters; the adapters are listed by vmhba index.

Assuming vmhba2 is your array controller, that is the first place to look. Perhaps it has a system log accessible through its BIOS?

If its status is not being reported in ESX, I would suggest replacing it with a controller whose status is reported, for example an LSI-based card (like the Dell PERC 5/i).

If you can schedule some downtime, I would also run a memory test on the box, for example Microsoft Memory Diagnostic (on the Windows Vista and Windows 7 installation CDs, select the recovery option; it is in the tools section).

HTH
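When the syslog covers several days, a small script can tally the failure messages per device and per vmhba adapter before you map the adapter to a card under Configuration > Storage Adapters. This is a sketch assuming syslog lines formatted like the excerpt in the first post; `failures_by_adapter` is a hypothetical name. In that excerpt a single failed WRITE shows up in both an NMP line and a ScsiDeviceIO line, so the result counts messages, not individual commands.

```python
import re
from collections import Counter

# Matches the device name in vmkernel failure messages such as
#   ... to NMP device "mpx.vmhba2:C0:T0:L0" failed on physical path ...
#   ... to device "mpx.vmhba2:C0:T0:L0" failed H:0x5 ...
FAILED_RE = re.compile(r'device "(?P<device>[^"]+)" failed')
ADAPTER_RE = re.compile(r"vmhba\d+")


def failures_by_adapter(lines):
    """Count failed-command messages per device and per vmhba adapter."""
    per_device = Counter()
    per_adapter = Counter()
    for line in lines:
        m = FAILED_RE.search(line)
        if m is None:
            continue  # e.g. the "state in doubt" warning or hostd lines
        device = m["device"]
        per_device[device] += 1
        # Local devices are named like "mpx.vmhba2:C0:T0:L0".
        adapter = ADAPTER_RE.search(device)
        if adapter:
            per_adapter[adapter.group(0)] += 1
    return per_device, per_adapter
```

Usage: open the exported syslog file and pass the file object directly, e.g. `per_device, per_adapter = failures_by_adapter(open(path))`, then check which vmhba index dominates.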

capekvl
Contributor

Hi,

2.12.2009

I ran memory tests; everything passed.

I reinstalled, going from ESX 4.0 to 4.0 U1, and now I am waiting to see what happens..... (4:30 AM)

The picture shows the storage adapter view; everything looks correct. (The Adaptec RAID has the latest BIOS.)

3.12.2009

During the server backups (4 AM) the ESXi console on the monitor shows this error:

SCSILinuxAbortCommands Failed, Driver AAC, for vmhba2

Any ideas? (I read something about battery-backed memory for Adaptec controllers and also about write caches on RAID controllers.) Any information?

Thanks a lot

Vladimir
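To check whether the storage errors really cluster around the 4 AM backup window, a histogram by hour of day is enough. This sketch assumes the abort messages reach the syslog server as vmkernel lines in the same format as the first post's excerpt (not verified here); the marker strings and the name `storage_errors_by_hour` are assumptions. Note that in that excerpt the host timestamps run one hour behind the syslog server's receive timestamps, so compare against whichever clock the backup schedule uses.

```python
import re
from collections import Counter

# Host-side timestamp written by the ESXi syslog client, e.g.
#   "... 10.10.10.21 Nov 30 11:20:34 vmkernel: ..."
HOST_TS_RE = re.compile(
    r"\b[A-Z][a-z]{2} +\d{1,2} (?P<hour>\d{2}):\d{2}:\d{2} vmkernel:")

# Substrings treated as storage errors (an assumption; extend as needed).
ERROR_MARKERS = ("failed H:", "failed on physical path",
                 "SCSILinuxAbortCommands")


def storage_errors_by_hour(lines):
    """Histogram of storage-error messages keyed by the host's hour (0-23)."""
    hist = Counter()
    for line in lines:
        if not any(marker in line for marker in ERROR_MARKERS):
            continue
        m = HOST_TS_RE.search(line)
        if m:  # skip lines without a vmkernel host timestamp
            hist[int(m["hour"])] += 1
    return hist
```

A large count at hour 4 (or 3, depending on the clock offset) and little elsewhere would support the link between the backups and the aborts on vmhba2.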

J1mbo
Virtuoso

Does the controller have a battery connected?

Please award points to any useful answer.

capekvl
Contributor

Sorry, I made a mistake in my English; now it is correct.

Before: (I wrote something about battery memory for Adaptec and also about write caches on RAIDs) any information?

Now: (I read something about battery memory for Adaptec and also about write caches on RAIDs) any information?

One more idea: could these errors be caused by bad blocks or other failures on an HDD? Or does ESXi log HDD failures with different errors?

Thanks a lot.

Vladimir
