VMware Cloud Community
Sup3rFly
Contributor

sfcbd won't start

While troubleshooting why Dell OpenManage 7.1 wasn't working on my ESXi 5.1 Essentials box, I ran services.sh restart.

Now I am getting these errors every couple of minutes:

Component sfcb-ProviderMa not running, Restarting sfcbd.

Component sfcb-HTTP-Daemo not running, Restarting sfcbd.

Component sfcb-HTTPS-Daem not running, Restarting sfcbd.

I tried to restart sfcbd-watchdog, here is what happens:

/etc/init.d # ./sfcbd-watchdog start

/etc/init.d # ps: no non-option arguments
ps
              -C   Display only cartels
              -P   Display PCID
              -T   Display used time
              -c   Display verbose command line
              -g   Display session ID and process group
              -j   Display GID
              -s   Display state
              -t   Display type
              -u   Display only userworlds
              -v   Display non truncated values
              -Z   Display the security domain

I don't want to restart this host if possible. How can I fix this mess? Any help is greatly appreciated!
6 Replies
Sup3rFly
Contributor

More info.

/var/run/sfcb # /sbin/sfcbd
--- Log syslog level: 3
--- sfcbd V1.3.7 started - 1435788
--- Using /etc/sfcb/sfcb.cfg
Using maxSemInitRetries : 5
--- Successfully initialized semaphore, semid : 157581312
--- External HTTP connections disabled; using loopback only
--- SFCB semaphore create key: 0x530105d8 failed: No space left on device
     use "ipcrm -S 0x530105d8" to remove semaphore

I tried the ipcrm command, but the shell reports that ipcrm is not found.

How do I find out which space is full? df -h isn't showing it.

MKguy
Virtuoso

No space left on device
      use "ipcrm -S 0x530105d8" to remove semaphore

Check whether your host has enough free inodes available on the filesystem. The sfcb agent is a huge inode hog and keeps loads of files open concurrently.

See:

http://kb.vmware.com/kb/2037798

http://kb.vmware.com/kb/1008643

There should also be more detailed logs in /var/log (or /scratch/log for that matter) if you can spot anything there.

-- http://alpacapowered.wordpress.com
Sup3rFly
Contributor

inodes look good:

/etc/init.d # stat -f /
  File: "/"
    ID: 100000000 Namelen: 127     Type: visorfs
Block size: 4096
Blocks: Total: 215841     Free: 108611     Available: 108611
Inodes: Total: 524288     Free: 518992
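
To turn those two inode numbers into a usage figure, here is a minimal sketch. It assumes the "Inodes: Total: N     Free: M" line format shown in the output above; the function name inode_usage is made up for illustration and is not an ESXi tool.

```shell
# Hypothetical helper: report inode usage from `stat -f` text read on stdin.
# Assumes a line of the form "Inodes: Total: N     Free: M".
inode_usage() {
    awk '/^Inodes:/ {
        total = $3; avail = $5
        if (total > 0)
            printf "%d of %d inodes used (%.1f%%)\n", total - avail, total, (total - avail) * 100 / total
    }'
}
```

Usage would be something like `stat -f / | inode_usage`. For the numbers above, about 1% of the inodes are in use, so inode exhaustion looks unlikely here.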

I deleted a bunch of .dmp files in /var/.

I still can't get sfcbd-watchdog start to work; it gives me the same ps error.

How strange.

Sup3rFly
Contributor

I am seeing this a lot in the vmkernel.log:

2012-12-10T20:24:55.785Z cpu2:516554)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007dd740, 0) to dev "naa.500188b9850003a3" on path "vmhba2:C0:T17:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2012-12-10T20:24:55.785Z cpu2:516554)ScsiDeviceIO: 2316: Cmd(0x4124007dd740) 0x1a, CmdSN 0x19282 from world 0 to dev "naa.500188b9850003a3" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2012-12-10T20:24:55.788Z cpu2:2050)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007dd740, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2012-12-10T20:24:55.788Z cpu2:2050)ScsiDeviceIO: 2316: Cmd(0x4124007dd740) 0x1a, CmdSN 0x19283 from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2012-12-10T20:24:55.816Z cpu2:2050)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007dd740, 0) to dev "t10.DP______BACKPLANE000000" on path "vmhba1:C0:T8:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2012-12-10T20:24:55.817Z cpu2:2050)ScsiDeviceIO: 2316: Cmd(0x4124007dd740) 0x1a, CmdSN 0x19284 from world 0 to dev "t10.DP______BACKPLANE000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2012-12-10T20:25:22.476Z cpu2:625356)User: 2742: wantCoreDump : sfcb-sfcb -enabled : 0
2012-12-10T20:25:40.753Z cpu3:893812)User: 2742: wantCoreDump : sfcb-sfcb -enabled : 0
2012-12-10T20:26:23.554Z cpu3:1440851)User: 2742: wantCoreDump : sfcb-sfcb -enabled : 0

Is this complaining about a hard drive?

MKguy
Virtuoso

I don't think these messages are really related to your issue. Can you check the /scratch/log/syslog.log file?

Also run vdf -h in the shell. The problem might be related to the ESXi ramdisks, which don't show up with df -h:

# vdf -h
Ramdisk                   Size      Used Available Use% Mounted on
root                       32M        7M       24M  22% --
etc                        28M      288K       27M   1% --
tmp                       192M        4K      191M   0% --
hostdstats                223M        2M      220M   1% --
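
If the list is long, a small filter can pick out the ramdisks that are nearly full. This is only a sketch: it assumes the column layout shown above (Use% in the fifth column, after a header row starting with "Ramdisk"), and the function name full_ramdisks and the default threshold of 90 are arbitrary choices for illustration.

```shell
# Hypothetical helper: print ramdisks whose Use% is at or above a threshold.
# Reads `vdf -h` style text on stdin; the threshold defaults to 90.
full_ramdisks() {
    awk -v limit="${1:-90}" '
        # The header row marks the start of the ramdisk table.
        $1 == "Ramdisk" { seen = 1; next }
        seen && $5 ~ /^[0-9]+%$/ {
            pct = $5; sub(/%/, "", pct)
            if (pct + 0 >= limit + 0) print $1, $5
        }'
}
```

For example, `vdf -h | full_ramdisks 80` would list only ramdisks at 80% or more. With the sample output above it prints nothing, since the fullest ramdisk is at 22%.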

Have you tried rebooting the host already?

-- http://alpacapowered.wordpress.com
Sup3rFly
Contributor

I was able to reboot the host last night. Everything came back up and is working normally.

I will keep checking OpenManage to see if I can find out why it keeps failing.

Thanks.
