Gary_Williams
Enthusiast

VCSA services not starting

I've got vCenter 6.5 (VCSA) in the lab. After a power cut I restarted it, and the appliance boots but services are failing to start.

root@vcentre65 [/]# service-control --start --all

Service-control failed. Error Failed to start vmon services.vmon-cli RC=1, stderr=Failed to start vmware-vpostgres, vapi-endpoint services. Error: Operation timed out

I've checked for lock files and everything looks fine, and the appliance has plenty of disk space, so I'm not sure what's causing it. Starting the services manually doesn't help, and the errors I'm getting are not exactly helpful:

root@vcentre65 [ / ]# service-control --start vmware-vpostgres
Perform start operation. vmon_profile=None, svc_names=['vmware-vpostgres'], include_coreossvcs=False, include_leafossvcs=False
2017-12-12T22:46:37.009Z Service vmware-vpostgres state STOPPED
Error executing start on service vmware-vpostgres. Details {
  "resolution": null,
  "detail": [
    {
      "args": [
        "vmware-vpostgres"
      ],
      "id": "install.ciscommon.service.failstart",
      "localized": "An error occurred while starting service 'vmware-vpostgres'",
      "translatable": "An error occurred while starting service '%(0)s'"
    }
  ],
  "componentKey": null,
  "problemId": null
}
Service-control failed. Error {
  "resolution": null,
  "detail": [
    {
      "args": [
        "vmware-vpostgres"
      ],
      "id": "install.ciscommon.service.failstart",
      "localized": "An error occurred while starting service 'vmware-vpostgres'",
      "translatable": "An error occurred while starting service '%(0)s'"
    }
  ],
  "componentKey": null,
  "problemId": null
}

So, any suggestions as to what might be causing VCSA 6.5 not to play ball?

Thanks.

daphnissov
Immortal

First thing is to check the vPostgres logs and see why the daemon isn't starting.

Gary_Williams
Enthusiast

Sorry, I should have mentioned that: there isn't anything in the vPostgres log.

root@vcentre65 [ /tmp ]# more /storage/db/vpostgres/pg_xlog

*** /storage/db/vpostgres/pg_xlog: directory ***

root@vcentre65 [ /tmp ]#

Jitu211003
Hot Shot

Hi,

Check the free space on the root partition and see if it's full.

df -h

You need to extend the root partition if it is full. Follow the article below. I faced the same issue in VCSA 6.0.

VCSA 6.0 failed to stage for patch update and failed to connect after reboot also - VMware Diary
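
The disk space check above can be scripted so that nothing gets missed in a long df listing. This is a minimal sketch, not anything VMware ships: the function names filter_full and full_filesystems and the 90% default threshold are made up for illustration. It relies on df -P, which prints one POSIX-format line per filesystem, and it assumes mount points contain no spaces.

```shell
#!/bin/bash
# Read "df -P" output on stdin and print the mount point and usage of every
# filesystem at or above the given threshold (hypothetical default: 90%).
filter_full() {
    local threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        use = $5                 # Capacity column, e.g. "53%"
        sub(/%/, "", use)        # strip the percent sign
        if (use + 0 >= t) print $6 " " $5
    }'
}

# Convenience wrapper that runs df itself.
full_filesystems() {
    df -P | filter_full "${1:-90}"
}
```

Running full_filesystems 80 prints nothing when every filesystem is below 80% used, which matches what the df -h output later in this thread shows.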

Gary_Williams
Enthusiast

It's not. Disk space was one of the first things I checked as I hit space issues with VCSA 5.5

root@vcentre65 [ ~ ]# df -h
Filesystem                                Size Used Avail Use% Mounted on
devtmpfs                                  4.9G    0  4.9G   0% /dev
tmpfs                                     4.9G    0  4.9G   0% /dev/shm
tmpfs                                     4.9G 696K  4.9G   1% /run
tmpfs                                     4.9G    0  4.9G   0% /sys/fs/cgroup
/dev/sda3                                  11G 5.3G  4.8G  53% /
tmpfs                                     4.9G 912K  4.9G   1% /tmp
/dev/mapper/netdump_vg-netdump            985M 1.3M  932M   1% /storage/netdump
/dev/sda1                                 120M  27M   87M  24% /boot
/dev/mapper/imagebuilder_vg-imagebuilder  9.8G  23M  9.2G   1% /storage/imagebuilder
/dev/mapper/dblog_vg-dblog                 15G 102M   14G   1% /storage/dblog
/dev/mapper/db_vg-db                      9.8G 165M  9.1G   2% /storage/db
/dev/mapper/autodeploy_vg-autodeploy      9.8G  23M  9.2G   1% /storage/autodeploy
/dev/mapper/updatemgr_vg-updatemgr         99G 731M   93G   1% /storage/updatemgr
/dev/mapper/seat_vg-seat                  9.8G 198M  9.1G   3% /storage/seat
/dev/mapper/core_vg-core                   25G 5.9G   18G  26% /storage/core
/dev/mapper/log_vg-log                    9.8G 1.8G  7.5G  20% /storage/log

RajeevVCP4
Expert

Do a clean power off (not a restart), then power on.

If the PSC is external, power off vCenter first, then the PSC, and power them on in reverse order.

Rajeev Chauhan
VCIX-DCV6.5/VSAN/VXRAIL
Please mark as helpful or correct if my answer is useful to you
daphnissov
Immortal

If vPostgres isn't starting, a shutdown and restart isn't going to do anything.

@OP, check the /var/log/vmware/vpostgres/postgresql-##.log file with the latest date stamp. Also check serverlog.stderr. Anything in there?
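
Picking out the newest log by hand is easy to get wrong when several postgresql-##.log files exist, so here is a rough sketch of how it could be scripted. The function names latest_pg_log and show_latest_pg_log are invented for this example, and the default directory is simply the path quoted above; pass a different directory if the logs live elsewhere on your appliance.

```shell
#!/bin/bash
# Print the path of the most recently modified postgresql-*.log in a directory.
latest_pg_log() {
    local log_dir="${1:-/var/log/vmware/vpostgres}"
    local newest="" f
    for f in "$log_dir"/postgresql-*.log; do
        [ -e "$f" ] || continue                      # glob matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"                              # -nt: newer modification time
        fi
    done
    printf '%s\n' "$newest"
}

# Show the last N lines (default 25) of that log.
show_latest_pg_log() {
    local log_dir="${1:-/var/log/vmware/vpostgres}"
    local lines="${2:-25}"
    local latest
    latest="$(latest_pg_log "$log_dir")"
    if [ -z "$latest" ]; then
        echo "no postgresql-*.log files in $log_dir" >&2
        return 1
    fi
    echo "==> $latest <=="
    tail -n "$lines" "$latest"
}
```

Note that the line count goes to tail through -n; a bare number is treated as a file name.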

Gary_Williams
Enthusiast

A full shutdown and restart didn't fix anything.

serverlog.stderr shows nothing

root@vcentre65 [ ~ ]# tail /storage/log/vmware/vpostgres/serverlog.stderr
Starting service process with pid: 57064.
LOG: skipping missing configuration file "/storage/db/vpostgres/postgresql.conf.repl"
LOG: skipping missing configuration file "/storage/db/vpostgres/postgresql.conf.repl"
2017-12-13 22:19:40.594 UTC 5a31a77c.dee8 0 LOG: registering background worker "health_status_worker"
2017-12-13 22:19:40.765 UTC 5a31a77c.dee8 0 LOG: redirecting log output to logging collector process
2017-12-13 22:19:40.765 UTC 5a31a77c.dee8 0 HINT: Future log output will appear in directory "/var/log/vmware/vpostgres".

Now, there is nothing in /var/log/vmware/vpostgres/. However, a search for vPostgres logs shows something interesting:

root@vcentre65 [ /storage/log/vmware/vpostgres ]# pwd
/storage/log/vmware/vpostgres
root@vcentre65 [ /storage/log/vmware/vpostgres ]# tail 25 postgresql-13.log
tail: cannot open '25' for reading: No such file or directory
==> postgresql-13.log <==
2017-12-13 15:12:34.800 UTC 5a314362.3a24 0 LOG: invalid secondary checkpoint record
2017-12-13 15:12:34.800 UTC 5a314362.3a24 0 PANIC: could not locate a valid checkpoint record
2017-12-13 15:12:38.466 UTC 5a314362.3a22 0 LOG: startup process (PID 14884) was terminated by signal 6: Aborted
2017-12-13 15:12:38.467 UTC 5a314362.3a22 0 LOG: aborting startup due to startup process failure
2017-12-13 22:19:40.768 UTC 5a31a77c.deec 0 LOG: database system was interrupted; last known up at 2017-12-11 10:54:22 UTC
2017-12-13 22:19:41.142 UTC 5a31a77c.deec 0 LOG: invalid primary checkpoint record
2017-12-13 22:19:41.142 UTC 5a31a77c.deec 0 LOG: invalid secondary checkpoint record
2017-12-13 22:19:41.142 UTC 5a31a77c.deec 0 PANIC: could not locate a valid checkpoint record
2017-12-13 22:19:44.926 UTC 5a31a77c.dee8 0 LOG: startup process (PID 57068) was terminated by signal 6: Aborted
2017-12-13 22:19:44.926 UTC 5a31a77c.dee8 0 LOG: aborting startup due to startup process failure
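
The lines that matter in output like this are the PANIC entries; the surrounding LOG lines are context. A small sketch for pulling only the fatal-level entries out of a vPostgres log, assuming the log line layout shown above (severity followed by a colon). The function name pg_fatal_lines is made up for this example.

```shell
#!/bin/bash
# Print only the PANIC and FATAL entries from a PostgreSQL-style log file.
# grep exits non-zero when nothing matches, which doubles as a "log is clean" signal.
pg_fatal_lines() {
    local log_file="$1"
    grep -E ' (PANIC|FATAL): ' "$log_file"
}
```

For example, pg_fatal_lines /storage/log/vmware/vpostgres/postgresql-13.log would reduce the output above to the "could not locate a valid checkpoint record" lines.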

Gary_Williams
Enthusiast

I've done some more digging and it looks like corruption in the database. I'll try a few things and report back. Thanks daphnissov, you've pointed me in the right direction. Much appreciated!

daphnissov
Immortal

Good deal, let me know what you find out (for everyone's benefit on this thread). I did some experimenting in my lab, and it looks like you'll want to check out the pg_resetxlog command to see whether the corruption needs to be overridden.
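
Before letting pg_resetxlog change anything, it can be run in preview mode. The following is a hedged sketch rather than a VMware-documented procedure: the function name pg_reset_preview is invented, the binary directory /opt/vmware/vpostgres/current/bin is an assumption about where the appliance keeps the vPostgres tools, and the data directory is the /storage/db/vpostgres path seen earlier in this thread. The -n option is pg_resetxlog's "no update" mode, which prints the values it would write and exits.

```shell
#!/bin/bash
# Show the current control data, then preview what pg_resetxlog would write.
# Nothing on disk is modified by either command.
pg_reset_preview() {
    local data_dir="${1:-/storage/db/vpostgres}"             # data dir from this thread
    local bin_dir="${2:-/opt/vmware/vpostgres/current/bin}"  # assumed install path

    if [ ! -x "$bin_dir/pg_resetxlog" ]; then
        echo "pg_resetxlog not found in $bin_dir" >&2
        return 1
    fi

    # pg_controldata prints the checkpoint information recorded in pg_control
    "$bin_dir/pg_controldata" "$data_dir"

    # -n: no update, only print the values that a real run would use
    "$bin_dir/pg_resetxlog" -n "$data_dir"
}
```

pg_resetxlog refuses to run as root, so this needs to be executed as the account that owns the data directory, with the database stopped. The PostgreSQL documentation describes a real (non -n) run as a last resort: the server may start afterwards, but the data can be inconsistent, so take a copy of the data directory first.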

Gary_Williams
Enthusiast

You're spot on. I had to do some messing around with /etc/passwd, as I couldn't su to vpostgres because its shell was set to nologin.

I eventually ran pg_resetxlog, and that brought the DB up, but now vpxd is complaining:
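
For anyone following along, editing /etc/passwd may be avoidable. A sketch of one alternative, assuming the appliance's su supports the standard -s option: when run by root, -s overrides the target account's shell for that one invocation, so a nologin shell does not block the command. The function name run_as_vpostgres is made up for this example.

```shell
#!/bin/bash
# Run a single command line as the vpostgres account without changing its
# configured login shell. Must be called as root.
run_as_vpostgres() {
    # -s /bin/bash: use bash for this invocation instead of the nologin shell
    # -c "...":     the command line to execute as vpostgres
    su -s /bin/bash vpostgres -c "$*"
}
```

Usage would look like run_as_vpostgres "id", which should report the vpostgres user if the override worked.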

2017-12-14T14:01:08.663Z warning vpxd[7F99A77C9800] [Originator@6876 sub=InvtVmDb] Failed to load VPX_VM record from DB. Host id: '40' is not found in the inventory for VM id: '49'
2017-12-14T14:01:08.663Z warning vpxd[7F99A77C9800] [Originator@6876 sub=InvtVmDb] Failed to load VPX_VM record from DB. Host id: '40' is not found in the inventory for VM id: '53'
2017-12-14T14:01:08.664Z warning vpxd[7F99A77C9800] [Originator@6876 sub=InvtVmDb] Failed to load VPX_VM record from DB. Host id: '56' is not found in the inventory for VM id: '75'
2017-12-14T14:01:08.664Z warning vpxd[7F99A77C9800] [Originator@6876 sub=InvtVmDb] Failed to load VPX_VM record from DB. Host id: '56' is not found in the inventory for VM id: '73'
2017-12-14T14:01:08.664Z warning vpxd[7F99A77C9800] [Originator@6876 sub=InvtVmDb] Failed to load VPX_VM record from DB. Host id: '523' is not found in the inventory for VM id: '263'

I think the database is toast, so I'm going to restore it from backup. It's been an interesting lesson :)
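
To gauge how widespread this kind of damage is, the vpxd warnings can be grouped by the missing host id. This is only a sketch built around the exact message format quoted in this post; the function name orphaned_vms_by_host is invented, and the log file to feed it (vpxd.log on the appliance) is an assumption about where these warnings end up.

```shell
#!/bin/bash
# Group "Host id ... is not found in the inventory for VM id ..." warnings:
# prints one line per missing host id, followed by the VM ids that reference it.
orphaned_vms_by_host() {
    local log_file="$1"
    # sed keeps only matching lines and reduces each to "<host id> <vm id>"
    sed -n "s/.*Host id: '\([0-9][0-9]*\)' is not found in the inventory for VM id: '\([0-9][0-9]*\)'.*/\1 \2/p" "$log_file" |
        awk '{
                 if ($1 in vms) vms[$1] = vms[$1] "," $2   # append to existing list
                 else           vms[$1] = $2               # first VM for this host
             }
             END { for (h in vms) print h, vms[h] }' |
        sort -n                                            # order by host id
}
```

Against the five warnings above, this would list three missing host ids (40, 56 and 523) with their dependent VM ids, which makes it easier to judge whether a handful of records or most of the inventory is affected.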

daphnissov
Immortal

Argh, if it can't load those records from the table they're probably hosed, unfortunately. But a DB restore should be good to try.

Gary_Williams
Enthusiast

Yeah, as soon as I saw those errors I thought that vCenter was probably a mess. I'll see if I can restore the database, and if that doesn't fix it I'll just restore the whole VM. Not a big deal, but an interesting experiment :)
