Overnight, one of my VMs stopped. vCenter was reporting "No more space for virtual disk XXXX.vmdk". The datastore where the VM and its VMDKs are stored had over 1TB of free disk space.
Googling only turns up answers telling me to free disk space on my datastore, which clearly isn't the problem. All my VMs are thick provisioned.
With the VM powered off, I just moved it to another host (leaving all the files on the same datastore) and it powered up fine.
What is going on here?
How much memory is this VM assigned? And how large is the swap file it occupies while powered on? I have mostly seen this error when a VM is unable to power on, or stops, because there is no space available for the .vswp file, whose size depends on the allocated memory.
The VM has a 2GB RAM allocation.
I wondered about possible swap file issues, so I tried powering off some other VMs on the same host, but that didn't improve things. There was barely 100GB of allocated VM RAM on the affected host.
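To rule out the .vswp theory directly, you can compare the swap file's size against the datastore's reported free space from the ESXi host shell. This is a sketch: the datastore and VM folder names below are placeholders, and by default the .vswp file is sized at configured memory minus memory reservation (so at most 2 GB for this VM).

```shell
# On the ESXi host shell. "MyDatastore" and "MyVM" are placeholders --
# substitute your own datastore and VM folder names.

# Size of the VM's swap file (only exists while the VM is powered on):
ls -lh /vmfs/volumes/MyDatastore/MyVM/*.vswp

# Free space on that datastore as this host sees it:
df -h /vmfs/volumes/MyDatastore
```

If the .vswp file is a couple of GB and the datastore shows 1TB free, swap sizing is not the culprit.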
You may need to check the 'vmware.log' file in the VM's folder on the datastore then. Also check datastore events and alarms.
The best I can find in vmware.log is:
2017-08-31T07:11:06.828Z| vmx| I125: Msg_Question:
2017-08-31T07:11:06.828Z| vmx| I125: [msg.hbacommon.outofspace] There is no more space for virtual disk XXXX-000001.vmdk. You might be able to continue this session by freeing disk space on the relevant volume, and clicking _Retry. Click Cancel to terminate this session.
2017-08-31T07:11:06.828Z| vmx| I125: ----------------------------------------
2017-08-31T07:11:23.828Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2017-08-31T07:15:05.052Z| vmx| I125: VigorTransportProcessClientPayload: opID=aa9ffa5-c3-c7d2 seq=2862685: Receiving Bootstrap.MessageReply request.
2017-08-31T07:15:05.053Z| vmx| I125: Vigor_MessageRevoke: message 'msg.hbacommon.outofspace' (seq 92281844) is revoked
2017-08-31T07:15:05.053Z| vmx| I125: VigorTransport_ServerSendResponse opID=aa9ffa5-c3-c7d2 seq=2862685: Completed Bootstrap request.
2017-08-31T07:15:05.053Z| vmx| I125: MsgQuestion: msg.hbacommon.outofspace reply=1
2017-08-31T07:15:05.053Z| vmx| I125: Exiting because of failed disk operation.
2017-08-31T07:15:05.053Z| vmx| I125: Backtrace:
2017-08-31T07:15:05.053Z| vmx| I125: Backtrace[0] 000003fff678d450 rip=0000000023b9b9ae rbx=0000000023b9b490 rbp=000003fff678d470 r12=0000000000000000 r13=0000000024683300 r14=000000002403aa3e r15=0000000000000001
2017-08-31T07:15:05.053Z| vmx| I125: Backtrace[1] 000003fff678d480 rip=00000000236344da rbx=000000002486eb08 rbp=000003fff678d970 r12=0000000000000001 r13=0000000024683300 r14=000000002403aa3e r15=0000000000000001
2017-08-31T07:15:05.053Z| vmx| I125: Backtrace[2] 000003fff678d980 rip=00000000237bac2b rbx=00000000323fb1a0 rbp=000003fff678d9c0 r12=00000000325c0c30 r13=0000000024683300 r14=000000002403aa3e r15=0000000000000001
2017-08-31T07:15:05.053Z| vmx| I125: Backtrace[3] 000003fff678d9d0 rip=00000000236f1351 rbx=00000000323fb1a0 rbp=000003fff678d9e0 r12=0000000000000000 r13=000000003227d440 r14=0000000000000001 r15=000003fff678da2c
2017-08-31T07:15:05.053Z| vmx| I125: Backtrace[4] 000003fff678d9f0 rip=00000000236434de rbx=000003fff69bd010 rbp=000003fff678da60 r12=0000000000000000 r13=000000003227d440 r14=0000000000000001 r15=000003fff678da2c
2017-08-31T07:15:05.053Z| vmx| I125: Backtrace[5] 000003fff678da70 rip=000000002364410d rbx=0000000000000000 rbp=000003fff678db10 r12=000008649cd4a417 r13=000003fff69bd010 r14=000000003227d440 r15=0000000032617550
2017-08-31T07:15:05.053Z| vmx| I125: Backtrace[6] 000003fff678db20 rip=0000000023635326 rbx=000000002486eb40 rbp=000003fff678dc80 r12=000000003253ad40 r13=00000000320742d0 r14=000000002486eb08 r15=0000000000000000
2017-08-31T07:15:05.053Z| vmx| I125: Backtrace[7] 000003fff678dc90 rip=0000000023631f36 rbx=0000000000000003 rbp=000003fff678dd10 r12=0000000000000000 r13=000000002402b10d r14=0000000000000000 r15=000000002464c860
2017-08-31T07:15:05.053Z| vmx| I125: Backtrace[8] 000003fff678dd20 rip=0000000025baa8cd rbx=0000000000000000 rbp=0000000000000000 r12=0000000023632558 r13=000003fff678dde8 r14=0000000000000000 r15=0000000000000000
2017-08-31T07:15:05.053Z| vmx| I125: Backtrace[9] 000003fff678dde0 rip=0000000023632581 rbx=0000000000000000 rbp=0000000000000000 r12=0000000023632558 r13=000003fff678dde8 r14=0000000000000000 r15=0000000000000000
2017-08-31T07:15:05.053Z| vmx| I125: SymBacktrace[0] 000003fff678d450 rip=0000000023b9b9ae in function (null) in object /bin/vmx loaded at 0000000023494000
2017-08-31T07:15:05.053Z| vmx| I125: SymBacktrace[1] 000003fff678d480 rip=00000000236344da in function (null) in object /bin/vmx loaded at 0000000023494000
2017-08-31T07:15:05.053Z| vmx| I125: SymBacktrace[2] 000003fff678d980 rip=00000000237bac2b in function (null) in object /bin/vmx loaded at 0000000023494000
2017-08-31T07:15:05.053Z| vmx| I125: SymBacktrace[3] 000003fff678d9d0 rip=00000000236f1351 in function (null) in object /bin/vmx loaded at 0000000023494000
2017-08-31T07:15:05.053Z| vmx| I125: SymBacktrace[4] 000003fff678d9f0 rip=00000000236434de in function (null) in object /bin/vmx loaded at 0000000023494000
2017-08-31T07:15:05.053Z| vmx| I125: SymBacktrace[5] 000003fff678da70 rip=000000002364410d in function (null) in object /bin/vmx loaded at 0000000023494000
2017-08-31T07:15:05.053Z| vmx| I125: SymBacktrace[6] 000003fff678db20 rip=0000000023635326 in function (null) in object /bin/vmx loaded at 0000000023494000
2017-08-31T07:15:05.053Z| vmx| I125: SymBacktrace[7] 000003fff678dc90 rip=0000000023631f36 in function main in object /bin/vmx loaded at 0000000023494000
2017-08-31T07:15:05.053Z| vmx| I125: SymBacktrace[8] 000003fff678dd20 rip=0000000025baa8cd in function __libc_start_main in object /lib64/libc.so.6 loaded at 0000000025b8a000
2017-08-31T07:15:05.053Z| vmx| I125: SymBacktrace[9] 000003fff678dde0 rip=0000000023632581 in function (null) in object /bin/vmx loaded at 0000000023494000
2017-08-31T07:15:05.053Z| vmx| I125: Exiting
Not much information in these logs apart from the same disk space error. Can you check space on the ESXi console using vdf -h?
Tardisk Space Used
sb.v00 140M 140M
s.v00 317M 317M
net_i40e.v00 424K 420K
mtip32xx.v00 244K 241K
ata_pata.v00 40K 37K
ata_pata.v01 28K 26K
ata_pata.v02 32K 28K
ata_pata.v03 32K 29K
ata_pata.v04 36K 33K
ata_pata.v05 32K 30K
ata_pata.v06 28K 26K
ata_pata.v07 32K 30K
block_cc.v00 80K 76K
ehci_ehc.v00 92K 89K
elxnet.v00 456K 452K
emulex_e.v00 24K 22K
weaselin.t00 5M 5M
esx_dvfi.v00 416K 414K
esx_ui.v00 11M 11M
ima_qla4.v00 1M 1M
ipmi_ipm.v00 36K 34K
ipmi_ipm.v01 80K 77K
ipmi_ipm.v02 100K 96K
lpfc.v00 1M 1M
lsi_mr3.v00 272K 272K
lsi_msgp.v00 464K 463K
lsu_hp_h.v00 64K 61K
lsu_lsi_.v00 240K 237K
lsu_lsi_.v01 420K 417K
lsu_lsi_.v02 240K 237K
lsu_lsi_.v03 508K 504K
lsu_lsi_.v04 304K 302K
misc_cni.v00 24K 20K
misc_dri.v00 5M 5M
net_bnx2.v00 280K 276K
net_bnx2.v01 1M 1M
net_cnic.v00 144K 142K
net_e100.v00 308K 305K
net_e100.v01 344K 342K
net_enic.v00 140K 139K
net_forc.v00 120K 117K
net_igb.v00 316K 312K
net_ixgb.v00 400K 397K
net_mlx4.v00 340K 337K
net_mlx4.v01 228K 227K
net_nx_n.v00 1M 1M
net_tg3.v00 304K 303K
net_vmxn.v00 100K 99K
nmlx4_co.v00 576K 575K
nmlx4_en.v00 420K 418K
nmlx4_rd.v00 172K 171K
nvme.v00 172K 171K
ohci_usb.v00 60K 57K
qlnative.v00 2M 2M
rste.v00 796K 794K
sata_ahc.v00 80K 79K
sata_ata.v00 52K 51K
sata_sat.v00 60K 59K
sata_sat.v01 40K 38K
sata_sat.v02 40K 39K
sata_sat.v03 32K 30K
sata_sat.v04 28K 27K
scsi_aac.v00 172K 169K
scsi_adp.v00 428K 425K
scsi_aic.v00 284K 282K
scsi_bnx.v00 272K 268K
scsi_bnx.v01 200K 196K
scsi_fni.v00 228K 226K
scsi_hps.v00 172K 169K
scsi_ips.v00 100K 98K
scsi_meg.v00 92K 91K
scsi_meg.v01 168K 166K
scsi_meg.v02 88K 87K
scsi_mpt.v00 448K 445K
scsi_mpt.v01 492K 489K
scsi_mpt.v02 420K 416K
scsi_qla.v00 272K 271K
uhci_usb.v00 60K 57K
vmware_f.v00 47M 47M
vsan.v00 23M 23M
vsanheal.v00 2M 2M
vsanmgmt.v00 6M 6M
xhci_xhc.v00 228K 226K
xorg.v00 3M 3M
imgdb.tgz 452K 450K
state.tgz 32K 30K
onetime.tgz 60K 58K
-----
Ramdisk Size Used Available Use% Mounted on
root 32M 240K 31M 0% --
etc 28M 296K 27M 1% --
opt 32M 0B 32M 0% --
var 48M 492K 47M 1% --
tmp 256M 4K 255M 0% --
iofilters 32M 0B 32M 0% --
hostdstats 1553M 7M 1545M 0% --
And this looks similar on other hosts in the cluster.
What about df -h? Correlate with the mounted volume where the VM was residing before migration.
Ahhh....
Three nodes report the correct datastore size and usage, whereas two nodes don't. I've had this before, where the hosts disagree on a datastore's size. Worse, the vCenter tools couldn't fix it: I had to use the Windows client to talk directly to the ESXi host and get it to do a rescan.
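For anyone hitting the same thing: the per-host view and the refresh can also be done from the ESXi shell, without the Windows client. A minimal sketch, run on each host that reports the wrong size:

```shell
# Compare what this host believes about its datastores
# (run on each host and diff the Size/Free columns):
esxcli storage filesystem list

# Refresh VMFS volume information on this host:
vmkfstools -V

# Rescan all storage adapters for device/capacity changes:
esxcli storage core adapter rescan --all
```

After the rescan, `esxcli storage filesystem list` should agree across all hosts in the cluster.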
Anyway, problem now solved.
Thanks,
Just out of curiosity, are there any RDM LUNs attached to these hosts? I've seen this sort of thing happen if the RDM LUNs aren't set to Perennially Reserved = True.
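For reference, the perennial reservation flag is set per device, per host, from the ESXi shell. A sketch, assuming you have the LUN's NAA identifier (the `naa.xxxxxxxxxxxx` below is a placeholder):

```shell
# Mark an RDM LUN as perennially reserved on this host
# (naa.xxxxxxxxxxxx is a placeholder -- use your LUN's real NAA ID):
esxcli storage core device setconfig -d naa.xxxxxxxxxxxx --perennially-reserved=true

# Verify the flag took effect:
esxcli storage core device list -d naa.xxxxxxxxxxxx | grep -i perennially
```

Without this flag, hosts can spend a long time trying to interrogate RDM LUNs reserved by other hosts during boot and storage rescans.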