Hi,
We have 60 NFS datastores on our platform, and a couple of VMs have VMDKs of over 20 TB and 60 TB on them. Sometimes other VMs cannot boot or restart on some ESXi hosts. The VMware and vmkernel logs clearly state that the virtual machine cannot access its NFS hard disk because the ESXi host has hit the maximum heap size.
Also, the virtual machine's NFS hard disk shows 0 bytes under the vCenter Summary tab, and from the ESXi CLI you see 0 files listed inside any NFS datastore. Remounting the NFS datastore returns "vm file system cannot open volume".
Here is what is written to vmware.log when we press the power-on button and get the error message in vCenter:
2020-05-04T16:15:28.910Z| vmx| I125+ Power on failure messages: File system specific implementation of GetObject[fs] failed
2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.
2020-05-04T16:15:28.910Z| vmx| I125+ File system specific implementation of GetObject[fs] failed
2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.
2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.
2020-05-04T16:15:28.910Z| vmx| I125+ File system specific implementation of GetObject[fs] failed
2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.
2020-05-04T16:15:28.910Z| vmx| I125+ File system specific implementation of GetObject[fs] failed
2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.
2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.
2020-05-04T16:15:28.910Z| vmx| I125+ File system specific implementation of GetObject[fs] failed
2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.
2020-05-04T16:15:28.910Z| vmx| I125+ File system specific implementation of GetObject[fs] failed
2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.
2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.
2020-05-04T16:15:28.910Z| vmx| I125+ File system specific implementation of GetObject[fs] failed
2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.
2020-05-04T16:15:28.910Z| vmx| I125+ File system specific implementation of GetObject[fs] failed
2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.
2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.
2020-05-04T16:15:28.910Z| vmx| I125+ File system specific implementation of GetObject[fs] failed
2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.
2020-05-04T16:15:28.910Z| vmx| I125+ File system specific implementation of GetObject[fs] failed
2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.
2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.
2020-05-04T16:15:28.910Z| vmx| I125+ Cannot allocate memory
2020-05-04T16:15:28.910Z| vmx| I125+ Cannot open the disk '/vmfs/volumes/6c10c2fe-964d5016/vmname-abc/vmname-abc.vmdk' or one of the snapshot disks it depends on.
2020-05-04T16:15:28.910Z| vmx| I125+ Module 'Disk' power on failed.
2020-05-04T16:15:28.910Z| vmx| I125+ Failed to start the virtual machine.
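To make the log above easier to read, here is a minimal sketch in Python that counts the heap-exhaustion messages per heap and converts the reported maximum to MiB. It is not a VMware tool: the function name summarize_heap_errors and the regular expression are my own, written against the message format shown in this excerpt only.

```python
import re
from collections import Counter

# Matches the heap-exhaustion message exactly as it appears in the excerpt above.
HEAP_RE = re.compile(r"Heap (\S+) already at its maximum size of (\d+)\. Cannot expand\.")


def summarize_heap_errors(lines):
    """Return {heap_name: (message_count, reported_max_bytes)} for the given log lines."""
    counts = Counter()
    max_bytes = {}
    for line in lines:
        match = HEAP_RE.search(line)
        if match:
            heap, size = match.group(1), int(match.group(2))
            counts[heap] += 1
            max_bytes[heap] = size
    return {heap: (counts[heap], max_bytes[heap]) for heap in counts}


# Three lines taken from the vmware.log excerpt above.
sample = [
    "2020-05-04T16:15:28.910Z| vmx| I125+ Heap nfsclient already at its maximum size of 295973752. Cannot expand.",
    "2020-05-04T16:15:28.910Z| vmx| I125+ File system specific implementation of GetObject[fs] failed",
    "2020-05-04T16:15:28.910Z| vmx| I125+ Cannot allocate memory",
]

for heap, (count, size) in summarize_heap_errors(sample).items():
    print(f"{heap}: {count} message(s), max size {size} bytes ({size / 2**20:.1f} MiB)")
```

For the value in our log, 295973752 bytes works out to roughly 282 MiB for the nfsclient heap.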
We have already ruled out the following:
It is not related to the 'one of the snapshot disks it depends on...' part of the message.
Changing some advanced values, such as the TCP/IP heap and SunRPC settings, needs a reboot.
Restarting the NFS service, or a full service restart, doesn't help.
A separate VMkernel adapter? Not sure; more likely it depends on the NFS service.
We have not noticed any performance issue on our NFS storage device, at least for now.
Thanks a lot for any advice.
Please check if VMware Knowledge Base is helpful.
Thank you for your reply, Shreyska. I did this before: I increased 3 options according to this document, which I mentioned in the question, and it didn't help. Those values only improve performance when a VM is accessing an NFS datastore, and some of them need a reboot.
I think my problem is more related to the total VMDK size a host can handle. VMware ESXi 6.5.0 build 7388607 can process at most 65 TB of VMDK file blocks with the maximum heap size below, even if you allocate 100 TB in total to the VMs. I ruled this out too by leaving enough room on the host.
As I said, restarting the host resolves the issue, but there should be a way to free the heap. Once the ESXi host hits the maximum value it cannot go back; somehow it keeps this peak value cached somewhere. There should be a refresh mechanism.
This may help: VMware Knowledge Base
Last week we changed the values below and rebooted the host. Then the heap increased from 28... to 29..... But we cannot reboot the host right now. Why doesn't ESXi release heap memory or go back to the initial sizes, even when there is enough room (the VMDK files total no more than 62 TB)?
SETTING OLD NEW
------ ------ ----
Net.TcpipHeapSize 0 32
Net.TcpipHeapMax 512 1024
SunRPC.MaxConnPerIP 4 8
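For anyone who wants to apply the same values from the command line, here is a rough sketch in Python that prints one esxcli command per setting. The option paths (/Net/TcpipHeapSize and so on) are my assumption from the setting names in the table, and esxcli_set_commands is just a helper name I made up; please check the paths on your own host first. As mentioned, we rebooted the host after changing them.

```python
# NEW values copied from the table above.
# The esxcli option paths are assumed from the setting names; verify them on your host.
SETTINGS = {
    "/Net/TcpipHeapSize": 32,
    "/Net/TcpipHeapMax": 1024,
    "/SunRPC/MaxConnPerIP": 8,
}


def esxcli_set_commands(settings):
    """Build one 'esxcli system settings advanced set' command per integer option."""
    return [
        f"esxcli system settings advanced set -o {option} -i {value}"
        for option, value in settings.items()
    ]


# Print the commands so they can be reviewed before running them in an ESXi shell.
for command in esxcli_set_commands(SETTINGS):
    print(command)
```

The script only prints the commands; nothing is changed until you run them yourself on the host.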
A little improvement: our system engineer tried to remount an empty NFS datastore on the same host, and then he was able to access the disk inside the NFS datastore. But after some time the problem came back once IOPS started. The strange thing is that after he did that, vCenter shows the correct size of the VMDK, but the disk is still not accessible.
We are sure we have to upgrade the system, and rebooting the host will fix it. But right now we need a workaround that frees the nfsclient heap memory without a reboot, because the problem might happen again even after a reboot, and you know how difficult the upgrade procedure is anyway.
Any help is appreciated.
Contact VMware Global Support Services?
VMware - How to File a Support Request Online