slciec's Posts

VMware got back to me and told me this is a cosmetic issue and that the numbers I am seeing are wrong. In my case this is probably true, since I don't see any issues when running load tests on the virtual machine.
Dell got back to me and basically told me what we already knew: "The hardware is performing as expected. While in the Support Live Image, all the drives observes extremely low latency times on the tests you performed, and the overall performance was very good and pretty consistent. All of this does point toward ESXi/VMware being the bottleneck, unfortunately." I updated my ticket with VMware, but if I don't hear back from them I am unsure what to do next.
I increased the jobs to 16; this is what I got back. I also ran this to measure random read/write performance: fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=sbd --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 [Results] The numbers look fine to me unless I am reading something wrong. Note that this test was not done with ESXi installed; I am using a Dell image running Rocky Linux 8.8 with all the appropriate drivers and test utilities on it. After running all these tests, I get the feeling this is not a hardware problem so much as an ESXi problem, with drivers or something.
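For readability, here is the random read/write command from the post laid out one option per line, as a small shell sketch. The options are exactly the ones posted (including `--filename=sbd`); the `RUN_FIO` guard variable is my own addition so that the script only prints the command unless you opt in.

```shell
#!/bin/sh
# fio options exactly as posted: 4k blocks, queue depth 64, 75% reads / 25% writes.
FIO_ARGS="--randrepeat=1 \
--ioengine=libaio \
--direct=1 \
--gtod_reduce=1 \
--name=test \
--filename=sbd \
--bs=4k \
--iodepth=64 \
--size=4G \
--readwrite=randrw \
--rwmixread=75"

# RUN_FIO is a hypothetical opt-in switch, not an fio feature.
if [ "${RUN_FIO:-0}" = "1" ] && command -v fio >/dev/null 2>&1; then
    # Unquoted on purpose so the string splits into separate arguments.
    fio $FIO_ARGS
else
    echo "fio $FIO_ARGS"
fi
```

If you want the 16 parallel jobs mentioned above on this test as well, appending `--numjobs=16 --group_reporting` should work, but that combination is an assumption and not something the post shows.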
This is on a RAID 5 with 7 drives. [Results: block size 512k] [Results: block size 1024k]
128k. This is the configuration I used with the test: [global] rw=write numjobs=1 iodepth=128 ioengine=libaio time_based runtime=600 bs=128k direct=1
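The configuration above got flattened onto one line when posted. A minimal sketch of how it might look as an fio job file, one option per line, is below. The `[global]` options are the ones from the post; the `[seqwrite]` job section, the scratch file path, and the `size` value are assumptions, since the post does not show the job section that was used.

```shell
#!/bin/sh
# Write the posted [global] options to a job file (path is hypothetical).
job_file="${TMPDIR:-/tmp}/seqwrite.fio"

cat > "$job_file" <<'EOF'
[global]
rw=write
numjobs=1
iodepth=128
ioengine=libaio
time_based
runtime=600
bs=128k
direct=1

# Assumed job section: not shown in the original post.
# It targets a scratch file because writing to a raw device destroys its contents.
[seqwrite]
filename=/tmp/fio-scratch.dat
size=4G
EOF

echo "Job file written to $job_file"
echo "Run it with: fio $job_file"
```

To repeat the run at another block size (512k or 1024k, for example), changing the `bs=` line should be enough.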
Each server has dual H965i controllers, but the drives are only connected to one of the two RAID cards. Once Dell support gets back to me I might try doing that.
So are you thinking it has something to do with the mainboard? I was finally connected to a couple of Dell senior support engineers, and they had me download an SLI ISO to test the setup using fio and iostat. I broke the RAID and set all the NVMe drives to non-RAID, then I recreated the RAID and ran the test. Everything was sent to Dell this morning, so I will have to wait and see what they say. I have also told them to read through this thread to see what everyone else is saying. I hope to get a solution at some point from Dell or VMware, because I have a bunch of new equipment that is just bricks right now.
Yes. Both are set. I just received a message from my rep at Dell, who said they are actively working on the issue.
In my situation it happens on the BOSS card in a RAID 1 and on the NVMe RAID 5. My personal thought is that the NVMe part of the setup is causing the issue, but I cannot confirm this until Dell replaces my configuration with a SAS SSD setup or something else. Dell is currently working on something with it; I will let you know what I find out.
Thanks. I downgraded and it still does the same thing with latency. I did get this from VMware on my specific issue. The Engineering team has shared this update: “We have been actively debugging this issue, but looks to be a tricky one. We have added debug logs from where the stats are fetched, but we see no anomalies there, yet esxtop reports high and negative stats sometimes. We have not yet root caused the issue, debug is still in progress.” I will keep you updated on the progress.
The only file format it shows for older firmware is a BIN file, and I have not done an update using a BIN file before.
Thanks. I will communicate this to Dell. It won't let me downgrade the firmware with the packages available.
Any fix on your end? I have contacted Dell to see about replacing my NVMe drives with a SAS SSD setup. I will let you know what they say.
Have you opened a case with Dell or VMware? I would be interested in seeing what they tell you about it.
8.0U1 A04 was also released. I updated one of my servers to this version and my issue still persists.
Also, I'm not sure of your configuration, but I had this issue and these error messages with my R760s too. It was easily solved, though. See: Re: Failed to cleanup registration key on volume - VMware Technology Network VMTN
This is the last email I received from tech support. I am going to respond letting them know another customer is having the same issue.
"It looks like it's an intermittent issue on 8.0.1 and wasn't seen on the latest main but since it was still seen in 8.0.1 We now trying to root cause issue on 8.0.1. As per the current investigation following are the observations.
1.) These values go out of range on 8.0.1 and not on main (we are yet to confirm this with adequate experimentation)
2.) It is seen on large IO sizes usually around 4M or higher.
3.) On lower-size IOs we don't see this issue and values are just fine in that case.
4.) This issue seems to occur only in nvme case not in scsi devices.
With that said we are still investigating the root cause of the issue."
This issue has been fixed on my Dell R760 with a BOSS-N1 Monolithic and RAID 1 NVMe drives. The issue was that when ESXi is installed on a drive larger than 142GB, it automatically creates a VMFS datastore (see: ESXi System Storage Changes | VMware). Once I deleted the VMFS volume on the BOSS storage, the error stopped. Since my servers are not in production, I worked on a way to install ESXi without creating the VMFS volume. I set up a simple kickstart configuration (ks.cfg) to stop the creation of the VMFS: vmaccepteula install --firstdisk=NVMe,local --novmfsondisk rootpw myp@ssw0rd reboot Hope this helps someone else if they have the same problem.
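The kickstart commands above were flattened onto one line when posted. Here is a sketch of the same four commands written out as a ks.cfg, one per line. The commands and values are the ones from the post; the comments and the output path are my additions, and the password is only the placeholder from the post.

```shell
#!/bin/sh
# Write the kickstart file (output path is hypothetical).
ks_file="${TMPDIR:-/tmp}/ks.cfg"

cat > "$ks_file" <<'EOF'
# Accept the VMware EULA
vmaccepteula
# Install to the first matching disk and do not create a VMFS datastore on it
install --firstdisk=NVMe,local --novmfsondisk
# Placeholder root password from the post; replace before use
rootpw myp@ssw0rd
# Reboot once the installation finishes
reboot
EOF

echo "Kickstart file written to $ks_file"
```

The installer is normally pointed at a file like this with the `ks=` boot option; where you host the file (CD, USB, HTTP) depends on your setup.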
A reboot does not fix the issue. All HBA drivers and firmware have already been applied; this was done when I opened a ticket with Dell Support. I installed multiple versions of ESXi, and I even went back to version 7.0U3n, which is listed in the HCL matrix as supported for the server. It happened on all versions. I was just updating the community in case anyone else has this issue. If I get any more information I will pass it along. Thanks for responding.
This is what support has told me about this. "The abnormal latency reported in the esxtop values is been investigated on by the Engineering team. I believe this behavior to be a cosmetic one, as there are no other issues reported on the host, however we would not be able to confirm the same until we have an confirmation and further action plan provided by the Engineering team."