windwalker78
Contributor

EPYC 7702 and high I/O lead to VM filesystem crash (read-only) with IOMMU

   Hello,

We have a dual EPYC 7702 Supermicro server running ESXi 7.0U3 and a single Debian 11 EFI VM configured with 128 cores as a single socket and with IOMMU enabled.

The BIOS of the physical server is configured as per:

https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/vsphere70u2-...

and

https://developer.amd.com/wp-content/resources/56779_1.0.pdf

CPU configuration:

1. 2xAPIC is enabled

2. AMD SMT is enabled

 

MEMORY:

For throughput: Maximum Memory Bus Frequency = 3200 MT/s

 

ACPI/NUMA:

NPS (NUMA nodes per socket) = 1

L3 cache as NUMA (CCX-as-NUMA): disabled

 

NB Configuration/Other:

IOMMU is ON

 

PCIe/PCI/PnP configuration:

SR-IOV support enabled

 

The VM has the settings suggested in those PDFs:

IOMMU=Checked

Numa.LocalityWeightActionAffinity=0

Numa.PreferHT=1
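For reference, a sketch of how the two host-level NUMA options can be applied from the ESXi shell (option paths can be double-checked with "esxcli system settings advanced list"; the commented line is the per-VM .vmx equivalent of PreferHT):

```shell
# Host-wide advanced options (ESXi shell); verify the exact paths with:
#   esxcli system settings advanced list | grep -i numa
esxcli system settings advanced set -o /Numa/LocalityWeightActionAffinity -i 0
esxcli system settings advanced set -o /Numa/PreferHT -i 1

# Per-VM alternative for PreferHT (add to the VM's .vmx / advanced config):
# numa.vcpu.preferHT = "TRUE"
```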

 

The VM passed the CPU stress test successfully.

The I/O tests failed for this VM when its disks were placed on datastores on two different controllers:

NVMe and Dell PERC H730 (RAID 0 for testing, with Write Through and No Read Ahead; also tested with Write Back and Read Ahead, no difference)

For the tests we run six simultaneous sysbench instances: three of them reading sequentially from 20k files of 1.5 MB each, and three of them writing sequentially, each to its own 30 GB file.
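Roughly, each instance looks like the sketch below (sysbench "fileio" mode; the file counts and sizes come from the description above, while the run duration and any other options are illustrative, and each instance runs in its own working directory):

```shell
# One of the three sequential writers: a single 30 GB file
sysbench fileio --file-num=1 --file-total-size=30G \
  --file-test-mode=seqwr prepare
sysbench fileio --file-num=1 --file-total-size=30G \
  --file-test-mode=seqwr --time=600 run

# One of the three sequential readers: 20k files of 1.5 MB (~30 GB total)
sysbench fileio --file-num=20000 --file-total-size=30G \
  --file-test-mode=seqrd prepare
sysbench fileio --file-num=20000 --file-total-size=30G \
  --file-test-mode=seqrd --time=600 run
```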

After a couple of minutes the VM filesystems lock up and are remounted read-only.
The VM complains with:

[ 1438.420318] blk_update_request: I/O error, dev sda, sector 5788792 op 0x1:(WRITE) flags 0x0 phys_seg 113 prio class 0
[ 1438.421746] sd 0:0:0:0: [sda] tag#371 Unknown completion status: 0x1a
[ 1438.421755] sd 0:0:0:0: [sda] tag#370 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[ 1438.421762] sd 0:0:0:0: [sda] tag#372 Unknown completion status: 0x1a
[ 1438.421774] sd 0:0:0:0: [sda] tag#370 CDB: Write(10) 2a 00 00 58 50 68 00 04 10 00
[ 1438.421782] blk_update_request: I/O error, dev sda, sector 5787752 op 0x1:(WRITE) flags 0x4000 phys_seg 128 prio class 0
[ 1438.421838] sd 0:0:0:0: [sda] tag#373 Unknown completion status: 0x1a
[ 1438.423444] sd 0:0:0:0: [sda] tag#374 Unknown completion status: 0x1a
[ 1438.423465] sd 0:0:0:0: [sda] tag#375 Unknown completion status: 0x1a
[ 1438.423557] EXT4-fs warning: 18 callbacks suppressed
[ 1438.423567] sd 0:0:0:0: [sda] tag#376 Unknown completion status: 0x1a
[ 1438.423573] EXT4-fs warning (device dm-0): ext4_end_bio:345: I/O error 10 writing to inode 3407990 starting block 148237)
[ 1438.432242] sd 0:0:1:0: [sdb] tag#378 Unknown completion status: 0x1a
[ 1438.432266] sd 0:0:1:0: [sdb] tag#379 Unknown completion status: 0x1a
[ 1438.432287] sd 0:0:1:0: [sdb] tag#380 Unknown completion status: 0x1a
[ 1438.432300] sd 0:0:1:0: [sdb] tag#381 Unknown completion status: 0x1a

In the case of NVMe, we see the following warnings in the ESXi logs (similar warnings appear for the PERC H730 with a single SSD on it):

2022-04-06T09:14:41.418Z cpu48:2097431)WARNING: ScsiDeviceIO: 1498: Device t10.NVMe____KINGSTON_SNVS2000G______________________25DDF45268B72600 performance has deteriorated. I/O latency increased from average value of 11973 microseconds
2022-04-06T09:14:41.418Z cpu48:2097431)WARNING: to 2089427 microseconds.
2022-04-06T09:14:46.119Z cpu18:2099105)WARNING: NvmeScsi: 158: SCSI opcode 0x2a (0x45b91cfcae08) on path vmhba9:C0:T0:L0 to namespace t10.NVMe____KINGSTON_SNVS2000G______________________25DDF45268B72600 failed with NVMe host error status:
2022-04-06T09:14:46.119Z cpu18:2099105)WARNING: 0x8 translating to SCSI error H:0x2
2022-04-06T09:14:46.119Z cpu20:2097444)WARNING: HPP: HppThrottleLogForDevice:1136: Cmd 0x2a (0x45b91cf70408, 2112215) to dev "t10.NVMe____KINGSTON_SNVS2000G______________________25DDF45268B72600" on path "vmhba9:C0:T0:L0" Failed:
2022-04-06T09:14:46.119Z cpu20:2097444)WARNING: HPP: HppThrottleLogForDevice:1144: Error status H:0x2 D:0x0 P:0x0 . hppAction = 3

The same happens with Debian 10 as the VM.

Do you have any tips or ideas on where to look for the problem?

2 Replies
windwalker78
Contributor

It looks like IOMMU is the reason for the crashes. Without it in the VM settings, the I/O tests succeed.

However, we need it in order to use more than 128 cores in the VM.

Do you have any tips on BIOS or ESXi fine-tuning?

windwalker78
Contributor

Currently testing with the kernel boot parameters "iommu=pt" and "amd_iommu=on". This seems to prevent the system from crashing.
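On the Debian guest the parameters go on the kernel command line via GRUB; a sketch of the standard procedure (the existing GRUB_CMDLINE_LINUX_DEFAULT contents on your system may differ from the "quiet" shown here):

```shell
# In /etc/default/grub on the guest, append the parameters to the
# existing value, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt amd_iommu=on"

# Then, as root:
update-grub    # regenerate /boot/grub/grub.cfg
reboot

# After reboot, verify the parameters took effect:
cat /proc/cmdline
```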
