VMware Cloud Community
DavieDubai
Contributor

X9SCL+-F with 3x M1015 (SAS2008, IT mode): major checksum errors on scrubs, only under ESXi

I've got a strange problem with OpenIndiana (versions 151a5 up to 151a7) using VT-d: any heavy load on my 10x Seagate 3 TB drives (or any drives, for that matter) causes thousands of checksum errors under ESXi 5 and 5.1.

The drives are in a Norco 4224, and when the problem manifests itself, the activity LEDs of one drive (or two or three) light up and stay lit.
If I boot the OpenIndiana Live CD and run a scrub, everything checks out fine and there are no errors.
I have updated the X9SCL BIOS to 2.0b. I'm also using the latest IT mode firmware on the IBM M1015 HBAs.

Doing a copy from one pool to another gives the following in dmesg:

Oct 15 21:57:55 openindiana scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:55 openindiana mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Oct 15 21:57:55 openindiana scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:55 openindiana mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Oct 15 21:57:55 openindiana scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@16/pci1000,3020@0 (mpt_sas1):
Oct 15 21:57:55 openindiana mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Oct 15 21:57:55 openindiana scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@16/pci1000,3020@0 (mpt_sas1):
Oct 15 21:57:55 openindiana mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 15 21:57:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 21:57:59 openindiana Log info 0x31110d00 received for target 13.
Oct 15 21:57:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
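
To see at a glance which controller and target these messages cluster on (here mpt_sas13, target 13, and in the later excerpt mpt_sas1, target 14), a minimal Python sketch like the one below could tally the "Log info ... received for target N" lines. It assumes the two-line layout shown above, where one line ends with the controller instance and the next carries the log info code; the script name and the /var/adm/messages path are only examples, and it makes no attempt to decode what the LSI log info codes mean.

```python
import re
import sys
from collections import Counter

# Controller instance at the end of the "scsi: [ID ...]" line, e.g. "(mpt_sas13):"
CONTROLLER_RE = re.compile(r"\((mpt_sas\d+)\):\s*$")
# Follow-up line, e.g. "Log info 0x31110d00 received for target 13."
LOGINFO_RE = re.compile(r"Log info (0x[0-9a-fA-F]+) received for target (\d+)")


def tally_loginfo(lines):
    """Count (controller, log info code, target) triples in mpt_sas syslog lines.

    Assumes each "Log info" line comes after a line ending in "(mpt_sasN):",
    as in the excerpt above; lines matching neither pattern are ignored.
    """
    counts = Counter()
    controller = None
    for line in lines:
        m = CONTROLLER_RE.search(line)
        if m:
            # Remember which controller the next "Log info" line belongs to.
            controller = m.group(1)
            continue
        m = LOGINFO_RE.search(line)
        if m and controller is not None:
            counts[(controller, m.group(1).lower(), int(m.group(2)))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (hypothetical script name): python3 tally_loginfo.py /var/adm/messages
    with open(sys.argv[1], errors="replace") as log:
        for (ctrl, code, target), n in sorted(tally_loginfo(log).items()):
            print(f"{ctrl} target {target}: log info {code} x{n}")
```

Counting per (controller, target) pair rather than per line makes it easier to tell whether the errors follow one HBA, one drive bay, or move around between runs.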

A quick zpool status (before a scrub) gives:

zpool status repository
  pool: repository
 state: ONLINE
status: The pool is formatted using a legacy on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on software that does not support feature
        flags.
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        repository                 ONLINE       0     0     0
          raidz1-0                 ONLINE       0     0     0
            c8t5000C5004DFD988Ed0  ONLINE       0     0     0
            c8t5000C5004E369D82d0  ONLINE       0     0     0
            c8t5000C5004E370D67d0  ONLINE       0     0     0
          raidz1-1                 ONLINE       0     0     0
            c8t5000C5004E4258A0d0  ONLINE       0     0     0
            c8t5000C5004E4472E4d0  ONLINE       0     0     0
            c8t5000C5004E449528d0  ONLINE       0     0     0
          raidz1-2                 ONLINE       0     0     0
            c8t5000C5004E449966d0  ONLINE       0     0     0
            c8t5000C5004E4499A7d0  ONLINE       0     0     0
            c8t5000C5004E449A98d0  ONLINE       0     0     0

errors: No known data errors

Then a scrub gives me the following after a few seconds:

zpool status repository
  pool: repository
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub in progress since Mon Oct 15 22:03:21 2012
    222M scanned out of 5.93G at 8.21M/s, 0h11m to go
    8.26M repaired, 3.65% done
config:

        NAME                       STATE     READ WRITE CKSUM
        repository                 DEGRADED     0     0     0
          raidz1-0                 ONLINE       0     0     0
            c8t5000C5004DFD988Ed0  ONLINE       0     0     0
            c8t5000C5004E369D82d0  ONLINE       0     0     0
            c8t5000C5004E370D67d0  ONLINE       0     0     0
          raidz1-1                 ONLINE       0     0     0
            c8t5000C5004E4258A0d0  ONLINE       0     0     0
            c8t5000C5004E4472E4d0  ONLINE       0     0     0
            c8t5000C5004E449528d0  ONLINE       0     0     0
          raidz1-2                 DEGRADED     0     0     0
            c8t5000C5004E449966d0  ONLINE       0     0     0
            c8t5000C5004E4499A7d0  ONLINE       0     0     0
            c8t5000C5004E449A98d0  DEGRADED     0     0   135  too many errors (repairing)

And more issues show up in dmesg:

Oct 15 22:04:52 openindiana vmxnet3s: [ID 654879 kern.notice] vmxnet3s:0: getcapab(0x200000) -> no
Oct 15 22:04:52 openindiana last message repeated 1 time
Oct 15 22:04:52 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 22:04:52 openindiana Log info 0x31080000 received for target 13.
Oct 15 22:04:52 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
Oct 15 22:04:55 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@18/pci1000,3020@0 (mpt_sas13):
Oct 15 22:04:55 openindiana Log info 0x31080000 received for target 13.
Oct 15 22:04:55 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
Oct 15 22:04:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@16/pci1000,3020@0 (mpt_sas1):
Oct 15 22:04:59 openindiana Log info 0x31080000 received for target 14.
Oct 15 22:04:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
Oct 15 22:04:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@16/pci1000,3020@0 (mpt_sas1):
Oct 15 22:04:59 openindiana Log info 0x31080000 received for target 14.
Oct 15 22:04:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
Oct 15 22:04:59 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@16/pci1000,3020@0 (mpt_sas1):
Oct 15 22:04:59 openindiana Log info 0x31080000 received for target 14.
Oct 15 22:04:59 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
Oct 15 22:05:01 openindiana scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@16/pci1000,3020@0 (mpt_sas1):
Oct 15 22:05:01 openindiana Log info 0x31080000 received for target 14.
Oct 15 22:05:01 openindiana scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
Oct 15 22:05:01 openindiana scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x1

And when the scrub finishes, I'm left with:

zpool status repository
  pool: repository
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 575M in 0h7m with 0 errors on Mon Oct 15 22:10:58 2012
config:

        NAME                       STATE     READ WRITE CKSUM
        repository                 DEGRADED     0     0     0
          raidz1-0                 ONLINE       0     0     0
            c8t5000C5004DFD988Ed0  ONLINE       0     0     0
            c8t5000C5004E369D82d0  ONLINE       0     0     0
            c8t5000C5004E370D67d0  ONLINE       0     0     0
          raidz1-1                 DEGRADED     0     0     0
            c8t5000C5004E4258A0d0  ONLINE       0     0     0
            c8t5000C5004E4472E4d0  ONLINE       0     0     0
            c8t5000C5004E449528d0  DEGRADED     0     0 4.86K  too many errors
          raidz1-2                 DEGRADED     0     0     0
            c8t5000C5004E449966d0  DEGRADED     0     0    43  too many errors
            c8t5000C5004E4499A7d0  ONLINE       0     0     0
            c8t5000C5004E449A98d0  DEGRADED     0     0 4.15K  too many errors

errors: No known data errors
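
Compared with the status taken before the scrub, the checksum errors end up spread over three disks in two different raidz1 vdevs, and large counts are printed with a suffix (e.g. 4.86K). To track this across repeated scrubs, a small sketch could pull the rows with any non-zero READ/WRITE/CKSUM counter out of saved zpool status output. It assumes the column layout shown above and keeps the counters exactly as printed rather than expanding suffixes such as K; the file name in the comment is hypothetical.

```python
import sys


def nonzero_error_rows(status_text):
    """Return {name: (state, read, write, cksum, note)} for config rows with
    any non-zero READ/WRITE/CKSUM counter in `zpool status` output.

    Counters are kept as printed (e.g. "4.86K"); suffixes are not expanded.
    """
    flagged = {}
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True
            continue
        if stripped.startswith("errors:"):
            break  # end of the device table
        if not in_config or not stripped:
            continue
        fields = stripped.split()
        if fields[0] == "NAME" or len(fields) < 5:
            continue  # header row, or a line that is not a device row
        name, state, read, write, cksum = fields[:5]
        if (read, write, cksum) != ("0", "0", "0"):
            # Anything after the counters (e.g. "too many errors") is kept as a note.
            flagged[name] = (state, read, write, cksum, " ".join(fields[5:]))
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (hypothetical file name): zpool status repository > status.txt
    with open(sys.argv[1]) as f:
        for name, row in nonzero_error_rows(f.read()).items():
            print(name, *row)
```

Splitting on whitespace rather than fixed column offsets means the same function also copes with output whose alignment was lost when pasted into a forum post.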

If I boot the OI Live CD natively and import the pool, everything is fine after a scrub.

DavieDubai

I guess I'll skip the ESXi part and run this natively.
