ESX Server 3 vmkernel.log Lists

Version 1

    Introduction

     

    This document provides the Red (critical: take immediate action), Orange (warning: track this alert) and Black (ignore, but trend on this message) lists for the vmkernel.log of ESX Server 3.

     

    Intended Audience

     

    This document is intended for VMware Certified Professionals (VCPs) and systems management professionals who are implementing log management for VI3.

     

    Outline

     

    1. Introduction to vmkernel.log

    2. vmkernel.log Red List

    3. vmkernel.log Orange List

    4. vmkernel.log Black List

     

    Introduction to vmkernel.log

     

    All the hypervisor/vmkernel messages are posted in the Console OS log:

     

    /var/log/vmkernel.log

     

    Good ways to access this log:

     

    1. Centralize it (doc TBD) via syslog, and use a tool like Splunk!

    2. Use less or more to page through it on the host

    3. tail -f /var/log/vmkernel.log
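
    For a monitoring script on a central log host, the tail -f approach can be approximated in Python. This is a minimal sketch, not a VMware tool: the function name follow and its parameters are hypothetical, it assumes Python 3, and it does not handle log rotation.

```python
import time


def follow(path, poll_seconds=1.0, from_start=False):
    """Yield lines appended to a log file, similar in spirit to `tail -f`.

    Assumptions: the file is not rotated while being followed, and the
    writer emits whole lines (a half-written line may be yielded early).
    """
    with open(path, "r", errors="replace") as handle:
        if not from_start:
            handle.seek(0, 2)  # jump to the end of the file, like tail -f
        while True:
            line = handle.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(poll_seconds)  # nothing new yet, wait and retry
```

    A caller would loop over follow("/var/log/vmkernel.log") and hand each line to whatever alerting logic is in place.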

     

    vmkernel.log Red List

     

    Messages on the Red List should be picked up by the log monitoring automation and alerted as a high priority outage for immediate investigation.
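
    The three lists map naturally onto a small pattern table in the monitoring automation. The sketch below is an illustration only: the names RED_PATTERNS, ORANGE_PATTERNS, BLACK_PATTERNS and classify are hypothetical, and the single Red pattern is taken from the AsyncIO timeout message filed under KB 1003615 in this document. Treating that one line as the trigger for the whole message block is an assumption.

```python
import re

# Hypothetical pattern tables. Only the Red entry comes from this document
# (the AsyncIO timeout message under KB 1003615). Orange and Black are left
# empty because the document does not list any messages for them yet.
RED_PATTERNS = [re.compile(r"AsyncIO timeout \(\d+\); aborting cmd")]
ORANGE_PATTERNS = []
BLACK_PATTERNS = []

LISTS = (
    ("red", RED_PATTERNS),
    ("orange", ORANGE_PATTERNS),
    ("black", BLACK_PATTERNS),
)


def classify(line):
    """Return 'red', 'orange' or 'black', or None for an unlisted message."""
    for label, patterns in LISTS:
        if any(pattern.search(line) for pattern in patterns):
            return label
    return None
```

    Checking Red first means a message that happens to match more than one list is always reported at the highest severity.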

     

    1. KB 1003615 - failures on hosts attached to shared storage

     

    Message

    Dec  8 20:06:01 esx013 vmkernel: 29:16:44:10.325 cpu3:1032)SCSI: 3753: AsyncIO timeout (5000); aborting cmd w/ sn 941463, handle 1472/0x40211a28
    Dec  8 20:06:01 esx013 vmkernel: 29:16:44:10.325 cpu3:1032)LinSCSI: 3616: Aborting cmds with world 1024, originHandle 0x40211a28, originSN 941463 from vmhba0:0:5
    Dec  8 20:06:01 esx013 vmkernel: 29:16:44:10.325 cpu3:1032)<6>qla24xx_abort_command(0): handle to abort=857
    Dec  8 20:06:01 esx013 vmkernel: 29:16:44:10.326 cpu3:1032)LinSCSI: 2604: Forcing host status from 2 to SCSI_HOST_OK
    Dec  8 20:06:01 esx013 vmkernel: 29:16:44:10.326 cpu3:1032)LinSCSI: 2606: Forcing device status from SDSTAT_GOOD to SDSTAT_BUSY
    Dec  8 20:06:01 esx013 vmkernel: 29:16:44:10.326 cpu3:1032)SCSI: 3753: AsyncIO timeout (5000); aborting cmd w/ sn 1073299, handle 2415/0x4020c038
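
    To alert on lines like these, it helps to split each one into fields first. The following is a sketch based only on the sample lines above. The field names are hypothetical, and reading the second timestamp as host uptime, the number after cpuN: as a world ID, and the number after the module name as a source line are assumptions.

```python
import re

# Layout inferred from the sample lines in this document:
# <syslog stamp> <host> vmkernel: <uptime> cpu<N>:<world>)<text>
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+vmkernel:\s+"
    r"(?P<uptime>\d+:\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"cpu(?P<cpu>\d+):(?P<world>\d+)\)"
    r"(?P<text>.*)$"
)
# Most sample lines continue as "<module>: <number>: <message>".
MODULE_RE = re.compile(r"^(?P<module>\w+): (?P<source_line>\d+): (?P<message>.*)$")


def parse_vmkernel_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["cpu"] = int(fields["cpu"])
    fields["world"] = int(fields["world"])
    text = fields.pop("text")
    detail = MODULE_RE.match(text)
    if detail:
        fields["module"] = detail.group("module")
        fields["source_line"] = int(detail.group("source_line"))
        fields["message"] = detail.group("message")
    else:
        # Driver lines such as the qla24xx one carry no module prefix.
        fields["module"] = None
        fields["source_line"] = None
        fields["message"] = text
    return fields
```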
    

     

    Impact

    A storage event has caused an outage on every host accessing the shared storage.

     

    Action

    Review the storage controller or storage processor logs on the array for any events or messages around the time of the incident. This kind of outage can have a number of causes, such as a controller problem, a failing hard drive, or a SAN Copy operation being initiated. Also review the switch logs for the same time frame to see whether the switches were a factor in the outage.
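
    Before opening the array and switch logs, it is useful to know the time window and the storage paths involved. The sketch below pulls both from the matching vmkernel lines. The function name summarize_incident is hypothetical, the year has to be supplied by the caller because syslog timestamps omit it, and the vmhbaA:B:C pattern is inferred from the sample message in this document.

```python
import re
from collections import Counter
from datetime import datetime

STAMP_RE = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s")
PATH_RE = re.compile(r"\bvmhba\d+:\d+:\d+\b")


def summarize_incident(lines, year):
    """Return first/last timestamps and a count of vmhba paths mentioned.

    Assumes English month abbreviations (strptime %b is locale dependent).
    Returns None if no line carries a syslog timestamp.
    """
    stamps = []
    paths = Counter()
    for line in lines:
        line = line.strip()
        match = STAMP_RE.match(line)
        if match is None:
            continue
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        stamps.append(stamp)
        paths.update(PATH_RE.findall(line))
    if not stamps:
        return None
    return {"first": min(stamps), "last": max(stamps), "paths": dict(paths)}
```

    The first and last values give the time frame to search in the array and switch logs, and paths points at the adapter and LUN to start with.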

     

    vmkernel.log Orange List

     

    Messages on the Orange List should be picked up by the log monitoring automation and alerted as a warning for review.

     

    vmkernel.log Black List

     

    Messages on the Black List should be collected by the log monitoring automation and trended for future analysis.
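
    Trending can be as simple as counting matching messages per host and hour. This is a minimal sketch with a hypothetical function name, trend_by_hour; it assumes the caller has already filtered the input down to Black List lines and it relies on the syslog prefix seen in the sample messages in this document.

```python
import re
from collections import Counter

HOUR_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):\d{2}:\d{2}\s+(\S+)\s+vmkernel:"
)


def trend_by_hour(lines):
    """Count vmkernel lines per (host, month, day, hour) bucket."""
    counts = Counter()
    for line in lines:
        match = HOUR_RE.match(line.strip())
        if match:
            month, day, hour, host = match.groups()
            counts[(host, month, int(day), int(hour))] += 1
    return counts
```

    The resulting counts can be written out periodically and compared over time to spot a message that is becoming more frequent.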

     

    Resources

     

    Authors