6 Replies Latest reply on Sep 9, 2011 7:19 PM by dtasj

    SCSI-timeout bug in ESX4 VMware Tools for kernels below 2.6.13 (e.g. RHEL4)

    MKguy Virtuoso

      So I recently filed a service request with VMware support because of the following issue:

       

      Installing the latest ESX4 VMware Tools on RHEL4 VMs does not increase the SCSI timeout value in /sys/block/sdX/device/timeout. It remains at the default of 30 seconds, whereas on RHEL5/CentOS5 VMs installing VMware Tools raises it from the default of 60 seconds to 180 seconds.

       

      No big deal, you might say, but it cost us some unnecessary downtime on RHEL4 (and only RHEL4) VMs when we had problems with the SAN, while all RHEL5 and Windows VMs kept running just fine. Of course we can easily set this value manually, but isn't that the job of VMware Tools (at least that's my expectation), which handle it fine on Windows and RHEL5 VMs?
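
      For anyone who wants to set the value manually in the meantime, here is a minimal sketch of what I mean. The function name set_scsi_timeouts and its two arguments (a sysfs root and a timeout in seconds) are my own invention for illustration, not anything shipped with VMware Tools. It only mirrors what the udev rule does: write 180 into the timeout file of every sd* disk whose vendor string starts with "VMware".

```shell
#!/bin/sh
# Sketch only: set the SCSI command timeout on VMware virtual disks.
# set_scsi_timeouts, sysfs_root and new_timeout are hypothetical names;
# nothing here is part of VMware Tools.
#
#   $1  sysfs root (defaults to /sys)
#   $2  timeout in seconds (defaults to 180, the value the udev rule uses)
set_scsi_timeouts() {
    sysfs_root="${1:-/sys}"
    new_timeout="${2:-180}"

    for dev in "$sysfs_root"/block/sd*/device; do
        # Skip when the glob matched nothing or the file is not writable.
        [ -w "$dev/timeout" ] || continue

        # Only touch disks that report a VMware vendor string.
        vendor=$(cat "$dev/vendor" 2>/dev/null)
        case "$vendor" in
            VMware*) ;;
            *) continue ;;
        esac

        echo "$new_timeout" > "$dev/timeout"
    done
}
```

      Calling set_scsi_timeouts with no arguments as root applies 180 seconds under /sys. The value does not survive a reboot, so the call would have to be repeated at boot, for example from /etc/rc.local (an assumption about your setup, adjust as needed).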

       

      Anyway, as it turned out during various tests and an exchange of emails with VMware support, this is due to a udev rule file (99-vmware-scsi-udev.rules) that is missing from /etc/udev/rules.d/ on these RHEL4 VMs after installing the tools.

       

      99-vmware-scsi-udev.rules
      
      #
      # VMware SCSI devices Timeout adjustment
      #
      # Modify the timeout value for VMware SCSI devices so that
      # in the event of a failover, we don't time out.
      # See Bug 271286 for more information.
      #
      # Note: The Udev systems vary from distro to distro.  Hence all of the
      #       extra entries.
      
      # Redhat systems
      ACTION=="add", BUS=="scsi", SYSFS{vendor}=="VMware, " , SYSFS{model}=="VMware Virtual S",   RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'"
      

       

      According to support, the reason this file is missing on all RHEL4 VMs is that the rule requires a Linux kernel of at least 2.6.13 (which apparently contained udev-related updates). However, all RHEL4 releases, as well as the Red Hat kernel updates for RHEL4, are based on the 2.6.9 kernel.
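
      To find out which guests fall below that 2.6.13 threshold, something like the following could be used. This is a rough sketch: the function name kernel_below_2_6_13 is made up, and it assumes the release string looks like "2.6.9-89.ELsmp", i.e. a dotted version followed by an optional "-suffix". The threshold itself is only what support told me, I have not verified it independently.

```shell
#!/bin/sh
# Sketch only: return 0 (true) if a kernel release is older than 2.6.13.
# kernel_below_2_6_13 is a hypothetical helper name.
#
#   $1  kernel release string (defaults to the output of `uname -r`)
kernel_below_2_6_13() {
    release="${1:-$(uname -r)}"
    base="${release%%-*}"          # "2.6.9-89.ELsmp" -> "2.6.9"

    # Split the dotted version into positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $base
    IFS=$old_ifs

    major=${1:-0}
    minor=${2:-0}
    patch=${3:-0}
    patch=${patch%%[!0-9]*}        # drop any trailing non-digits
    patch=${patch:-0}

    [ "$major" -lt 2 ] && return 0
    [ "$major" -gt 2 ] && return 1
    [ "$minor" -lt 6 ] && return 0
    [ "$minor" -gt 6 ] && return 1
    [ "$patch" -lt 13 ]
}
```

      On a guest, "kernel_below_2_6_13 && echo affected" would then flag systems where, going by support's explanation, the udev rule is not installed.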

       

      Support stated that they have no intention of fixing this (for example by setting the SCSI timeout automatically in some other way) or of releasing a KB article to point out this potential problem to the public.

       

      Yes, as stated above, this might not be that big of a deal, and I know I can easily set the timeout values myself, but what bothered me a bit was support's reluctance to actually fix this someday, or at least to plan a KB article with an official recommendation to set this value manually on certain systems.

       

       

      I mean, I can't be the first one to stumble upon this issue, can I? Has anyone else seen this and contacted VMware support about it?