Patch-ID# 113729-01
Keywords: sun storedge 3510 health check utility
Synopsis: SE3510 s3khc.pl: SE3510 Health Check Utility
Date: Jun/23/2004

Install Requirements: None

Solaris Release: 8 9

SunOS Release: 5.8 5.9

Unbundled Product: StorEdge 3510 Array Controller Firmware

Unbundled Release: 1.0

Xref:

Topic:

Relevant Architectures: sparc

NOTE: This health check utility (to be run by Sun Service Personnel
      only) is to be run on SE3510 with controller(s) at revision -04
      (370-5537-04) or below. This utility is not needed for SE3510s
      with controller(s) at revision -05 or higher.

BugId's fixed with this patch:

Changes incorporated in this version:

Patches accumulated and obsoleted by this patch:

Patches which conflict with this patch:

Patches required with this patch:

Obsoleted by:

Files included with this patch:

113729-01.zip
patchinfo
README.113729-01

Problem Description:
--------------------
A small percentage of SE3510 fibre channel controllers have marginal
components that can affect the reliability of the fibre channel disk
loops. This results in intermittent CRC errors on the disk loops that
may lead to a shutdown of the controller due to an identified firmware
bug (5012640), and can lead to disk drives being marked as failed
incorrectly.

This health check utility, written in Perl, will monitor the health of
the SE3510 array controller(s) and attached expansion chassis. The
utility provides the following functions:

1. Poll the Link Error Status Block (LESB) counters in an attempt to
   identify possible excessive Fibre Channel signal jitter.

   The Health Check periodically polls the LESB counters of all attached
   disk drives and RAID controller drive channels in an attempt to
   detect excessive jitter or other fibre channel problems. The LESB
   counters include loss of sync, LIP, loss of signal, primitive error,
   invalid transmission word, and invalid CRC.
   The counters are polled every 50 minutes and then compared to a
   predefined threshold value within a predefined time period. If the
   threshold value is exceeded, the utility will log an alert message
   to replace the controller(s) in the /var/adm/messages and the
   /var/log/s3khc.log files.

2. Identify possible defective JBOD jumper cables.

   If the utility detects the JBOD manufacturing date from the chassis
   FRU-ID to be before 12/31/2003, it will log an alert message in the
   /var/adm/messages and the /var/log/s3khc.log files about possible
   down-rev or defective JBOD jumper cables (labeled TCC) that were
   shipped with the unit. Newer JBOD cables (labeled BIZ) should be
   used.

3. Record RAID Controller event log.

   The health check utility periodically polls the RAID controller
   event log for new events and records them to the /var/log/s3khc.log
   file, if using the default file name. The event log is also polled
   for a reset of the RAID Controller. During a RAID Controller reset,
   LESB counters may reach excessively high levels. These increases are
   normal during the reset process, so when a reset is detected the
   health check discards all current LESB error counts to avoid false
   triggers.

4. Log utility progress, warning and event messages, and LESB counters
   to a log file.

   By default, all progress, warning and event messages, and LESB
   counters are logged to the /var/log/s3khc.log file. This file is
   intended for engineering use only. Simplified messages are also
   logged in the /var/adm/messages file. The field should check the
   /var/adm/messages file to determine any actions necessary.

Note: the health check utility performs all of the above functions with
minimal impact on system performance and resources.

Patch Installation Instructions:
--------------------------------
1. unzip 113729-01.zip
2. cd to 113729-01
3. unzip SUNWs3khc-1.0.0.zip
4. pkgadd -d . SUNWs3khc

After Patch Installation:
------------------------
***Only Sun Service Personnel should run the health check utility***

By default, the utility will install in the /opt/SUNWs3khc directory,
but it can be launched from anywhere at the command line with the
following options:

    s3khc.pl --device=<IP address> --password=<password>
             --output=</directory/filename> --runtime=<150 min default>

Options:

--device=IP address
    It is recommended that you run out-of-band by specifying an IP
    address. You can run inband by specifying "/dev/rdsk/c#t#d#s2", but
    running inband can take a long time under a high load of host I/O,
    because utility commands are given lower priority. Only one 3510 IP
    address can be specified per utility execution.

--password=password
    Password for the Out-of-Band interface only. This option is needed
    if running Out-of-Band and the RAID controller password is set.

--runtime=minutes
    The length of time (in minutes) to run the utility. The default is
    150 min (2.5hr) if not specified. The system clock is used in the
    calculations, so changes to the system clock will affect this run
    time. A runtime of "0" will cause the utility to run indefinitely.

--output=/directory/filename
    Send output to a file. By default, the utility will log a summary
    of the utility execution and what actions to take in the
    /var/adm/messages file. The field is expected to look into the
    /var/adm/messages file to determine what to do after running the
    utility. The utility will log more verbose and detailed messages,
    intended for engineering use, in the /var/log/s3khc.log file. The
    "output" option changes the location and name of this verbose log
    file. It is recommended that a different log file name be assigned
    to each 3510 that the utility is run against. If a filename of "-"
    is given, output will be written to standard output, generally the
    terminal from which the utility was launched. The process will also
    run in the foreground.
    Utility and sccli errors are directed to standard error. If the
    default "output" file name, /var/log/s3khc.log, is used, the file
    is appended to each time the utility is invoked, and can therefore
    become large.

INTERPRETING UTILITY RESULTS
----------------------------
There are only 2 "alert" messages below that the field needs to pay
attention to. They are logged in both /var/adm/messages and /var/log/.
For simplicity, the field only needs to look at /var/adm/messages, as
the /var/log/ file is more for engineering use.

a) Jun 1 13:25:36 hba2-82 s3khc[1643]: 002294: Alert: Serial: 00223D,
   Check the Sun StorEdge 3510 expansion FC Jumper Cables

   If you see the above message, check whether you have older JBOD FC
   cables (-02) that are labeled TCC. Refer to the url below to
   identify TCC cables.

   http://webhome.sfbay/minnow_eng/TCC_cable.gif

   If you find older cables, replace them with newer JBOD FC cables
   (-03) that are labeled BIZ. Follow the section "HOW TO REPLACE
   CABLE(S)" below. Note that a 2nd customer visit is needed after
   ordering and receiving the necessary cables.

   If you are currently using the newer JBOD FC cables (-03), you can
   disregard this message. If you have -05 or higher controllers in
   your 3510, you do not need to run this utility to be told to check
   the JBOD FC cables. Instead, visually inspect the JBOD FC cable
   labels, as shown at the url above, to determine whether the cables
   need to be replaced.

b) May 13 13:27:41 vcs1 s3khc[2832]: 004504: Alert: LESB counts
   increasing, Consider replacing the Sun StorEdge 3510 RAID controller

   The above message indicates that the 3510 controller(s) can affect
   FC loop reliability. The utility cannot tell which controller (in a
   dual controller config) has the problem, so consider replacing all
   controllers that are at rev -04 or below with rev -06. Follow the
   section "HOW TO REPLACE CONTROLLER(S)" below.
   Note that a 2nd customer visit is needed after ordering and
   receiving the necessary controllers. Refer to the url below to
   identify the controller revision.

   http://webhome.sfbay/minnow_eng/3510_ctr_label.gif

   Do not replace any -05 or higher controllers. This utility is only
   intended to monitor -04 or lower controllers. If you have a dual
   controller system in which 1 controller is -05 or higher and the
   other is -04 or lower, the utility will monitor the -04 or lower
   controller, but not the -05 or higher.

Special Install Instructions:
-----------------------------
Note that even though this is bundled as a patch, you will not use
patchadd to install it, but rather pkgadd.

README -- Last modified date: Wednesday, June 23, 2004
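
Example: scanning a messages file for the two alerts
----------------------------------------------------
The following is a minimal sketch, not part of the patch, of how the
two alert messages described under INTERPRETING UTILITY RESULTS might
be picked out of syslog-style lines. It assumes the lines look like the
two samples quoted in this README. The names ALERT_RE, ACTIONS,
classify and scan, and the action labels, are hypothetical and chosen
only for illustration.

```python
import re

# Matches the syslog-style s3khc alert lines quoted in this README, e.g.
#   "... vcs1 s3khc[2832]: 004504: Alert: LESB counts increasing, ..."
# The exact line layout is an assumption based on those two samples.
ALERT_RE = re.compile(r"\bs3khc\[\d+\]:\s*\d+:\s*Alert:\s*(?P<text>.*)$")

# Hypothetical action labels keyed on phrases from the two sample alerts.
ACTIONS = (
    ("FC Jumper Cables", "check-jbod-cables"),
    ("LESB counts increasing", "consider-controller-replacement"),
)


def classify(line):
    """Return an action label for an s3khc alert line, or None otherwise."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    text = match.group("text")
    for phrase, action in ACTIONS:
        if phrase in text:
            return action
    return "unrecognized-alert"


def scan(lines):
    """Yield (action, line) for each s3khc alert in an iterable of lines."""
    for line in lines:
        action = classify(line)
        if action is not None:
            yield action, line.rstrip("\n")
```

For example, passing an open /var/adm/messages file object to scan()
would yield one pair per alert line; lines from other programs are
skipped.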