Friday, November 1, 2013

Show Faulted Hardware in ILOM

Here I will go over my notes on how to identify and clear hardware faults in an ILOM (Integrated Lights Out Manager). On this page I will use the example of a chassis fan module error. If you follow my notes and the error clears and stays cleared, then you didn't have a real issue. On the other hand, if you can't clear the error after following my notes, then you have a real hardware issue. You can't clear an error while the underlying problem is still there.

This is how you log in to the command line interface for the ILOM. Here, ilom stands for the hostname or IP address of your ILOM.
man@earth> ssh root@ilom

The command below is one way to show system faults. On a healthy system the only target you should see is shell. If you see anything other than shell, it is a fault. In the example below, the ILOM shows a bad system fan, listed as 0 (/SYS/FM0).
--> show /SP/faultmgmt

/SP/faultmgmt
     Targets:
          shell
          0 (/SYS/FM0)

      Properties:

      Commands:
          cd
          show
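
If you capture that output to a file or a script, the "anything other than shell" rule is easy to check automatically. Below is a minimal Python sketch of that check. The function name fault_targets and the SAMPLE text are my own for illustration (SAMPLE is just the output above pasted in), and the parsing assumes the output layout shown on this page, which may differ between ILOM firmware versions.

```python
def fault_targets(show_output: str) -> list[str]:
    """Return every entry listed under 'Targets:' except 'shell'."""
    targets = []
    in_targets = False
    for line in show_output.splitlines():
        stripped = line.strip()
        if stripped == "Targets:":
            in_targets = True
            continue
        if in_targets:
            # The next section header (e.g. "Properties:") ends the target list.
            if stripped.endswith(":"):
                break
            if stripped and stripped != "shell":
                targets.append(stripped)
    return targets


# Sample text copied from the 'show /SP/faultmgmt' output on this page.
SAMPLE = """\
/SP/faultmgmt
     Targets:
          shell
          0 (/SYS/FM0)

      Properties:

      Commands:
          cd
          show
"""

if __name__ == "__main__":
    print(fault_targets(SAMPLE))
```

An empty list means the ILOM is reporting no faults. Anything else is a fault entry you should look at with show faulty.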

Using the show faulty command is another way to see the system faults. This command shows a lot more detail. If you have a support contract with Oracle, you will want to paste the output of this command into the ticket you submit to MOS (My Oracle Support). The show faulty command can be used without any paths, which is extra useful if you are coming in from a chassis ILOM.
--> show faulty
Target             | Property              | Value
-------------------+-----------------------+---------------------------------
/SP/faultmgmt/0    | fru                   | /SYS/FM0
/SP/faultmgmt/0/   | class                 | fault.chassis.device.fan.fail
faults/0           |                       |
/SP/faultmgmt/0/   | sunw-msg-id           | SPX86-8X00-33
faults/0           |                       |
/SP/faultmgmt/0/   | component             | /SYS/FM0
faults/0           |                       |
/SP/faultmgmt/0/   | uuid                  | 8692c3e4-G481-635e-f8e2-f3f215d1
faults/0           |                       | 13f0
/SP/faultmgmt/0/   | timestamp             | 2013-10-02/12:10:43
faults/0           |                       |
/SP/faultmgmt/0/   | detector              | /SYS/FM0/ERR
faults/0           |                       |
/SP/faultmgmt/0/   | product_serial_number | 1203FMM107
faults/0           |                       |
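
The table wraps long targets and values onto a second row, which makes it awkward to search or paste into other tools. Here is a rough Python sketch that joins the wrapped rows back together. The function name parse_show_faulty and the shortened SAMPLE are my own, and the sketch assumes a continuation row is one whose Property column is empty, as in the output above.

```python
def parse_show_faulty(output: str) -> list[dict[str, str]]:
    """Turn 'show faulty' table text into one dict per property row."""
    rows = []
    for line in output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != 3:
            continue  # prompt, blank line, or the dashed separator
        target, prop, value = cells
        if prop == "Property":
            continue  # header row
        if prop == "":
            # Continuation row: glue the wrapped pieces onto the previous row.
            if rows:
                rows[-1]["target"] += target
                rows[-1]["value"] += value
            continue
        rows.append({"target": target, "property": prop, "value": value})
    return rows


# A shortened copy of the table on this page.
SAMPLE = """\
Target             | Property              | Value
-------------------+-----------------------+---------------------------------
/SP/faultmgmt/0    | fru                   | /SYS/FM0
/SP/faultmgmt/0/   | class                 | fault.chassis.device.fan.fail
faults/0           |                       |
/SP/faultmgmt/0/   | uuid                  | 8692c3e4-G481-635e-f8e2-f3f215d1
faults/0           |                       | 13f0
"""

if __name__ == "__main__":
    for row in parse_show_faulty(SAMPLE):
        print(row["target"], row["property"], row["value"])
```

With the rows joined, the wrapped uuid comes back as a single string and every fault property carries its full /SP/faultmgmt/0/faults/0 target.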

The command below shows the event log, which will also contain the system hardware errors.
--> show /SP/logs/event/list

To clear the hardware fault from the logs, run the command below. This clears the entire event log.
--> set /SP/logs/event clear=true

Run this command to clear the fan error.
--> set /SYS/FM0 clear_fault_action=true
Try to clear the hardware fault. If the hardware is really having an issue, the hardware fault will come back in about a minute or less. If you can't clear the error and you have a support contract, then this is when you submit your ticket.
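
The clear, wait, and check again routine can be written down as a small Python sketch. Everything here is an assumption for illustration: run is a hypothetical function you would supply yourself that sends one line to the ILOM CLI and returns the text it prints (the real CLI may ask you to confirm the clear, which run would have to answer), and clear_and_recheck is my own name. The 60 second default comes from the "about a minute" rule of thumb above.

```python
import time
from typing import Callable


def clear_and_recheck(
    run: Callable[[str], str],
    fru: str,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Clear the fault on `fru`, wait, and report whether it stayed cleared.

    Returns True if `fru` no longer appears in 'show /SP/faultmgmt',
    False if the fault came back (likely a real hardware issue).
    """
    run(f"set {fru} clear_fault_action=true")
    sleep(wait_seconds)
    # Simple substring check; /SYS/FM0 would also match a path like /SYS/FM01.
    return fru not in run("show /SP/faultmgmt")


if __name__ == "__main__":
    # Stand-in for a real ILOM session: pretends the fan fault comes back.
    def fake_run(command: str) -> str:
        if command.startswith("show"):
            return "Targets:\n    shell\n    0 (/SYS/FM0)\n"
        return ""

    stayed_clear = clear_and_recheck(fake_run, "/SYS/FM0", sleep=lambda s: None)
    print("stayed clear" if stayed_clear else "fault came back, open a ticket")
```

Passing sleep in as a parameter is only there so the sketch can be tried without actually waiting a minute.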

If you have any questions or I missed something, let me know.