Fault Management in Solaris

I recently read an entry on leal’s blog about using fault management to check for problems on your system. I created a script that just sshes into each host and checks whether the fltlog (fault log) is empty. Here’s a little snippet:

for h in ${HOSTLIST}; do
    # Grab the first field of the first non-header line of fmdump output
    TEST=`$SSH $h "/usr/sbin/fmdump | /usr/bin/grep -v TIME \
        | /usr/bin/head -n 1 | /usr/bin/cut -d' ' -f 1"`

    # Check whether the result of the test is a non-empty string.
    # If it is, then fmdump printed at least one entry
    # on the client machine.
    if [ -n "$TEST" ]; then
        echo $h >> ${FMACHECKMAIL}
    fi
done

I basically get an e-mail listing the hosts whose logs need checking (using fmdump). Once I’ve fixed the problem, I usually just rotate the log to make it empty again:
fmadm rotate fltlog
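
The emptiness test in the loop can be tried out locally, without ssh or a Solaris box, by feeding it text on stdin. This is only a rough sketch: has_faults is a hypothetical helper name that is not in my script, and the header line in the example is a stand-in for whatever fmdump prints on your release.

```shell
#!/bin/sh
# has_faults: hypothetical helper (not part of the original script).
# Reads fmdump-style output on stdin and succeeds (exit 0) if any line
# is left after dropping the header line that contains TIME.
has_faults() {
    first=`grep -v TIME | head -n 1 | cut -d' ' -f 1`
    [ -n "$first" ]
}

# Example with a made-up, header-only input: nothing is left after the
# header is filtered out, so the helper reports no faults.
if printf 'TIME UUID SUNW-MSG-ID\n' | has_faults; then
    echo "faults logged"
else
    echo "no faults"
fi
```

In the real loop the input would come from $SSH $h /usr/sbin/fmdump instead of printf. One thing to keep in mind is that a failed ssh also produces empty output, so an unreachable host looks the same as a clean one.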

