Loading solaris sar data to MySQL

Sar (System Activity Report) is the performance data collector on all unix systems. If it works pretty well to collect data, displaying the result lets a bitter taste; forget any colorfull graphs, its output is only tabs of pure ascii text.
However, sar's weakness is also his strength; text is easy to parse and Unix have plenty of powerfull tools to do so.
My aim today is to parse the output of sar and to load it into MySQL. Why MySQL? Simply because it's a powerfull free database, and querying MySQL with Excel is easy.
The idea is simple:
sar > XML > LOAD XML command > MySQL database

Step 1 - choosing the output of sar

On Solaris, sar -A provides all the results; actually, it is equivalent to successively issue these commands:

sar -u (CPU utilization)
sar -d (activity for each block  device)
sar -q (average queue length while occupied)
sar -b (buffer activities)
sar -w (system swapping and switching activity)
sar -c (system calls)
sar -a (file access system routines)
sar -y (TTY device activity)
sar -v (status of process, i-node, file tables)
sar -m (message and semaphore activities)
sar -p (paging activities)
sar -g (paging activities)
sar -r (unused memory pages and disk blocks)
sar -k (kernel memory allocation activities)

Step 2 - Output to XML

awk is the most efficient tool to do so. Here is a simple awk program that turns the output of sar into XML.

sar -A|/usr/xpg4/bin/awk \
-v DT=`date '+%Y-%m-%d'` \
-v SV=`uname -n` \
'/^00:00:00/ {split($0,tag); \
    tag[1]=NF+1; \
    ID=NF+length($0)*length($2); \
    a=1; \
    continue} \
/^..:..:../ {\
    if(a==1){TS=$1; \
        s="<sardata."ID" server="SV" date="DT" time="TS" "; \
        for(i=2;i<tag[1];i++){ \
            s=s tag[i]"="$i" "}; \
        s=s" />"; \
        print s \
        } \
    } \
/^  */ { \
    if(a==1){ \
        s="<sardata."ID" server="SV" date="DT" time="TS" "; \
        for(i=2;i<tag[1];i++){ \
            s=s tag[i]"="$(i-1)" "}; \
        s=s" />"; \
        print s \
        } \
    } \
/^Average/ {a=0}'
Ok, lets explain the code a little bit.
sar -A|/usr/xpg4/bin/awk the output of sar -A is piped to awk. I use the xpg4 version (standard on Solaris) because I need to define variables. Nawk would probably do the same job.
-v DT=`date '+%Y-%m-%d'` -v SV=`uname -n` I define 2 variables. DT contains today's date and SV contains the server name
'/^00:00:00/ {split($0,tag);tag[1]=NF+1;ID=NF+length($0)*length($2);a=1;continue} \ The crunchy starts here. When I encounter '00:00:00', it indicates a new set of column names. split($0,tag) put all these names into a table named 'tag'.
Because I know field 1 contains '00:00:00', which is useless, the command tag[1]=NF+1 replaces the value with the number of fields, because I will need it further.
ID=NF+length($0)*length($2) calculates a unique ID for each group of fields; it's a simple way to know which part of the output I'm currently working on.
a=1 sets an indicator to allow further processing.
continue load the next line of data.

/^..:..:../ {if(a==1){TS=$1;s="<sardata."ID" server="SV" date="DT" time="TS" ";for(i=2;i<tag[1];i++){s=s tag[i]"="$i" "};s=s" />";print s}} \ On each line starting with a time indicator, I check if a is set (I'm not on the first row), I initialize the variable s with some text.
Then I loop thru all column names (remember, the number of columns is is tag[1]) and I add the column name, the = and the value found to the s string. Once done, I finally add the closing XML tag and display the string. That's it!
The second block is almost exactly the same, but I need it to process the output of sar -d.
/^Average/ {a=0}: when I encounter Average, I stop processing.

Update: adapting the code to Linux systat

On Linux, the default output format for date is 12 hours; it must be converted to a 24 hours format before beeing transformed to XML

12:00:00 AM  -> 00:00:00
01:00:00 AM  -> 01:00:00
12:00:00 PM  -> 12:00:00
01:00:00 PM  -> 13:00:00
The following code does the adaptation:
gawk -F'[ :]' '$1=="12" {$1="00"} \
$4=="PM" {$1=$1+12} \
$4=="AM"||$4=="PM" {$1=$1":"$2":"$3; $2=$3=$4=""} \
{$1=$1; print $0}'

Step 3 - LOAD XML

Then, a set of LOAD XML commands will process the XML output and load the tables.

LOAD XML INFILE '\\sardata.xml' 
     INTO TABLE db_assets.sardata_u 
     ROWS IDENTIFIED BY '<sardata.165>'
     SET retired=NULL;
sardata.165 is the unique ID I have calculated above. The correspondance between table name and ID is the following:
sardata_d ROWS IDENTIFIED BY sardata.428
sardata_q ROWS IDENTIFIED BY sardata.285
sardata_b ROWS IDENTIFIED BY sardata.513
sardata_w ROWS IDENTIFIED BY sardata.342
sardata_c ROWS IDENTIFIED BY sardata.456
sardata_a ROWS IDENTIFIED BY sardata.196
sardata_y ROWS IDENTIFIED BY sardata.399
sardata_v ROWS IDENTIFIED BY sardata.449
sardata_m ROWS IDENTIFIED BY sardata.123
sardata_p ROWS IDENTIFIED BY sardata.343
sardata_g ROWS IDENTIFIED BY sardata.377
sardata_r ROWS IDENTIFIED BY sardata.178
sardata_k ROWS IDENTIFIED BY sardata.499


Popular Posts