Loading Solaris sar data into MySQL
Sar (System Activity Reporter) is the performance data collector found on all Unix systems. While it works pretty well for collecting data, displaying the results leaves a bitter taste: forget any colorful graphs, its output is only tables of pure ASCII text.
However, sar's weakness is also its strength: text is easy to parse, and Unix has plenty of powerful tools to do so.
My aim today is to parse the output of sar and to load it into MySQL. Why MySQL? Simply because it's a powerful free database, and querying MySQL from Excel is easy.
The idea is simple:
sar > XML > LOAD XML command > MySQL database
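To give an idea of the target format, each sar measurement becomes one self-closing XML tag whose attributes are the column names; for a sar -u line it looks like this (hypothetical values):

<sardata.165 server="myserver" date="2013-04-02" time="01:00:00" %usr="12" %sys="3" %wio="0" %idle="85" />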
Step 1 - Choosing the output of sar
On Solaris, sar -A provides all the results; it is actually equivalent to successively issuing these commands:
sar -u (CPU utilization)
sar -d (activity for each block device)
sar -q (average queue length while occupied)
sar -b (buffer activities)
sar -w (system swapping and switching activity)
sar -c (system calls)
sar -a (file access system routines)
sar -y (TTY device activity)
sar -v (status of process, i-node, file tables)
sar -m (message and semaphore activities)
sar -p (paging activities)
sar -g (paging activities)
sar -r (unused memory pages and disk blocks)
sar -k (kernel memory allocation activities)
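All these sections share the same shape, and the parsing below relies on it: a header line of column names prefixed with a timestamp, data lines prefixed with a timestamp, and an Average line at the end. Here is a simplified sar -u extract (hypothetical values):

00:00:00    %usr    %sys    %wio   %idle
01:00:00      12       3       0      85
02:00:00      10       4       1      85
Average       11       3       0      85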
Step 2 - Output to XML
awk is the most efficient tool for the job. Here is a simple awk program that turns the output of sar into XML.
sar -A | /usr/xpg4/bin/awk \
 -v DT=`date '+%Y-%m-%d'` \
 -v SV=`uname -n` \
 '/^00:00:00/ {split($0,tag);
               tag[1]=NF+1;
               ID=NF+length($0)*length($2);
               a=1;
               next}
  /^..:..:../ {if(a==1){
                 TS=$1;
                 s="<sardata."ID" server=\""SV"\" date=\""DT"\" time=\""TS"\" ";
                 for(i=2;i<tag[1];i++){s=s tag[i]"=\""$i"\" "};
                 s=s" />";
                 print s}}
  /^ +/       {if(a==1){
                 s="<sardata."ID" server=\""SV"\" date=\""DT"\" time=\""TS"\" ";
                 for(i=2;i<tag[1];i++){s=s tag[i]"=\""$(i-1)"\" "};
                 s=s" />";
                 print s}}
  /^Average/  {a=0}'

OK, let's explain the code a little bit.
sar -A | /usr/xpg4/bin/awk
The output of sar -A is piped to awk. I use the xpg4 version (standard on Solaris) because I need to define variables on the command line; nawk would probably do the same job.
-v DT=`date '+%Y-%m-%d'` -v SV=`uname -n`
I define two variables: DT contains today's date and SV contains the server name.
/^00:00:00/ {split($0,tag); tag[1]=NF+1; ID=NF+length($0)*length($2); a=1; next}
The crunchy part starts here. When I encounter '00:00:00', it indicates a new set of column names. split($0,tag) puts all these names into an array named 'tag'. Because I know field 1 contains '00:00:00', which is useless as a column name, the command tag[1]=NF+1 replaces that value with the number of fields plus one, because I will need it later as a loop bound. ID=NF+length($0)*length($2) calculates a unique ID for each group of fields; it's a simple way to know which part of the output I'm currently working on. a=1 sets an indicator to allow further processing, and next loads the next line of data.
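For illustration, assuming the simplified sar -u header shown earlier (hypothetical spacing), the first block leaves these values behind:

tag[2]="%usr"  tag[3]="%sys"  tag[4]="%wio"  tag[5]="%idle"
tag[1]=6       (NF+1, the upper bound of the loop)
ID=...         (an arithmetic fingerprint of that header line)
a=1            (processing enabled)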
/^..:..:../ {if(a==1){TS=$1; s="<sardata."ID" server=\""SV"\" date=\""DT"\" time=\""TS"\" "; for(i=2;i<tag[1];i++){s=s tag[i]"=\""$i"\" "}; s=s" />"; print s}}
On each line starting with a time indicator, I first check that a is set (meaning I am past a header line). I store the timestamp in TS and initialize the variable s with the beginning of the XML tag.
Then I loop through all the column names (remember, their number is stored in tag[1]) and I append the column name, an = sign and the quoted value to the s string. Once done, I finally close the XML tag and print the string. That's it!
The second block is almost exactly the same, but I need it to process the output of sar -d. Its device lines start with spaces instead of a timestamp, so every field is shifted by one: that is why this block reuses the last TS seen and reads $(i-1) instead of $i.
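Here is a simplified sar -d extract (hypothetical values) showing why: the device lines carry no timestamp, so $1 is already the first data column:

00:00:00   device   %busy   avque   r+w/s   blks/s   avwait   avserv
01:00:00   sd0          1     0.0       2       21      0.0      8.2
           sd1          0     0.0       0        3      0.0      5.1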
/^Average/ {a=0}
When I encounter Average, I stop processing.
Update: adapting the code to Linux sysstat
On Linux, the default output format for the time is 12-hour; it must be converted to a 24-hour format before being transformed to XML:
12:00:00 AM -> 00:00:00
01:00:00 AM -> 01:00:00
...
12:00:00 PM -> 12:00:00
01:00:00 PM -> 13:00:00
...

The following code does the adaptation:
gawk -F'[ :]' '$1=="12" {$1="00"} \
               $4=="PM" {$1=$1+12} \
               $4=="AM"||$4=="PM" {$1=$1":"$2":"$3; $2=$3=$4=""} \
               {$1=$1; print $0}'
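As a sketch of how the two pieces fit together on Linux (assuming the XML generator above has been saved in a file named sar2xml.awk, a name I am making up here):

sar -A \
 | gawk -F'[ :]' '$1=="12" {$1="00"} \
                  $4=="PM" {$1=$1+12} \
                  $4=="AM"||$4=="PM" {$1=$1":"$2":"$3; $2=$3=$4=""} \
                  {$1=$1; print $0}' \
 | gawk -v DT=`date '+%Y-%m-%d'` -v SV=`uname -n` -f sar2xml.awk > sardata.xml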
Step 3 - LOAD XML
Then, a set of LOAD XML commands will process the XML output and load the tables.
LOAD XML INFILE '\\sardata.xml'
  INTO TABLE db_assets.sardata_u
  ROWS IDENTIFIED BY '<sardata.165>'
  SET retired=NULL;
sardata.165 is the unique ID I have calculated above. The correspondence between table names and IDs is the following:
sardata_d ROWS IDENTIFIED BY '<sardata.428>'
sardata_q ROWS IDENTIFIED BY '<sardata.285>'
sardata_b ROWS IDENTIFIED BY '<sardata.513>'
sardata_w ROWS IDENTIFIED BY '<sardata.342>'
sardata_c ROWS IDENTIFIED BY '<sardata.456>'
sardata_a ROWS IDENTIFIED BY '<sardata.196>'
sardata_y ROWS IDENTIFIED BY '<sardata.399>'
sardata_v ROWS IDENTIFIED BY '<sardata.449>'
sardata_m ROWS IDENTIFIED BY '<sardata.123>'
sardata_p ROWS IDENTIFIED BY '<sardata.343>'
sardata_g ROWS IDENTIFIED BY '<sardata.377>'
sardata_r ROWS IDENTIFIED BY '<sardata.178>'
sardata_k ROWS IDENTIFIED BY '<sardata.499>'
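LOAD XML matches XML attributes to column names, so each target table simply needs columns named after the corresponding sar headers, plus server, date and time. As a minimal sketch, a table for sar -u data could look like this (the types, and the retired column referenced above, are my assumptions, not the original schema):

CREATE TABLE db_assets.sardata_u (
  server  VARCHAR(64),
  date    DATE,
  time    TIME,
  `%usr`  DECIMAL(5,2),
  `%sys`  DECIMAL(5,2),
  `%wio`  DECIMAL(5,2),
  `%idle` DECIMAL(5,2),
  retired DATE                -- matched by the SET retired=NULL clause
);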