Skip to main content

Loading solaris sar data to MySQL

Sar (System Activity Report) is the performance data collector on all unix systems. If it works pretty well to collect data, displaying the result lets a bitter taste; forget any colorfull graphs, its output is only tabs of pure ascii text.
However, sar's weakness is also his strength; text is easy to parse and Unix have plenty of powerfull tools to do so.
My aim today is to parse the output of sar and to load it into MySQL. Why MySQL? Simply because it's a powerfull free database, and querying MySQL with Excel is easy.
The idea is simple:
sar > XML > LOAD XML command > MySQL database

Step 1 - choosing the output of sar

On Solaris, sar -A provides all the results; actually, it is equivalent to successively issue these commands:

sar -u (CPU utilization)
sar -d (activity for each block  device)
sar -q (average queue length while occupied)
sar -b (buffer activities)
sar -w (system swapping and switching activity)
sar -c (system calls)
sar -a (file access system routines)
sar -y (TTY device activity)
sar -v (status of process, i-node, file tables)
sar -m (message and semaphore activities)
sar -p (paging activities)
sar -g (paging activities)
sar -r (unused memory pages and disk blocks)
sar -k (kernel memory allocation activities)

Step 2 - Output to XML

awk is the most efficient tool to do so. Here is a simple awk program that turns the output of sar into XML.

sar -A|/usr/xpg4/bin/awk \
-v DT=`date '+%Y-%m-%d'` \
-v SV=`uname -n` \
'/^00:00:00/ {split($0,tag); \
    tag[1]=NF+1; \
    ID=NF+length($0)*length($2); \
    a=1; \
    continue} \
/^..:..:../ {\
    if(a==1){TS=$1; \
        s="<sardata."ID" server="SV" date="DT" time="TS" "; \
        for(i=2;i<tag[1];i++){ \
            s=s tag[i]"="$i" "}; \
        s=s" />"; \
        print s \
        } \
    } \
/^  */ { \
    if(a==1){ \
        s="<sardata."ID" server="SV" date="DT" time="TS" "; \
        for(i=2;i<tag[1];i++){ \
            s=s tag[i]"="$(i-1)" "}; \
        s=s" />"; \
        print s \
        } \
    } \
/^Average/ {a=0}'
Ok, lets explain the code a little bit.
sar -A|/usr/xpg4/bin/awk the output of sar -A is piped to awk. I use the xpg4 version (standard on Solaris) because I need to define variables. Nawk would probably do the same job.
-v DT=`date '+%Y-%m-%d'` -v SV=`uname -n` I define 2 variables. DT contains today's date and SV contains the server name
'/^00:00:00/ {split($0,tag);tag[1]=NF+1;ID=NF+length($0)*length($2);a=1;continue} \ The crunchy starts here. When I encounter '00:00:00', it indicates a new set of column names. split($0,tag) put all these names into a table named 'tag'.
Because I know field 1 contains '00:00:00', which is useless, the command tag[1]=NF+1 replaces the value with the number of fields, because I will need it further.
ID=NF+length($0)*length($2) calculates a unique ID for each group of fields; it's a simple way to know which part of the output I'm currently working on.
a=1 sets an indicator to allow further processing.
continue load the next line of data.

/^..:..:../ {if(a==1){TS=$1;s="<sardata."ID" server="SV" date="DT" time="TS" ";for(i=2;i<tag[1];i++){s=s tag[i]"="$i" "};s=s" />";print s}} \ On each line starting with a time indicator, I check if a is set (I'm not on the first row), I initialize the variable s with some text.
Then I loop thru all column names (remember, the number of columns is is tag[1]) and I add the column name, the = and the value found to the s string. Once done, I finally add the closing XML tag and display the string. That's it!
The second block is almost exactly the same, but I need it to process the output of sar -d.
/^Average/ {a=0}: when I encounter Average, I stop processing.

Update: adapting the code to Linux systat

On Linux, the default output format for date is 12 hours; it must be converted to a 24 hours format before beeing transformed to XML

12:00:00 AM  -> 00:00:00
01:00:00 AM  -> 01:00:00
...
12:00:00 PM  -> 12:00:00
01:00:00 PM  -> 13:00:00
...
The following code does the adaptation:
gawk -F'[ :]' '$1=="12" {$1="00"} \
$4=="PM" {$1=$1+12} \
$4=="AM"||$4=="PM" {$1=$1":"$2":"$3; $2=$3=$4=""} \
{$1=$1; print $0}'

Step 3 - LOAD XML

Then, a set of LOAD XML commands will process the XML output and load the tables.

LOAD XML INFILE '\\sardata.xml' 
     INTO TABLE db_assets.sardata_u 
     ROWS IDENTIFIED BY '<sardata.165>'
     SET retired=NULL;
sardata.165 is the unique ID I have calculated above. The correspondance between table name and ID is the following:
sardata_d ROWS IDENTIFIED BY sardata.428
sardata_q ROWS IDENTIFIED BY sardata.285
sardata_b ROWS IDENTIFIED BY sardata.513
sardata_w ROWS IDENTIFIED BY sardata.342
sardata_c ROWS IDENTIFIED BY sardata.456
sardata_a ROWS IDENTIFIED BY sardata.196
sardata_y ROWS IDENTIFIED BY sardata.399
sardata_v ROWS IDENTIFIED BY sardata.449
sardata_m ROWS IDENTIFIED BY sardata.123
sardata_p ROWS IDENTIFIED BY sardata.343
sardata_g ROWS IDENTIFIED BY sardata.377
sardata_r ROWS IDENTIFIED BY sardata.178
sardata_k ROWS IDENTIFIED BY sardata.499

Comments

Popular posts from this blog

Drive replacement for Fostex DMT8-vl

The IDE hard drive on my Fostex DMT8-vl multitrack recorder shows signs of its imminent death; when getting hot, I could not record anymore. Must be said this drive comes from an old Sun Station, and has been replaced because I/O failures were detected by Solaris. It worked at least 5 years in my recorder: not so bad. However, time is now to replace it. The DMT8-vl is not able to handle drives bigger than 8.4 GB. Well, it is able to (the current drive is 15 GB), but only 8.4 GB will be usable. My tought was to use a 8 GB CompactFlash; having no moving parts means no noise, which is quite temptating for a music recording device. I purchased a CompactFlash-IDE adapter on the internet (8$) and I had to build a male-male IDE cable adapter (4$). Unfortunately, this doesn't work. The drive is correctly discovered by the operating system, which proposes to format it ("format IDE?"). After answering "yes", the formating runs pretty fast (faster than on a real drive), ...

Samba: Clients get "system error 1223" (or 123) after a server reboot

Facts: a Linux+Samba server shares anonymously a folder. After a reboot, Win clients could not attach the share drive anymore. C:\>net use \\mylinux\folder Enter the user name for 'mylinux': System error 1223 has occurred. The operation was canceled by the user. C:\>net view \\mylinux\ System error 123 has occurred. The filename, directory name, or volume label syntax is incorrect. The process are present, and tcpdump doesn't provide much information. What's going on? After hours of headscratching, the light came: the firewall was on and no rules for the Samba protocol! Grrr!

Issue with Soundpool MO4

I have a Atari STe with a Soundpool MO4 MIDI extension. It used to work very well, but unfortunatelly doesn't anymore: Cubase still detects it, and I can output MIDI to it but nothing is coming out from any MIDI Out. It took me a while to tackle it (lack of time, lack of tool, other items to play with), but I gave a glance last week-end. The parallel port on the Atari uses only the following signals: Pin 1 : Strobe (Atari -> MO4) Pin 2 : Data 0 (Atari -> MO4) Pin 3 : Data 1 (Atari -> MO4) Pin 4 : Data 2 (Atari -> MO4) Pin 5 : Data 3 (Atari -> MO4) Pin 6 : Data 4 (Atari -> MO4) Pin 7 : Data 5 (Atari -> MO4) Pin 8 : Data 6 (Atari -> MO4) Pin 9 : Data 7 (Atari -> MO4) Pin 11: Busy (MO4 -> Atari) The MO4 also decodes few other pins, but since the Atari doesn't, my guess is the MO4 was also targeted for PC. Inside the box, the MO4 is architectured around a CPLD (IspLSI1016 from Lattice) which contains the logi...