How to compare multiple files?
The context
For a project, we had to create over 100 virtual machines (VM), each one with specific caracteristics: OS, # CPU, RAM, # disks, disk space... About 1/3 VM are Windows, 2/3 Linux.
I maintain a MS Access Database with the requested VM.
The project is near ending; for quality control, I need to validate what is delivered vs. what was requested. My sysadmins went on almost(*) all the VM to get the caracteristics; the result is 2 text files, one for Windows and one for Linux.
*: This is important. I don't have all the VM yet, hence I will have to do this process again; this have to be taken into account in my solution.
How to compare the data?
Method 1: by hand
Boring, tedious and highly error prone. Forgotten.
Method 2: using Excel
I tried that first. But I'm not a king on Excel, and because the caracteristics are quite different between Linux and Windows, is't hard to make comparisons. Abandoned.
Method 3: using Access
This is actually the most logical way, because my initial source of information is already in Access. Moreover, I do SQL for many years, so I feel comfortable to compare the servers with it (despite Access's SQL is somehow different from standard).
However, because the caracteristics are different between each server, a import from the text file did not yield good results.
But wait! Access is able to manage XML... let's try that.
What I've done?
- I've merged the 2 text files
- I've transform the raw text file into XML with a basic text editor; tedious, and I made a lot of errors, but with the help of online XML syntax checker, I've ended with a clean file.
- Import the XML to Access: encountered a first problem
My XML file was similar to this:
<servers> <server kernel="..." name="..." os="..." ..> <filesystems> <fs mount="/dev" name="devtmpfs" size="5.8G" ... > ... </filesystems> </server> ... </servers>
However, using attributes was a bad idea: Access doesn't seem able to manage them and I ended with empty fields in the result table:
To overcome this, I turned the attributes into elements (easy, using the Find & Replace function of Notepad):
<servers> <server> <name>...</name> <os>...</os> <ci>...</ci> ... <filesystems> <fs> <name>devtmpfs</name> <size>12G</size> <mount>/dev</mount> ... </fs> .... </filesystems> </server> .... <servers>Problem #2: no common key between the generated tables
Access ended with 2 tables: server and filesystems. However, no primary key is defined in server, but worse no foreign key is defined in filesystems!
The solution: transforming the XML during the import. Access allows to specify a XSLT file to transform the XML.
XML source --> XSLT transformation --> Access tables
Here is my XSLT:
<xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output indent="yes" omit-xml-declaration="yes" /> <xsl:strip-space elements="*" /> <xsl:template match="node()"> <xsl:copy> <xsl:apply-templates select="node()"> </xsl:copy> </xsl:template> <xsl:template match="fs"> <fs> <ci><xsl:apply-templates select="../ci"></ci> </fs> </xsl:template> </xsl:stylesheet >
Update: gathering data with Ansible
I now have access to an Ansible console; this will allow me to gather the data on a regular basis. Sadly, Ansible output format is JSON, and not XML. However, JSON can easily be transformed to XML with online tools such as https://www.freeformatter.com/json-to-xml-converter.html
Comments