XML and Exciting Data Format

by Evgeny Blokhin for exciting boron

Purpose: In this tutorial you will take a closer look at the XML standard and its accompanying tools, as well as at the XML format used by exciting.


0. What is XML?

XML (eXtensible Markup Language) is a convenient and self-explanatory way of presenting (marking up) data for both humans and computer programs. This principle fits the domains of materials science and physics well, where it is crucial that data be unambiguous, clear, and easy to process and exchange.

XML distinguishes between data and metadata. Metadata describe the shape of the data, or are data about data (the Greek μετά means "with", "after" or "beyond"). Let's have a closer look.


1. XML input of exciting

A typical input for an exciting calculation is shown below. The metadata are the elements, written as tags in angle brackets. The data are enclosed by the metadata, so the information can be distinguished and organized. Note the first three lines, which carry technical information: the XML version and encoding, the stylesheet, and the schema (we will return to them later).

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="$EXCITINGROOT/xml/inputfileconverter/xmlinput2html.xsl" type="text/xsl"?>
<input xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="$EXCITINGROOT/xml/excitinginput.xsd">
 
    <title>Lorem ipsum dolor sit amet...</title>
 
    <structure speciespath="$EXCITINGROOT/species">
        <crystal scale="7.7201">
            <basevect>0.5 0.5 0.0</basevect>
            <basevect>0.5 0.0 0.5</basevect>
            <basevect>0.0 0.5 0.5</basevect>
        </crystal>        
        <species speciesfile="Ag.xml">
            <atom coord="0.0 0.0 0.0"/>
        </species>        
    </structure>
 
    <groundstate maxscl="50" ngridk="6 6 6" rgkmax="7" xctype="GGA_PBE"/>   
 
    <relax/>
 
    <properties>
        <dos/>        
        <bandstructure>
            <plot1d>
                <path steps="100">
                    <point coord="0.00 0.00 0.00" label="GAMMA"></point>    
                    <point coord="0.50 0.00 0.50" label="X"></point>
                    <point coord="0.50 0.50 0.50" label="L"></point>
                    <point coord="0.50 0.25 0.75" label="W"></point>
                    <point coord="0.00 0.00 0.00" label="GAMMA"></point>
                </path>
            </plot1d>
        </bandstructure>
    </properties>
 
    <!-- this is just a comment, will not be parsed -->
</input>

You have probably noticed that there are different ways of framing the data with metadata: with opening and closing tags (pay attention to the position of the forward slash):
<metadata>data</metadata>

with attributes that the opening tag may carry (here about is the attribute name and topic is its value):
<metadata about="topic">other data</metadata>

or with an empty tag that has no contents (it may also have attributes):
<othermetadata related="other topic"/>
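The three forms above map directly onto what an XML parser reports for each element: a tag name, a set of attributes, and the enclosed text. A minimal sketch using Python's standard xml.etree.ElementTree module follows; the wrapper element <root> and the helper name describe are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The three ways of framing data shown above, wrapped in a made-up <root>
# element because an XML document must have exactly one root element.
SNIPPET = """
<root>
    <metadata>data</metadata>
    <metadata about="topic">other data</metadata>
    <othermetadata related="other topic"/>
</root>
"""


def describe(element):
    """Return (tag name, attributes, text) for one element."""
    return element.tag, dict(element.attrib), element.text


root = ET.fromstring(SNIPPET)
for child in root:
    tag, attributes, text = describe(child)
    print(tag, attributes, repr(text))
```

Note that the element without contents reports no text at all (None in Python), while its attribute is still available.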

The meaning of the individual tags and attributes is given in the Input Reference. Even assuming you have no prior knowledge of the exciting input, isn't it quite clear?

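If you are familiar with HTML, you may already have noticed the similarity. Indeed, HTML and XML are closely related markup languages that both descend from SGML; HTML is specialized for marking up information for web browsers, and its XHTML variant is a true XML dialect.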

XML is the only way to define the input for an exciting calculation, whereas the output data are generated in several formats: XML, binary, and plain text (for compatibility). Note that every exciting input contains at least the structure and groundstate elements inside the input root element.
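The rule about the two mandatory elements is easy to check programmatically. The following is only a sketch with Python's standard library: the helper name missing_required is hypothetical, and such a check is no substitute for the full schema validation described in the next section.

```python
import xml.etree.ElementTree as ET

# Element names that, according to the text above, every input must contain.
REQUIRED = ("structure", "groundstate")


def missing_required(xml_text):
    """Return the required element names absent from an <input> document."""
    root = ET.fromstring(xml_text)
    if root.tag != "input":
        raise ValueError("root element is <%s>, expected <input>" % root.tag)
    # Only direct children of the root element are inspected.
    return [name for name in REQUIRED if root.find(name) is None]


complete = "<input><structure/><groundstate/></input>"
incomplete = "<input><structure/></input>"
print(missing_required(complete))
print(missing_required(incomplete))
```

The first call reports nothing missing; the second reports the absent groundstate element.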

XML owes much of its popularity to two key capabilities: schema validation and stylesheet transformation. Let's get acquainted with them.


2. Validation of XML

Every XML file follows a convention (i.e. a schema) on how the XML tags inside it may be used (e.g. <relax> but not <optimisation>). For our example XML file above, the schema is named in the third line (the xsi:noNamespaceSchemaLocation attribute). Any XML file can be validated automatically against the schema it claims to comply with: schema validation is a standard part of the XML toolchain. Schemas are defined in XSD (XML Schema Definition) files; for the exciting input this is the file excitinginput.xsd.

On UNIX, validation can be performed with a command-line tool called xmllint, which is part of the libxml2 library. libxml2 is normally available on most Linux installations and also on Mac OS X. To find out whether you have it, type on the command line:

xmllint

If you see an error message (such as "command not found"), you probably need to install the library. Otherwise, go on: set the environment variables needed for exciting, save the XML input above as input.xml, and type:

xmllint input.xml --schema $EXCITINGROOT/xml/excitinginput.xsd --noout

You are expected to get a message:

input.xml validates

Congratulations: you have just made sure that you are dealing with a valid XML input for exciting. Now try to spoil the input somehow, e.g. simulate a typo: is the input still valid?
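When experimenting with typos, it helps to separate two kinds of failure. A mismatched tag makes the file not well-formed, which any XML parser rejects. A consistently misspelled element is still well-formed and is caught only by schema validation. The sketch below shows the first check with Python's standard library; the helper name is_well_formed is made up here, and the standard-library parser does not validate against excitinginput.xsd.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the text parses as XML; says nothing about schema validity."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


good = "<input><relax/></input>"
mismatched = "<input><relax></relaxx></input>"  # closing tag has a typo
misspelled = "<input><optimisation/></input>"   # consistent, but an unknown tag

for name, text in [("good", good), ("mismatched", mismatched), ("misspelled", misspelled)]:
    print(name, is_well_formed(text))
```

The third case passes this check, so only a schema-aware tool such as xmllint with the --schema option can report it.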


3. Transforming XML

Any XML document can be conveniently transformed into another type of document (plain text, an HTML webpage, other XML, PDF, SVG, etc.) in order to be printed, rendered on the screen, parsed by other programs, and so on. The mechanism for this is called XSLT (eXtensible Stylesheet Language Transformations). An XSLT processor follows the instructions in an XSLT stylesheet (also called a template), which describe what to do with each tag in the XML:
[Figure: xslt.png]

Loosely speaking, an ordinary web browser can also be thought of as a kind of XSLT processor that renders HTML code (a close relative of XML) on the screen using predefined templates.

Using the xsltproc processor on UNIX (it ships with libxslt, the companion of the above-mentioned libxml2), you can transform our XML input into an HTML webpage by applying the xmlinput2html.xsl template:

xsltproc $EXCITINGROOT/xml/inputfileconverter/xmlinput2html.xsl input.xml > input.html
[Figure: XSLT_workflow.jpg]

The main task in an XSLT transformation is to provide an XSLT template, which is itself an XML file with the .xsl extension. Many XSLT templates are already shipped with exciting: check out the $EXCITINGROOT/xml folder.
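To make the idea of a template concrete, here is a minimal sketch of a stylesheet applied from Python. The stylesheet is made up for this illustration and is not one of the templates shipped with exciting; the example assumes that the lxml package is installed. It writes the label and coordinates of every point on a band-structure path as one line of plain text.

```python
from lxml import etree

# A made-up miniature stylesheet: for every <point> inside a <path>,
# output its label and coordinates as one line of plain text.
STYLESHEET = b"""<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//path/point">
      <xsl:value-of select="@label"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="@coord"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
"""

# A fragment of the input shown in section 1.
DOCUMENT = b"""<input>
  <properties><bandstructure><plot1d><path steps="100">
    <point coord="0.00 0.00 0.00" label="GAMMA"/>
    <point coord="0.50 0.00 0.50" label="X"/>
  </path></plot1d></bandstructure></properties>
</input>
"""

# Compile the stylesheet once, then apply it to the parsed document.
transform = etree.XSLT(etree.XML(STYLESHEET))
result = str(transform(etree.XML(DOCUMENT)))
print(result, end="")
```

The stylesheet contains no Python-specific logic, so the same text saved as an .xsl file could equally be passed to xsltproc.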

A Template Market for various processing of exciting XML data is also at your service!

For example, try to convert your input into the 3D structure format .xsf (originally the format of the XCrySDen viewer, but now supported by other common visualisation software in materials science, e.g. VESTA and Jmol):

xsltproc $EXCITINGROOT/xml/inputfileconverter/xmlinput2xsf.xsl input.xml > input.xsf

If you feel comfortable with programming, you will also find it quite easy to deal with XML (to parse, generate, validate, transform it, etc.).


4. Programming XML

Most programming languages either have an interface to the above-mentioned libxml2 library or implement their own XML parser.
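As an illustration of a language-provided parser, the sketch below reads a trimmed copy of the input from section 1 with Python's standard xml.etree.ElementTree module, without libxml2. The helper name read_lattice is hypothetical, and two things are assumed here rather than taken from the text above: that the lattice vectors are the basevect entries multiplied by the scale attribute, and that a missing scale means 1.0.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the input from section 1.
INPUT_XML = """
<input>
    <title>Lorem ipsum dolor sit amet...</title>
    <structure speciespath="$EXCITINGROOT/species">
        <crystal scale="7.7201">
            <basevect>0.5 0.5 0.0</basevect>
            <basevect>0.5 0.0 0.5</basevect>
            <basevect>0.0 0.5 0.5</basevect>
        </crystal>
        <species speciesfile="Ag.xml">
            <atom coord="0.0 0.0 0.0"/>
        </species>
    </structure>
    <groundstate maxscl="50" ngridk="6 6 6" rgkmax="7" xctype="GGA_PBE"/>
</input>
"""


def read_lattice(root):
    """Return each <basevect> multiplied by crystal/@scale (assumed default 1.0)."""
    crystal = root.find("structure/crystal")
    scale = float(crystal.get("scale", "1.0"))
    return [[scale * float(x) for x in vec.text.split()]
            for vec in crystal.findall("basevect")]


root = ET.fromstring(INPUT_XML)
lattice = read_lattice(root)
species = [s.get("speciesfile") for s in root.findall("structure/species")]
kgrid = [int(n) for n in root.find("groundstate").get("ngridk").split()]

print(lattice)
print(species, kgrid)
```

Attribute values always arrive as strings, so numerical data such as ngridk have to be split and converted explicitly.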

Taking Python as an example, we will perform the XML validation and transformation described above. Python has several interfaces to libxml2; the one called lxml is used below. On UNIX this chain of tools (i.e. the Python interpreter + libxml2 + lxml) is likely to be present on your system already; for Windows there is a portable Python distribution that includes them.

Validation of any XML file against a schema can be done as follows (script validation.py):

import os
import sys

from lxml import etree

# The XML file to validate is the first command-line argument.
try:
    workpath = os.path.abspath(sys.argv[1])
except IndexError:
    sys.exit('No file defined!')
if not os.path.exists(workpath):
    sys.exit('Invalid path!')

# Provide your path to the schema.
schema_path = os.path.join(os.environ['EXCITINGROOT'], 'xml', 'excitinginput.xsd')
# Parse the schema from the file itself: under Python 3, etree.XML() rejects
# a str that carries an XML encoding declaration.
schema = etree.XMLSchema(etree.parse(schema_path))
xml = etree.parse(workpath)

# validate() returns True or False; on failure, error_log holds the details.
if schema.validate(xml):
    print('Valid!')
else:
    print(schema.error_log)

and transformation (script transform.py):

import os
import sys

from lxml import etree

# The XML file to transform is the first command-line argument.
try:
    workpath = os.path.abspath(sys.argv[1])
except IndexError:
    sys.exit('No file defined!')
if not os.path.exists(workpath):
    sys.exit('Invalid path!')

# Provide your path to the template.
xslt_path = os.path.join(os.environ['EXCITINGROOT'], 'xml', 'inputfileconverter', 'xmlinput2xsf.xsl')
# Parse the template from the file itself rather than from a str (Python 3 safe).
xslt_transform = etree.XSLT(etree.parse(xslt_path))
xml = etree.parse(workpath)
result_tree = xslt_transform(xml)
print(str(result_tree))

Run these scripts over your input:

python validation.py input.xml
python transform.py input.xml

and check that the tasks we performed before on the command line can also be carried out by these scripts.


A more advanced example can be given using the ASE (Atomic Simulation Environment) Python library, which has an interface to exciting.

ASE is an external library that requires installation (although it is also possible simply to save it to a subfolder and import it from there).

First, make sure you have it working:

python -c "import ase"

and proceed only if you see no errors.

With the aid of the ASE syntax, we define below, among other things:

  • the cmd variable, which holds the exciting launch command (edit accordingly),
  • the atomic structure (a tetragonal tausonite perovskite crystal, the crystal_obj variable),
  • the exciting calculation setup along with the desired band plot (the calc_obj variable).

The script then generates the required XML input automatically and immediately launches exciting.

import os, sys, time
from ase.units import Bohr
from ase.spacegroup import crystal  # very old ASE releases: from ase.lattice.spacegroup import crystal
from ase.calculators.exciting import Exciting
 
starttime = time.time()
curdir = os.path.realpath(os.path.dirname(os.path.abspath(__file__)))
calcdir = curdir + '/calc_%s' % time.strftime("%m%d_%H%M")
 
cmd = "/usr/global/mpi/bin/mpirun -np 4 " + os.environ['EXCITINGROOT'] + "/bin/excitingmpi" # edit accordingly!
 
crystal_obj = crystal(
    ('Sr', 'Ti', 'O', 'O'),
    basis=[(0, 0.5, 0.25), (0, 0, 0), (0, 0, 0.25), (0.255, 0.755, 0)],
    spacegroup=140, cellpar=[5.511, 5.511, 7.796, 90, 90, 90],
    primitive_cell=True)
 
rmts = {'Sr':1.6*Bohr, 'Ti':1.6*Bohr, 'O':1.6*Bohr}
ase_extens = []
for i in crystal_obj:
    try: ase_extens.append(rmts[i.symbol])
    except KeyError: sys.exit('Rmt for %s is not provided.' % i.symbol)
crystal_obj.new_array('rmt', ase_extens, float)
 
calc_obj = Exciting(
    dir=calcdir,
    bin=cmd,
    speciespath=os.environ['EXCITINGROOT'] + '/species',
    paramdict={"title":{"text()": "Tausonite"},
    "groundstate":{"xctype": "GGA_PBE", "gmaxvr": "13", "epsengy": "1d-5", "maxscl": "75", "fracinr": "2d-2",
    "SymmetricKineticEnergy": "true", "lorecommendation": "false", "ngridk": "2 2 2", "rgkmax": "5"},
    "properties":{
    "dos":{},
    "bandstructure":{
        "plot1d":{
        "path":{
        "steps": "75",
        "point":[
        {"coord":"0.0 0.0 0.0", "label":"GAMMA"},
        {"coord":"-0.25 0.75 -0.25", "label":"P"},
        {"coord":"-0.5 0.5 0.0", "label":"X"},
        {"coord":"0.0 0.0 0.0", "label":"GAMMA"},
        {"coord":"0.5 0.5 -0.5", "label":"Z"}, # points introduced by Heifets
        {"coord":"0.25 0.75 -0.5", "label":"Q"}, # http://dx.doi.org/10.1088/0953-8984/18/20/009
        {"coord":"0.0 0.0 0.0", "label":"GAMMA"},
        {"coord":"0.0 0.5 0.0", "label":"N"},
        ]}}}}})
 
crystal_obj.set_calculator(calc_obj)
energy = crystal_obj.get_potential_energy()
 
print('Energy = %15.8f eV' % energy)
print('Done in %1.2f s' % (time.time() - starttime))

Run the script and note how all the required XML files are generated on the fly!
For instance, once the run has finished, the resulting band structure (bandstructure.xml) in the calculation folder can be visualized with the following command:

xsltproc $EXCITINGROOT/xml/visualizationtemplates/xmlband2agr.xsl bandstructure.xml
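As for the on-the-fly generation of the input: the nested dictionary passed as paramdict in the script above has the same shape as the XML it describes. The sketch below shows how such a dictionary could be mapped onto XML elements with Python's standard library. The function dict_to_xml and its conventions are assumptions made for this illustration, inferred from the shape of paramdict; this is not the code that ASE itself uses.

```python
import xml.etree.ElementTree as ET


def dict_to_xml(tag, params):
    """Build an element from a nested dict.

    Assumed convention (inferred from the shape of paramdict above):
    "text()" -> element text, dict -> child element,
    list of dicts -> repeated child elements, anything else -> attribute.
    """
    element = ET.Element(tag)
    for key, value in params.items():
        if key == "text()":
            element.text = str(value)
        elif isinstance(value, dict):
            element.append(dict_to_xml(key, value))
        elif isinstance(value, list):
            for item in value:
                element.append(dict_to_xml(key, item))
        else:
            element.set(key, str(value))
    return element


# A shortened version of the paramdict from the script above.
params = {
    "title": {"text()": "Tausonite"},
    "groundstate": {"ngridk": "2 2 2", "rgkmax": "5"},
    "properties": {"bandstructure": {"plot1d": {"path": {
        "steps": "75",
        "point": [
            {"coord": "0.0 0.0 0.0", "label": "GAMMA"},
            {"coord": "0.0 0.5 0.0", "label": "N"},
        ]}}}},
}

root = dict_to_xml("input", params)
print(ET.tostring(root, encoding="unicode"))
```

The printed document has an input root element with title, groundstate and properties children, mirroring the hand-written input from section 1.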

Happy XML-ing with exciting!
