XML, the Extensible Markup Language, is a standard for structured documents and data serialisation. It uses the same angle-bracket markup as HTML. The rules that distinguish a well-formed XML file from HTML are quite basic:
- an encoding declaration in the first line
- exactly one root element
- tags and attributes may have any name
The last point makes XML a good choice for expressing almost any data structure, including data from simulations.
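A minimal, well-formed example (all element and attribute names here are invented for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- exactly one root element; element and attribute names are free -->
<simulation name="test-run">
  <parameter name="temperature" unit="K">300</parameter>
  <result step="1" energy="-12.7"/>
  <result step="2" energy="-12.9"/>
</simulation>
```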
The problem is not only representing data in XML, but also getting it out again for further processing and visualization. And this is where XML really plays to its strengths. XML is a widely used standard, and the standard covers more than the file format itself. There is a whole ecosystem of tools around it, which themselves build on standardised languages and interfaces. They are freely available and, in most cases, already installed on most computers without anyone noticing.
The tool that makes XML really useful and practical for science is XSLT. XSLT (Extensible Stylesheet Language Transformations) solves the tasks of templating and data extraction in such an elegant way that awk and grep look like what they are: 30-year-old technology. xsltproc is a tool that implements XSLT on the command line, and it is almost as widely available as grep and awk.
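As a small sketch of what data extraction looks like (reusing the invented result format from above, which is an assumption, not a fixed schema), the following stylesheet prints the energy of the last result element as plain text:

```xsl
<?xml version="1.0" encoding="UTF-8"?>
<!-- lastenergy.xsl: print the energy attribute of the last <result> element.
     The element and attribute names are assumptions for illustration only. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="simulation/result[last()]/@energy"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Run it as "xsltproc lastenergy.xsl output.xml": no regular expressions, no counting of columns, just an XPath expression that says what you mean.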
The important point to realize is that many tasks around computer simulations are templating tasks:
- Setting up a series of experiments
- Converting result data to visualizations
are the two most obvious examples.
Consider the visualization scenario. You make a graph from a simulation and polish it until it looks good, all the labels are right and the legend makes sense. Then you repeat the calculation and want the same graph with the new data.
What you want is a template into which you can cast any data of the same or a similar type.
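One way such a template can look (a sketch only; the element names and the choice of gnuplot output are assumptions): the labels and the legend live in the stylesheet, and whatever results file you transform supplies the data.

```xsl
<?xml version="1.0" encoding="UTF-8"?>
<!-- plot.xsl: a reusable "graph template" that produces a gnuplot script -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/simulation">
    <!-- the fixed part of the graph: labels and legend -->
    <xsl:text>set xlabel "iteration"&#10;</xsl:text>
    <xsl:text>set ylabel "total energy"&#10;</xsl:text>
    <xsl:text>plot '-' using 1:2 with linespoints title 'run'&#10;</xsl:text>
    <!-- the variable part: the data points from the results file -->
    <xsl:for-each select="result">
      <xsl:value-of select="@step"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="@energy"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <xsl:text>e&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

"xsltproc plot.xsl newrun.xml | gnuplot -persist" then regenerates the same graph for every new results file.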
The other example: you need to repeat the simulation for a number of values of a certain parameter or set of parameters. What you want is a template that generates an input file for the simulation for each parameter set that needs to be calculated.
No doubt Perl scripts could solve this problem, but a dedicated templating and data-transformation language means much less friction.
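As an illustration (again with invented element names), the classic XSLT identity transform copies a base input file unchanged and overrides just the parameter you want to vary; the value is passed in on the command line:

```xsl
<?xml version="1.0" encoding="UTF-8"?>
<!-- vary.xsl: copy a base input file, overriding one attribute.
     The names (lattice, scale) are assumptions for illustration. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- supplied with: xsltproc --stringparam scale 1.05 vary.xsl base.xml -->
  <xsl:param name="scale" select="'1.0'"/>
  <!-- identity transform: copy everything unchanged ... -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- ... except the one attribute that defines this run -->
  <xsl:template match="lattice/@scale">
    <xsl:attribute name="scale">
      <xsl:value-of select="$scale"/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
```

A small shell loop over the parameter values then writes one input file per run, e.g. "for s in 0.95 1.00 1.05; do xsltproc --stringparam scale $s vary.xsl base.xml > input_$s.xml; done".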
Some of the ideas behind XML in exciting are highlighted in the talk given at the YRM 2009 of the ETSF in Berlin.
Talk:
XML is a replacement for files that should be both machine readable and human readable. There is no gain in trying to put gigabytes of binary floating-point data into XML; binary formats such as NetCDF are much more useful there. The XML tools can process large files, but XML is simply not the right tool for large arrays of floating-point data.
On the other hand, for many types of data it is really important to be able to take a quick look and decide whether the data is meaningful, or how to process it further.
We hope this explains the move to XML. Please comment or ask on the mailing list if anything remains unclear.