Experiment Description Language

This document describes the Experiment Description Language (EDL), a way of describing scientific experiments. It answers the following questions:

What is EDL for?
Who needs EDL?
What does EDL look like?
How can EDL be used?

What is EDL for?

All over the world, scientists and hobbyists are doing research by conducting experiments. In this way, they hope to learn more about some aspect of the world they are interested in. The particular subjects of their interest vary widely, but the ways they go about studying those subjects show many similarities. This is what we call the scientific method. Experiments form the basis of the scientific method. They are used to acquire data about a particular phenomenon. That data is used either to form a hypothesis or to test (falsify) it. A hypothesis that fails even a single test must be either rejected or altered.

An important aspect of scientific progress consists of documenting experiments, and publishing the results. For many areas of interest, these results are published in the form of articles in a small number of well-known, respected journals. The advantage of this system is that one needs to read only a few journals to keep up-to-date in a particular area. Publishing scientific work like this allows other people to confirm someone else's results, and to build upon them. The review boards of the scientific journals make sure the scientific work submitted is of good quality. However, in the current Internet Age, anybody can put up a web site and publish results, without going through the thorough submission process. Readers of those websites have no real way of knowing whether the results are trustworthy, so it becomes more and more important to be able to reproduce experiments in order to confirm the results.

Reproducing experiments performed by others is hard, though. As so often, the devil is in the details. It often takes a lot of time and dedication to find out exactly how an experiment was conducted, under what circumstances, using what tools, etc. A lot would be gained if there were a standard way of describing experiments and their results. This document hopes to achieve that by defining a standard way of describing scientific experiments in digital form. We will call this the Experiment Description Language (EDL).

Who needs EDL?

EDL is intended for anyone who is interested in doing research using the scientific method; you don't need to be a scholar working at a university. The main reasons for using EDL are that you want to confirm other people's results by redoing their experiments, or that you want to give other people an easy way of confirming your own results. There are several reasons why scientists may want to confirm someone else's results:

Validation of a new method. When devising a new method, one needs benchmark results against which the results of the new method can be compared. For example, one could measure the volume of a sphere either by measuring the radius and using a formula to calculate the volume, or by placing the sphere in a liquid and measuring the increase in volume. When (approximately) the same results are obtained with the new method and with one or more older methods (using the same inputs, of course), this is an indication that the new method works properly.
Validation of a new implementation of an existing method. Even when using the same method, different people may obtain different results because they use different tools, or use the same tools slightly differently. For example, scientists working in a laboratory often need to calibrate their instruments. By reproducing someone else's results, they can check whether the calibrations were successful.
Validation of the original method/implementation. When an unknown person or organization publishes a result, chances are that it is not taken as seriously as when a well-known, respected person publishes it. For example, Mendel's results on the inheritance of traits were originally simply overlooked. But when other, independent people reproduce the results, the facts get harder to ignore. Also, when someone publishes a result that is very surprising in the light of current knowledge, it tends to be met with some scepticism. An example of this is found in the reports about so-called cold fusion. In cases like these, other scientists will try to reproduce the results to validate the original claims.

EDL can also be used in other ways, as described below.

What does EDL look like?

EDL is an application of eXtensible Markup Language (XML). This makes sense, since XML has become the de facto standard for data exchange. Many parsers and other tools are available for processing XML documents (many of them free), making it an easy format to work with. Furthermore, it is a text-based format that can be understood by both humans and computer programs. Not surprisingly, XML has already found many applications in science, like Mathematical Markup Language, Chemical Markup Language, bioXML, and Artificial Intelligence Markup Language.

The structure of an XML document can be described by several mechanisms. Currently the most common in use is the Document Type Definition (DTD), but XML Schema is closing in as support for it keeps expanding. Both the DTD and the XML Schema for EDL can be downloaded here.

The remainder of this section describes the contents of an EDL document in more detail. We will use the following EDL document as an example:

<?xml version="1.0"?>
<!DOCTYPE experiment PUBLIC 
  "+//edl.sourceforge//EDL Version 1.0//EN" "http://edl.sourceforge.net/edl.dtd">
<experiment>
  <introduction>
    <description 
      href="http://www.somewhere.org/someexperiment/background.html">
      This is an example experiment for illustrating the use of EDL.
    </description>
    <hypothesis id="H1" 
      href="http://www.somewhere.org/someexperiment/hypothesis.html">
      This document conforms to the EDL standard.
    </hypothesis>
  </introduction>
  <performer>
    <organization>
      <name>
        Some University
      </name>
      <address webAddress="http://www.someuniversity.edu"/>
      <organization>
        <name>
          Some Faculty
        </name>
        <address 
          webAddress="http://www.someuniversity.edu/somefaculty"/>
        <person>
          <name>
            Some Scolar
          </name>
          <address 
            emailAddress="somescolar@somefaculty.someuniversity.edu"/>
        </person>
      </organization>
    </organization>
    <person>
      <name>
        Some Hobbyist
      </name>
      <address emailAddress="somehobbyist@yahoo.com"/>
    </person>
  </performer>
  <input>
    <inputItem id="I1" name="example.xml" unit="XML">
      This example EDL document.
    </inputItem>
    <inputItem id="I2" name="edl.dtd" unit="DTD">
      The DTD that describes the structure of EDL documents.
    </inputItem>
  </input>
  <transformation>
    <description>
      To test the hypothesis that this document conforms to the EDL
      standard, we simply validate this document against the EDL DTD
      or Schema.
    </description>
    <tool href="http://www.somewhere.com/somevalidatingparser"/>
  </transformation>
  <output>
    <outputItem input="I1">
      The document is well-formed XML and valid according to the 
      Schema/DTD.
    </outputItem>
  </output>
  <conclusions>
    <description>
      All we had to do was to run a validating parser against the 
      document. And guess what? This document is a valid EDL 
      document!
    </description>
    <testedHypothesis hypothesis="H1" failed="no"/>
    <publication href="edl.html">
      This EDL document is an example document, as used in the original 
      introductory text on EDL.
    </publication>
  </conclusions>
</experiment>

The document starts off by stating that it is an XML document. The second line indicates the Document Type Definition (DTD) to be used to validate this document. The first two lines are standard for every EDL document. They do not contain any data about experiments, but together simply state that we're dealing with an EDL document.

The line starting with <experiment> shows what we call the root of the EDL document. This is where the description of the experiment starts. An experiment description always consists of six items: introduction, performer, input, transformation, output, and conclusions. These are described in more detail below.
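The six-part structure lends itself to a quick sanity check. The following is a minimal sketch in Python, using only the standard library; the function name missing_parts and the incomplete skeleton document are made up for illustration, and the check is no substitute for validating against the EDL DTD or Schema.

```python
import xml.etree.ElementTree as ET

# The six parts named in the text, in the order used by the example document.
EXPECTED_PARTS = ["introduction", "performer", "input",
                  "transformation", "output", "conclusions"]

def missing_parts(edl_text):
    """Return the expected top-level parts that are absent from the document."""
    root = ET.fromstring(edl_text)
    if root.tag != "experiment":
        raise ValueError("root element is <%s>, expected <experiment>" % root.tag)
    present = {child.tag for child in root}
    return [part for part in EXPECTED_PARTS if part not in present]

# A deliberately incomplete skeleton (no <conclusions>), for illustration only.
skeleton = """<experiment>
  <introduction/><performer/><input/><transformation/><output/>
</experiment>"""

print(missing_parts(skeleton))  # ['conclusions']
```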

The introduction part of an EDL document describes the reasons for conducting the experiment. It consists of a description of the background of the experiment and the hypothesis that the experiment is supposed to test (there may be more than one). For both the description and the hypothesis, href attributes may be added that point to external documents that provide further information. For a hypothesis, this may also point to another EDL document, where the hypothesis is first described (see the conclusions part of the EDL document below). In this way, EDL documents can be linked, and complex networks may be formed that represent the interrelationships between hypotheses. Furthermore, since EDL documents can also be processed by computer programs, it wouldn't be too hard to write a computer program that tests these hypotheses together as a theory.

The performer part of an EDL document describes the organizations and/or persons that performed the experiment. Organization elements may be nested to reflect the organizational hierarchy. For both organizations and persons, a name must be supplied. An optional address may also be specified, consisting of one or more of the following: physical addresses, e-mail addresses, and web addresses (URLs). The purpose of the performer part of an EDL document is to make sure that the persons that performed the experiment may be contacted for more details, should that need arise.
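Because organizations may be nested, reading the performer part is naturally recursive. The sketch below, in Python with the standard library, walks an excerpt of the performer part from the example document and lists each person together with the chain of organizations above them. The function name list_persons is hypothetical, and only the emailAddress attribute shown in the example is read.

```python
import xml.etree.ElementTree as ET

# Excerpt of the performer part of the example document (addresses trimmed).
performer_xml = """<performer>
  <organization>
    <name>Some University</name>
    <organization>
      <name>Some Faculty</name>
      <person>
        <name>Some Scolar</name>
        <address emailAddress="somescolar@somefaculty.someuniversity.edu"/>
      </person>
    </organization>
  </organization>
  <person>
    <name>Some Hobbyist</name>
    <address emailAddress="somehobbyist@yahoo.com"/>
  </person>
</performer>"""

def list_persons(element, path=()):
    """Yield (organization path, person name, e-mail or None) for every person."""
    for child in element:
        if child.tag == "organization":
            org_name = child.findtext("name", default="").strip()
            # Descend, remembering which organization we are inside.
            yield from list_persons(child, path + (org_name,))
        elif child.tag == "person":
            name = child.findtext("name", default="").strip()
            address = child.find("address")
            email = address.get("emailAddress") if address is not None else None
            yield path, name, email

for path, name, email in list_persons(ET.fromstring(performer_xml)):
    print(" / ".join(path) or "(no organization)", "-", name, "-", email)
```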

The input part of an EDL document describes the input fed into the experiment. This may consist of things as diverse as compounds for chemical experiments, parameters for computer simulations, etc. Each inputItem may optionally receive a name, value, and unit. When supplying more than one input, it is recommended that a name be supplied. Each inputItem also receives an id, so that it can be referred to from the output part. The input part is especially important for redoing experiments that were originally performed by others.

The transformation part of an EDL document describes the way the experiment was performed, i.e. how the input was transformed into the output. As with the input part, the contents of this part are highly dependent on the area of interest. For chemical experiments, for example, this part may describe the protocol used. Further information may be referred to by using the href attribute on the transformation element. For software transformations, the tool element refers to an executable version of the transformation, i.e. a computer program that may be fed the input and will produce the output. This allows the experiment to be performed in exactly the same way as the original performers did, so that results can be confirmed (almost) instantaneously. Like the input part, the transformation part is essential for redoing experiments that were originally performed by others.

The output part of an EDL document describes the outcome(s) of the experiment. Like inputItems, outputItems optionally receive a name, value, and unit. They can also be linked to an inputItem using the input attribute. The output part should contain only raw data. Beautifications and/or human interpretations of that data must be avoided here at all costs. Instead, they should be placed in the conclusions part, see below. The purpose of this distinction is to separate hard facts that can be verified (i.e. the data in the output part) from their interpretations (i.e. the human inferences in the conclusions part). In case of disagreement, the facts can easily be confirmed or rejected by redoing the experiment, leaving only the interpretations open for debate.
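The link from an outputItem back to its inputItem can be followed mechanically. The following sketch (Python, standard library only) pairs each output with the name of the input it refers to, using a shortened version of the example document. The function name outputs_with_inputs is hypothetical; the id and input attributes are the ones described above.

```python
import xml.etree.ElementTree as ET

# Shortened input and output parts of the example document.
edl = """<experiment>
  <input>
    <inputItem id="I1" name="example.xml" unit="XML">This example EDL document.</inputItem>
    <inputItem id="I2" name="edl.dtd" unit="DTD">The DTD for EDL documents.</inputItem>
  </input>
  <output>
    <outputItem input="I1">The document is well-formed and valid.</outputItem>
  </output>
</experiment>"""

def outputs_with_inputs(root):
    """Pair each outputItem's text with the name of the inputItem it refers to."""
    inputs_by_id = {item.get("id"): item for item in root.findall("input/inputItem")}
    pairs = []
    for out in root.findall("output/outputItem"):
        source = inputs_by_id.get(out.get("input"))  # None if unlinked or dangling
        source_name = source.get("name") if source is not None else None
        pairs.append((source_name, (out.text or "").strip()))
    return pairs

print(outputs_with_inputs(ET.fromstring(edl)))
```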

The conclusions part of an EDL document interprets the results of the experiment in the output part. For each hypothesis in the introduction part, there should be a testedHypothesis in the conclusions part. The hypothesis attribute of a testedHypothesis indicates the hypothesis that was tested (remember there may be more than one hypothesis tested in the experiment), whereas the failed attribute indicates whether the hypothesis failed (allowed values are yes, no, and inconclusive). Based on the results of the experiment, new hypotheses may be formed, which should be described in newHypothesis elements. That is also the place to present revised versions of rejected hypotheses, using the alteredVersionOf attribute. Finally, the conclusions part may optionally contain a publication element that points to an external publication that describes the experiment, like an article in a journal or a page on a website.
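The rule that every hypothesis should have a matching testedHypothesis is easy to check by program. Below is a minimal sketch in Python (standard library only). The function name check_conclusions and the second hypothesis H2 in the sample are invented for illustration; the attribute names and the allowed values of failed are those given above.

```python
import xml.etree.ElementTree as ET

ALLOWED_FAILED = {"yes", "no", "inconclusive"}

def check_conclusions(root):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    stated = {h.get("id") for h in root.findall("introduction/hypothesis")}
    tested = set()
    for t in root.findall("conclusions/testedHypothesis"):
        ref = t.get("hypothesis")
        tested.add(ref)
        if ref not in stated:
            problems.append("testedHypothesis refers to unknown hypothesis %r" % ref)
        if t.get("failed") not in ALLOWED_FAILED:
            problems.append("invalid failed value %r for %r" % (t.get("failed"), ref))
    for missing in sorted(stated - tested):
        problems.append("hypothesis %r has no testedHypothesis" % missing)
    return problems

# H2 is a made-up second hypothesis that is never tested.
edl = """<experiment>
  <introduction>
    <hypothesis id="H1">This document conforms to the EDL standard.</hypothesis>
    <hypothesis id="H2">A second hypothesis, added for illustration.</hypothesis>
  </introduction>
  <conclusions>
    <testedHypothesis hypothesis="H1" failed="no"/>
  </conclusions>
</experiment>"""

print(check_conclusions(ET.fromstring(edl)))
```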

The example shown here uses simple inputs and outputs, described by inputItem and outputItem elements. One can also use inputRange and outputRange elements instead of (or in addition to) these, where an input or output value consists of an indexed range of values (using inputRangeItem and outputRangeItem elements). This is useful for recording e.g. the temperature at specified times t=0, t=1, etc. Indexes should be integers, but need not be consecutive. See the EDL DTD or Schema for a complete description of the syntax of indexed input and output parameters.

How can EDL be used?

Now that we've seen what an EDL document looks like, it is time to discuss the ways EDL can be used to make the life of a scientist easier. These are described in more detail in the following sections.

Reproducing results

The historical reason for EDL's existence is to make it easy to reproduce results obtained by others. The reasons for wanting this have been discussed before. The main advantage of EDL in this area is that its structure forces one to make explicit a lot of things that are sometimes missing or hard to find in published articles. The exact inputs and raw outputs must be described, as well as the method(s) used to obtain the results.

Maintaining a library of experiments

Doing research usually involves putting together your own results with those obtained by others (as Sir Isaac Newton put it: "If I have been able to see further, it was only because I stood on the shoulders of giants"). But with so many scientists working in so many fields, it gets hard to keep track of who has done what, and how. And when you're new to a particular field, how do you know what has been done and what hasn't? This problem could potentially be solved by setting up a library of experiments. With such a library in place, one could search for experiments much more easily, since an EDL document contains structured information.

Note that a few such libraries are currently available, e.g. MedLine. However, these are databases of articles, not experiments. This means that it is usually very hard to search for experiments where e.g. the substance phospholipase A2 (PLA2) was obtained. Searching MedLine for PLA2 gives about 3040 articles (August 2003) that mention this substance, but how do we know whether PLA2 was actually obtained in the experiments described in the articles? Since an EDL document explicitly mentions the output(s), the requested information is right there, without the need to read most of an article. Only when the right experiment is found would you read the accompanying article.
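To illustrate the difference between "mentions PLA2" and "produced PLA2", here is a small sketch in Python (standard library only) that searches only the output part of each document. The function name experiments_producing, the file labels, and both miniature EDL documents are invented for this example; they are not real experiments.

```python
import xml.etree.ElementTree as ET

def experiments_producing(documents, term):
    """Return the labels of documents whose output part mentions the term."""
    term = term.lower()
    hits = []
    for label, text in documents.items():
        root = ET.fromstring(text)
        for item in root.findall("output/outputItem"):
            haystack = " ".join([item.get("name", ""), item.text or ""]).lower()
            if term in haystack:
                hits.append(label)
                break  # one matching output is enough for this document
    return hits

# Two made-up documents: the first only uses PLA2 as input, the second yields it.
library = {
    "exp-a.xml": """<experiment>
        <input><inputItem id="I1" name="PLA2"/></input>
        <output><outputItem name="hydrolysis rate" input="I1"/></output>
      </experiment>""",
    "exp-b.xml": """<experiment>
        <input><inputItem id="I1" name="venom sample"/></input>
        <output><outputItem name="PLA2" input="I1"/></output>
      </experiment>""",
}

print(experiments_producing(library, "PLA2"))  # ['exp-b.xml']
```

A plain text search over both documents would return both of them; restricting the search to the output part returns only the experiment in which the substance was obtained.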

Another minor drawback of currently available libraries is that they usually deal with a single area of interest. Since most scientists work in a single field anyway, that may not be such a big problem. But for interdisciplinary studies or emerging fields this may very well be an issue.

Since an EDL document is also an XML document, the usual considerations for working with XML documents apply. In particular, XML is verbose, so a large collection of EDL documents takes up a lot of storage. Also, when all the information is stored in separate EDL documents, searching them becomes slow. These disadvantages can be circumvented by storing the EDL documents in a (relational) database. There are many issues to consider with this solution, but they are beyond the scope of this document (see e.g. Bourret).
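As a rough idea of what the database approach could look like, the sketch below loads the output items of an EDL document into an in-memory SQLite table and queries it. This is Python with the standard library only. The single-table layout, the table name output_item, the function name store_outputs, and the sample document are all assumptions made for illustration; a real mapping from XML to relational tables involves many more decisions.

```python
import sqlite3
import xml.etree.ElementTree as ET

def store_outputs(conn, label, edl_text):
    """Insert one row per outputItem of the given EDL document."""
    root = ET.fromstring(edl_text)
    rows = [(label, item.get("name"), item.get("value"), item.get("unit"))
            for item in root.findall("output/outputItem")]
    conn.executemany("INSERT INTO output_item VALUES (?, ?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE output_item (experiment TEXT, name TEXT, value TEXT, unit TEXT)")

# A made-up miniature document, for illustration only.
store_outputs(conn, "exp-b.xml",
              '<experiment><output><outputItem name="PLA2"/></output></experiment>')

found = conn.execute(
    "SELECT experiment FROM output_item WHERE name = ?", ("PLA2",)).fetchall()
print(found)  # [('exp-b.xml',)]
```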

Conversion to other formats for publication

Since an EDL document is also an XML document, we can use XSL transformations (XSLT) to transform EDL documents into other text-based formats. HTML is particularly interesting, but other formats come to mind, including TeX. The cool thing about this is that the stylesheet is developed just once, and can be effortlessly reused on any new EDL document. This is due to the separation of the data (in the EDL document) from the presentation (in the XSL stylesheet).

A nice application of this technique would be to create web pages containing abstracts of the experiments. An example XSL stylesheet that does this job can be found here; the resulting HTML for our example EDL document is available here.

Custom processing

One of the identifying strengths of XML is that it is a format that is understandable by both humans (because it is text-based) and computer programs (because it is structured). This is also true for EDL, since EDL is an application of XML. So we can write software tools that process information in EDL documents and that do basically whatever we would like. Parsing EDL documents is easy, since a lot of parsers for XML are available, many for free.

Application of this technique is limited only by our imagination. For instance, earlier we hinted at a tool for validating entire theories of interrelated hypotheses. We could also write a tool for searching through EDL documents, as a front-end for the library mentioned above. Another tool could extract the raw output data from EDL documents and process it, e.g. create graphs from it. Or one could create a tool that creates the outline for an article (much like the stylesheet approach mentioned before, but offering more control).
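The article-outline tool mentioned above could start out as simply as the following sketch, written in Python with the standard library. The function names, the outline headings (Background, Method, and so on), and the shortened sample document are assumptions for illustration; only the element names come from the example EDL document.

```python
import xml.etree.ElementTree as ET

def _text(element):
    """Collapse the whitespace of an element's text; empty string if absent."""
    return " ".join((element.text or "").split()) if element is not None else ""

def article_outline(edl_text):
    """Build a plain-text outline for an article from an EDL document."""
    root = ET.fromstring(edl_text)
    lines = ["Background: " + _text(root.find("introduction/description"))]
    for hyp in root.findall("introduction/hypothesis"):
        lines.append("Hypothesis %s: %s" % (hyp.get("id"), _text(hyp)))
    lines.append("Method: " + _text(root.find("transformation/description")))
    for out in root.findall("output/outputItem"):
        lines.append("Result: " + _text(out))
    lines.append("Discussion: " + _text(root.find("conclusions/description")))
    return "\n".join(lines)

# Shortened version of the example document.
edl = """<experiment>
  <introduction>
    <description>This is an example experiment.</description>
    <hypothesis id="H1">This document conforms to the EDL standard.</hypothesis>
  </introduction>
  <transformation>
    <description>Validate the document against the EDL DTD.</description>
  </transformation>
  <output><outputItem input="I1">The document is valid.</outputItem></output>
  <conclusions>
    <description>The document is a valid EDL document.</description>
  </conclusions>
</experiment>"""

print(article_outline(edl))
```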

For experiments that are performed by computer programs, like computer simulations, a whole new class of tools can be envisioned. These tools do not require the EDL document to exist before they operate, since they can create it themselves. So one could create a tool that performs an experiment and creates the EDL based on the output, leaving only a few empty elements to be filled in by the researchers (like the introduction and conclusions parts).
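A tool of this kind might look like the sketch below, in Python with the standard library. The "simulation" is a trivial stand-in (compound growth), and the function names, parameter names, and item names are all hypothetical. The introduction, performer, and conclusions parts are left empty for the researchers, so the generated document should not be expected to validate against the EDL DTD until they are filled in.

```python
import xml.etree.ElementTree as ET

def run_simulation(start, rate, steps):
    """Hypothetical stand-in for a real simulation: compound growth."""
    value = start
    for _ in range(steps):
        value *= 1 + rate
    return value

def simulate_to_edl(start, rate, steps):
    """Run the simulation and describe it as an <experiment> element."""
    experiment = ET.Element("experiment")
    ET.SubElement(experiment, "introduction")  # to be filled in by hand
    ET.SubElement(experiment, "performer")     # to be filled in by hand
    inp = ET.SubElement(experiment, "input")
    params = [("start", start), ("rate", rate), ("steps", steps)]
    for number, (name, value) in enumerate(params, 1):
        ET.SubElement(inp, "inputItem", id="I%d" % number,
                      name=name, value=str(value))
    trans = ET.SubElement(experiment, "transformation")
    ET.SubElement(trans, "description").text = (
        "Compound growth over a fixed number of steps.")
    out = ET.SubElement(experiment, "output")
    result = run_simulation(start, rate, steps)
    # Only the raw result is recorded; interpretation belongs in conclusions.
    ET.SubElement(out, "outputItem", name="final", value=repr(result), input="I1")
    ET.SubElement(experiment, "conclusions")   # to be filled in by hand
    return experiment

doc = simulate_to_edl(100.0, 0.5, 2)
print(ET.tostring(doc, encoding="unicode"))
```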

Not all tools need to be as generic as the ones discussed above. Surely lots of other tools could be created that work only for a specific team of experimenters, or even only for a specific (set of) experiment(s). The main point is that the information is there, structured and all, ready to be processed.