This document describes the Experiment Description Language (EDL), a way of describing scientific experiments. It explains why such a language is needed, who it is intended for, what an EDL document looks like, and how EDL can be used.
All over the world, scientists and hobbyists do research by conducting experiments. In this way, they hope to learn more about some aspect of the world that interests them. The particular subjects of their interest vary wildly, but the ways they go about studying them show many similarities. This is what we call the scientific method. Experiments form the basis of the scientific method. They are used to acquire data about a particular phenomenon. That data is used either to form a hypothesis or to test (falsify) it. A hypothesis that fails even a single test must be either rejected or altered.
An important aspect of scientific progress consists of documenting experiments and publishing the results. For many areas of interest, these results are published in the form of articles in a small number of well-known, respected journals. The advantage of this system is that one needs to read only a few journals to keep up to date in a particular area. Publishing scientific work like this allows other people to confirm someone else's results, and to build upon them. The review boards of the scientific journals make sure the scientific work submitted is of good quality. However, in the current Internet Age, anybody can put up a website and publish results without going through the thorough submission process. Readers of those websites have no real way of knowing whether the results are trustworthy, so it becomes more and more important to be able to reproduce experiments in order to confirm the results.
Reproducing experiments performed by others is hard, though. As so often, the devil is in the details. It often takes a lot of time and dedication to find out exactly how an experiment was conducted, under what circumstances, using what tools, etc. A lot would be gained if there were a standard way of describing experiments and their results. This document hopes to achieve that by defining a standard way of describing scientific experiments in digital form. We will call this the Experiment Description Language (EDL).
EDL is intended for anyone who is interested in doing research using the scientific method; you don't need to be a scholar working at a university. The main reasons for using EDL are that you want to confirm other people's results by redoing their experiments, or that you want to give other people an easy way of confirming your own results. There are several reasons why scientists may want to confirm someone else's results:
EDL can also be used in other ways, as described below.
EDL is an application of eXtensible Markup Language (XML). This makes sense, since XML has become the de facto standard for data exchange. A lot of parsers and other tools are available for processing XML documents (many even for free), making it an easy format to work with. Furthermore, it is a text-based format, which can be understood by both humans and computer programs. Not surprisingly, XML has already found many applications in science, like Mathematical Markup Language, Chemical Markup Language, bioXML, and Artificial Intelligence Markup Language.
The structure of an XML document can be described by several mechanisms. Currently the most common one in use is the Document Type Definition (DTD), but XML Schema is closing in as support for it keeps expanding. Both the DTD and the XML Schema for EDL can be downloaded here.
The remainder of this section describes the contents of an EDL document in more detail. We will use the following EDL document as an example:
<?xml version="1.0"?>
<!DOCTYPE experiment PUBLIC "+//edl.sourceforge//EDL Version 1.0//EN" "http://edl.sourceforge.net/edl.dtd">
<experiment>
  <introduction>
    <description href="http://www.somewhere.org/someexperiment/background.html">
      This is an example experiment for illustrating the use of EDL.
    </description>
    <hypothesis id="H1" href="http://www.somewhere.org/someexperiment/hypothesis.html">
      This document conforms to the EDL standard.
    </hypothesis>
  </introduction>
  <performer>
    <organization>
      <name> Some University </name>
      <address webAddress="http://www.someuniversity.edu"/>
      <organization>
        <name> Some Faculty </name>
        <address webAddress="http://www.someuniversity.edu/somefaculty"/>
        <person>
          <name> Some Scolar </name>
          <address emailAddress="somescolar@somefaculty.someuniversity.edu"/>
        </person>
      </organization>
    </organization>
    <person>
      <name> Some Hobbyist </name>
      <address emailAddress="somehobbyist@yahoo.com"/>
    </person>
  </performer>
  <input>
    <inputItem id="I1" name="example.xml" unit="XML">
      This example EDL document.
    </inputItem>
    <inputItem id="I2" name="edl.dtd" unit="DTD">
      The DTD that describes the structure of EDL documents.
    </inputItem>
  </input>
  <transformation>
    <description>
      To test the hypothesis that this document conforms to the EDL standard, we simply validate this document against the EDL DTD or Schema.
    </description>
    <tool href="http://www.somewhere.com/somevalidatingparser"/>
  </transformation>
  <output>
    <outputItem input="I1">
      The document is well-formed XML and valid according to the Schema/DTD.
    </outputItem>
  </output>
  <conclusions>
    <description>
      All we had to do was to run a validating parser against the document. And guess what? This document is a valid EDL document!
    </description>
    <testedHypothesis hypothesis="H1" failed="no"/>
    <publication href="edl.html">
      This EDL document is an example document, as used in the original introductionary text on EDL.
    </publication>
  </conclusions>
</experiment>
The document starts by stating that it is an XML document. The second line names the Document Type Definition (DTD) to be used to validate this document. These first two lines are standard for every EDL document. They do not contain any data about experiments; together they simply state that we're dealing with an EDL document.
The line starting with <experiment> shows what we call the root of the EDL document. This is where the description of the experiment starts. An experiment description always consists of six items: introduction, performer, input, transformation, output, and conclusions. These are described in more detail below.
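The six-part structure can be checked mechanically. The following is a minimal Python sketch using only the standard library; the function name missing_parts and the empty skeleton document are assumptions made for illustration. It only looks for the six child elements and is not a substitute for validating against the EDL DTD or Schema.

```python
import xml.etree.ElementTree as ET

# The six parts named in the text, in the order the text lists them.
EXPECTED_PARTS = ["introduction", "performer", "input",
                  "transformation", "output", "conclusions"]

def missing_parts(edl_text):
    """Return the names of the six parts that are absent from the root element."""
    root = ET.fromstring(edl_text)
    if root.tag != "experiment":
        raise ValueError("root element is %r, expected 'experiment'" % root.tag)
    present = {child.tag for child in root}
    return [name for name in EXPECTED_PARTS if name not in present]

# A hypothetical skeleton; the parts are left empty for brevity.
skeleton = """<experiment>
  <introduction/><performer/><input/>
  <transformation/><output/><conclusions/>
</experiment>"""

print(missing_parts(skeleton))  # prints []
```

A document that lacks some parts yields their names, so the result can be shown directly to the author of the document.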
The introduction part of an EDL document describes the reasons for conducting the experiment. It consists of a description of the background of the experiment and the hypothesis that the experiment is supposed to test (there may be more than one). For both the description and the hypothesis, href attributes may be added that point to external documents that provide further information. For a hypothesis, this may also point to another EDL document, where the hypothesis is first described (see the conclusions part of the EDL document below). In this way, EDL documents can be linked, and complex networks may be formed that represent the interrelationships between hypotheses. Furthermore, since EDL documents can also be processed by computer programs, it wouldn't be too hard to write a computer program that tests these hypotheses together as a theory.
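As a first step towards such a network, a program could collect the hypotheses of a document together with the links they carry. The sketch below is an illustration in Python, not part of EDL itself; the function name hypothesis_links and the second hypothesis H2 are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

def hypothesis_links(edl_text):
    """Map each hypothesis id to its href attribute (None when absent)."""
    root = ET.fromstring(edl_text)
    return {h.get("id"): h.get("href")
            for h in root.findall("./introduction/hypothesis")}

# Fragment based on the example document; H2 is a hypothetical addition.
sample = """<experiment>
  <introduction>
    <description>Example.</description>
    <hypothesis id="H1" href="http://www.somewhere.org/someexperiment/hypothesis.html">
      This document conforms to the EDL standard.
    </hypothesis>
    <hypothesis id="H2">A second hypothesis without a link.</hypothesis>
  </introduction>
</experiment>"""

links = hypothesis_links(sample)
for hypothesis_id, href in links.items():
    print(hypothesis_id, "->", href)
```

Following each href that points to another EDL document, and repeating the same step there, would gradually build up the network of related hypotheses.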
The performer part of an EDL document describes the organizations and/or persons that performed the experiment. To reflect the organizational hierarchy, organization elements may be nested. For both organizations and persons, a name must be supplied. An optional address may also be specified, consisting of one or more of the following: physical addresses, e-mail addresses, and web addresses (URLs). The purpose of the performer part of an EDL document is to make sure that the persons who performed the experiment can be contacted for more details, should that need arise.
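Because organization elements nest, a tool that lists contact details has to walk the hierarchy recursively. The sketch below does this in Python for the performer part of the example document; the function name contacts is an assumption, and only the emailAddress attribute is read.

```python
import xml.etree.ElementTree as ET

def contacts(node, path=()):
    """Yield (organization path, person name, e-mail) for every person below node."""
    for child in node:
        if child.tag == "organization":
            name = child.findtext("name", default="").strip()
            # Descend, remembering which organizations enclose the persons found.
            yield from contacts(child, path + (name,))
        elif child.tag == "person":
            address = child.find("address")
            email = address.get("emailAddress") if address is not None else None
            yield path, child.findtext("name", default="").strip(), email

# The performer part of the example document (web addresses omitted).
performer = ET.fromstring("""<performer>
  <organization>
    <name> Some University </name>
    <organization>
      <name> Some Faculty </name>
      <person>
        <name> Some Scolar </name>
        <address emailAddress="somescolar@somefaculty.someuniversity.edu"/>
      </person>
    </organization>
  </organization>
  <person>
    <name> Some Hobbyist </name>
    <address emailAddress="somehobbyist@yahoo.com"/>
  </person>
</performer>""")

for org_path, person, email in contacts(performer):
    print(" / ".join(org_path) or "(independent)", "|", person, "|", email)
```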
The input part of an EDL document describes the input fed into the experiment. This may consist of things as diverse as compounds for chemical experiments, parameters for computer simulations, etc. Each inputItem may optionally receive a name, value, and unit. When supplying more than one input, it is recommended that a name be supplied. Each inputItem also receives an id, so that it can be referred to from the output part. The input part is especially important for redoing experiments that were originally performed by others.
The transformation part of an EDL document describes the way the experiment was performed, i.e. how the input was transformed into the output. As with the input part, the contents of this part are highly dependent on the area of interest. For chemical experiments, for example, this part may describe the protocol used. Further information may be referred to by using the href attribute on the transformation element. For software transformations, the tool element refers to an executable version of the transformation, i.e. a computer program that may be fed the input and will produce the output. This allows the experiment to be performed in exactly the same way as the original performers did, so that results can be confirmed (almost) instantaneously. Like the input part, the transformation part is essential for redoing experiments that were originally performed by others.
The output part of an EDL document describes the outcome(s) of the experiment. Like inputItems, outputItems optionally receive a name, value, and unit. They can also be linked to an inputItem using the input attribute. The output part should only contain raw data. Beautifications and/or human interpretations of that data must be avoided here at all costs. Instead, they should be placed in the conclusions part, see below. The purpose of this distinction is to separate hard facts that can be verified (i.e. the data in the output part) from their interpretations (i.e. the human inferences in the conclusions part). In case of disagreement, the facts can easily be confirmed or rejected by redoing the experiment, leaving only the interpretations open for debate.
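The link between an outputItem and its inputItem can be followed by a program. The following Python sketch pairs each output with the name of the input it refers to; the function name outputs_by_input is an assumption, and the sample is a shortened fragment of the example document.

```python
import xml.etree.ElementTree as ET

sample = """<experiment>
  <input>
    <inputItem id="I1" name="example.xml" unit="XML">This example EDL document.</inputItem>
    <inputItem id="I2" name="edl.dtd" unit="DTD">The DTD for EDL documents.</inputItem>
  </input>
  <output>
    <outputItem input="I1">The document is well-formed and valid.</outputItem>
  </output>
</experiment>"""

def outputs_by_input(edl_text):
    """Return (input name, output text) pairs, following each outputItem's input attribute."""
    root = ET.fromstring(edl_text)
    inputs = {item.get("id"): item for item in root.findall("./input/inputItem")}
    pairs = []
    for out in root.findall("./output/outputItem"):
        source = inputs.get(out.get("input"))  # None when the output is not linked
        name = source.get("name") if source is not None else None
        pairs.append((name, (out.text or "").strip()))
    return pairs

for input_name, text in outputs_by_input(sample):
    print(input_name, "=>", text)
```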
The conclusions part of an EDL document interprets the results of the experiment in the output part. For each hypothesis in the introduction part, there should be a testedHypothesis in the conclusions part. The hypothesis attribute of a testedHypothesis indicates the hypothesis that was tested (remember there may be more than one hypothesis tested in the experiment), whereas the failed attribute indicates whether the hypothesis failed (allowed values are yes, no, and inconclusive). Based on the results of the experiment, new hypotheses may be formed; these should be described in newHypothesis elements. That is also the place to present revised versions of rejected hypotheses, using the alteredVersionOf attribute. Finally, the conclusions part may optionally contain a publication element that points to an external publication that describes the experiment, like an article in a journal or a page on a website.
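The rule that every hypothesis should have a testedHypothesis, with one of the three allowed values for failed, lends itself to an automated check. The sketch below is one possible way to do that in Python; the function name check_conclusions, the wording of the messages, and the untested hypothesis H2 in the sample are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The three values the text allows for the failed attribute.
ALLOWED = {"yes", "no", "inconclusive"}

def check_conclusions(edl_text):
    """Return a list of problems found in the testedHypothesis elements."""
    root = ET.fromstring(edl_text)
    stated = {h.get("id") for h in root.findall("./introduction/hypothesis")}
    tested = set()
    problems = []
    for t in root.findall("./conclusions/testedHypothesis"):
        tested.add(t.get("hypothesis"))
        if t.get("failed") not in ALLOWED:
            problems.append("bad failed value for %s: %r"
                            % (t.get("hypothesis"), t.get("failed")))
    # Hypotheses from the introduction that the conclusions never mention.
    for hypothesis_id in sorted(stated - tested):
        problems.append("hypothesis %s has no testedHypothesis" % hypothesis_id)
    return problems

sample = """<experiment>
  <introduction>
    <hypothesis id="H1">This document conforms to the EDL standard.</hypothesis>
    <hypothesis id="H2">A hypothetical second hypothesis.</hypothesis>
  </introduction>
  <conclusions>
    <testedHypothesis hypothesis="H1" failed="no"/>
  </conclusions>
</experiment>"""

print(check_conclusions(sample))  # prints ['hypothesis H2 has no testedHypothesis']
```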
The example shown here uses simple inputs and outputs using inputItem and outputItem elements. One can also use inputRange and outputRange elements instead (or in addition), where an input or output value consists of an indexed range of values (using inputRangeItem and outputRangeItem elements). This is useful for recording e.g. the temperature at specified times t=0, t=1, etc. Indexes should be integers, but need not be consecutive. See the EDL DTD or Schema for a complete description of the syntax of indexed input and output parameters.
Now that we've seen what an EDL document looks like, it is time to discuss the ways EDL can be used to make the life of a scientist easier. These are described in more detail in the following sections.
The historical reason for EDL's existence is to make it easy to reproduce results obtained by others. The reasons for wanting this have been discussed before. The main advantage of EDL in this area is that its structure forces one to make explicit a lot of things that are sometimes missing or hard to find in published articles. The exact inputs and raw outputs must be described, as well as the method(s) used to obtain the results.
Doing research usually involves putting together your own results with those obtained by others ("If I have been able to see further, it was only because I stood on the shoulders of giants", Sir Isaac Newton). But with so many scientists working in so many fields, it gets hard to keep track of who has done what, and how. And when you're new to a particular field, how do you know what has been done and what hasn't? This problem could potentially be solved by setting up a library of experiments. With such a library in place, one could search for experiments a lot more easily, since an EDL document contains structured information.
Note that a few such libraries are currently available, e.g. MedLine. However, these are databases of articles, not experiments. This means that it is usually very hard to search for experiments where e.g. the substance phospholipase A2 (PLA2) was obtained. Searching MedLine for PLA2 gives about 3040 articles (August 2003) that mention this substance, but how do we know if PLA2 was actually obtained in the experiments described in the articles? Since an EDL document explicitly mentions the output(s), the requested information is right there, without the need to read most of an article. Only once the right experiment has been found would you read the accompanying article.
Another minor drawback of currently available libraries is that they usually deal with a single area of interest. Since most scientists work in a single field anyway, that may not be such a big problem. But for interdisciplinary studies or emerging fields this may very well be an issue.
Since an EDL document is also an XML document, the usual considerations for working with XML documents apply. In particular, XML is verbose, so a large collection of EDL documents takes up a lot of storage. Also, when all the information is kept in separate EDL documents, searching them becomes slow. These disadvantages can be circumvented by storing the EDL documents in a (relational) database. There are many issues to consider with this solution, but they are beyond the scope of this document (see e.g. Bourret).
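To give an impression of the database approach, the sketch below loads the outputItems of two EDL documents into an in-memory SQLite database and then searches for experiments that produced a given output. Everything specific here is an assumption made for illustration: the table layout, the function name store, the file names, and the sample values, which are not real measurements.

```python
import sqlite3
import xml.etree.ElementTree as ET

def store(conn, document, edl_text):
    """Copy every outputItem of one EDL document into the output_item table."""
    root = ET.fromstring(edl_text)
    for item in root.findall("./output/outputItem"):
        conn.execute(
            "INSERT INTO output_item (document, name, value, unit, description) "
            "VALUES (?, ?, ?, ?, ?)",
            (document, item.get("name"), item.get("value"),
             item.get("unit"), (item.text or "").strip()))

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per outputItem.
conn.execute("CREATE TABLE output_item "
             "(document TEXT, name TEXT, value TEXT, unit TEXT, description TEXT)")

# Two made-up documents, reduced to their output parts.
store(conn, "a.xml", """<experiment><output>
  <outputItem name="PLA2" value="1.5" unit="mg">Purified enzyme.</outputItem>
</output></experiment>""")
store(conn, "b.xml", """<experiment><output>
  <outputItem name="temperature" value="21" unit="C">Final temperature.</outputItem>
</output></experiment>""")

# Which experiments list PLA2 among their outputs?
rows = conn.execute(
    "SELECT document FROM output_item WHERE name = ?", ("PLA2",)).fetchall()
print(rows)  # prints [('a.xml',)]
```

A real library would need more tables (inputs, hypotheses, performers) and an index on the columns that are searched most often.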
Since an EDL document is also an XML document, we can use XSL transformations (XSLT) to transform EDL documents into other text-based formats. HTML is particularly interesting, but other formats come to mind, including TeX. The cool thing about this is that the stylesheet is developed just once, and can be effortlessly reused on any new EDL document. This is due to the separation of the data (in the EDL document) from the presentation (in the XSL stylesheet).
A nice application of this technique would be to create web pages containing abstracts of the experiments. An example XSL stylesheet that does this job can be found here; the resulting HTML for our example EDL document is available here.
One of the identifying strengths of XML is that it is a format that is understandable by both humans (because it is text-based) and computer programs (because it is structured). This is also true for EDL, since EDL is an application of XML. So we can write software tools that process information in EDL documents and that do basically whatever we would like. Parsing EDL documents is easy, since a lot of parsers for XML are available, many for free.
Application of this technique is limited only by our imagination. For instance, earlier we hinted at a tool for validating entire theories of interrelated hypotheses. We could also write a tool for searching through EDL documents, as a front-end for the library mentioned above. Another tool could extract the raw output data from EDL documents and process it, e.g. create graphs from it. Or one could create a tool that creates the outline for an article (much like the stylesheet approach mentioned before, but offering more control).
For experiments that are performed by computer programs, like computer simulations, a whole new class of tools can be envisioned. These tools do not require the EDL document to exist before they operate, since they can create it themselves. So one could create a tool that performs an experiment and creates the EDL based on the output, leaving only a few empty elements to be filled in by the researchers (like the introduction and conclusions parts).
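A tool of this kind could look roughly like the Python sketch below. The "experiment" is a deliberately trivial stand-in (summing the integers from 1 to a given number), and the function names run_simulation and describe_run are assumptions. Whether the DTD accepts the empty introduction, performer and conclusions elements has not been checked; they are placeholders for the researchers to fill in.

```python
import xml.etree.ElementTree as ET

def run_simulation(steps):
    """Stand-in experiment (hypothetical): sum the integers 1..steps."""
    return sum(range(1, steps + 1))

def describe_run(steps):
    """Run the experiment and return an experiment element describing it."""
    result = run_simulation(steps)
    experiment = ET.Element("experiment")
    ET.SubElement(experiment, "introduction")   # to be filled in by hand
    ET.SubElement(experiment, "performer")      # to be filled in by hand
    inp = ET.SubElement(experiment, "input")
    ET.SubElement(inp, "inputItem", id="I1", name="steps", value=str(steps))
    transformation = ET.SubElement(experiment, "transformation")
    description = ET.SubElement(transformation, "description")
    description.text = "Sum of the integers from 1 to steps."
    out = ET.SubElement(experiment, "output")
    ET.SubElement(out, "outputItem", input="I1", name="sum", value=str(result))
    ET.SubElement(experiment, "conclusions")    # to be filled in by hand
    return experiment

edl = ET.tostring(describe_run(10), encoding="unicode")
print(edl)
```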
Not all tools need to be as generic as the ones discussed above. Surely lots of other tools could be created that work only for a specific team of experimenters, or even only for a specific (set of) experiment(s). The main point is that the information is there, structured and all, ready to be processed.