Interview by David Bradley
Reactive Profile—Egon Willighagen
Egon Willighagen is one of the new breed of chemists who are using the information tools of our age (blogs, wikis, and online social media) to further their chemistry and benefit the wider chemical community. By day, he is a postdoctoral scientist at Wageningen University & Research Centre in the Netherlands, and he cites open source programming as his main hobby. This has led to his participation in, amongst many other activities, Bioclipse, CDK, and Jmol. He runs a blog at http://chem-bla-ics.blogspot.com/ and established the Chemical blogspace site, which collects data from dozens of scientific chemistry blogs and then does useful and interesting things with them.
What are you currently working on in the lab?
Data preprocessing to aid metabolite identification for liquid chromatography-mass spectrometry (LC/MS) and GC/MS data, and software to aid metabolomics data analysis in general. This is as a postdoc at Plant Research International and Biometris, both part of Wageningen University & Research Centre, funded by the Netherlands Metabolomics Centre.
What would be your elevator pitch for your research?
Metabolomics data, and GC/MS and LC/MS data in particular, contain so much information that we currently have to disregard most of it just to be able to handle the data. In the process, information required for accurate metabolite identification is often lost too. My work focuses on minimizing this information loss.
What will be the next big discovery in this field?
An accurate view of the thousands (rather than hundreds) of unknown metabolites in plants, where we might not know the chemical structure yet, but we do know what expression they show.
What is still missing from the chemical web that chemists really should have on there?
Semantics. If I open a scientific paper on the web, as HTML or PDF, much of the data is hidden behind a nice-looking layout. However, I really want to be able to say to my computer something like: check the paper with this DOI, verify the NMR spectrum (or whatever) of that compound, and tell me what you know about that compound (and close lookalikes), because I do not find the data very convincing.
In what ways are blogs, such as your own, an enabler for chemistry?
Because they are a fast communication protocol. Things that took years in the past are 'solved' via blogs in a matter of days. Take, for example, the story about CAS registry numbers in Wikipedia. The blogosphere is an open platform, where everyone has the opportunity to express their opinion and provide the necessary arguments and details.
What was the inspiration for setting up Chemical Blogspace (Cb)?
Blogs in themselves do not provide summaries of what is happening in (part of) the blogosphere. The software used for the Postgenomic.com (Pg) site provided such functionality, and Euan Adie was happy that I'd set up a chemical corner. One additional reason for the fork was my interest in extracting the molecules being discussed, in addition to what Pg did itself. Another reason to have a separate corner was branding. I was now able to name it 'Chemical Blogspace', referring to the vast 'chemical space' where we chemists find our way in the wide range of chemical properties. Cb looks like an element symbol, so it was easy to pick as a logo.
What have been the highpoints of that site and the low points?
I still like very much how Cb picks up molecules being discussed, using a few different types of semantic markup, including RDFa, microformats and Wikipedia identifiers. RDF is really picking up on the internet, and RDFa is the HTML inline version of that. This makes Cb ready for Web 3.0...
Running the software is another story... the database grows quite quickly, and I do not have time to clean up the SQL. Additionally, Euan and I have not been able to merge our code bases again, so we cannot benefit from each other's improvements... that's a real shame.
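To illustrate the kind of molecule pick-up mentioned in this answer, here is a minimal Python sketch of reading an RDFa-style annotation out of a blog post. It is not the Cb code. The property name `chem:inchi`, the helper `extract_inchis`, and the sample post are assumptions made for this example, and the markup Cb actually reads may differ.

```python
from html.parser import HTMLParser

# Hypothetical property name, chosen for illustration only;
# the markup that Chemical Blogspace really parses may differ.
INCHI_PROPERTY = "chem:inchi"


class InChIExtractor(HTMLParser):
    """Collect the text of elements carrying property="chem:inchi".

    Assumes well-formed nesting inside the annotated element.
    """

    def __init__(self):
        super().__init__()
        self.inchis = []
        self._depth = 0    # greater than 0 while inside a matching element
        self._buffer = []  # text fragments of the current match

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Nested element inside a match: track it so we know when we leave.
            self._depth += 1
        elif dict(attrs).get("property") == INCHI_PROPERTY:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.inchis.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_inchis(html):
    """Return all InChI strings annotated in the given HTML fragment."""
    parser = InChIExtractor()
    parser.feed(html)
    parser.close()
    return parser.inchis


# Made-up blog post fragment annotating methane.
post = (
    '<p>Today I discuss <span property="chem:inchi">'
    "InChI=1S/CH4/h1H4</span> (methane).</p>"
)
print(extract_inchis(post))
```

An aggregator built along these lines could then group posts by the identifiers it finds, so that every blog discussing the same compound ends up on one page.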