Interview by David Bradley
Reactive Profile—Noel O'Boyle
Noel O'Boyle is a Postdoctoral Research Associate in the Development Group at the Cambridge Crystallographic Data Centre in Cambridge, and is interested in drug discovery, protein-ligand docking, cheminformatics, QSAR, and computational chemistry.
What are you currently working on in the lab?
I am working on improving the scoring functions used by the GOLD protein-ligand docking software. Specifically, I am focusing on improving results in virtual screening experiments. This is where the computational chemist tries to identify active molecules in a large library of compounds. The approach I've taken is to focus on discriminating active molecules from inactives, rather than on trying to predict the absolute binding affinity, a much more difficult problem (J. Chem. Inf. Model., 48: 1269–1278, 2008).
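As a rough illustration of what "discriminating actives from inactives" means in practice, a virtual screen is often summarised by an enrichment factor: how much more concentrated the actives are in the top-ranked slice of the library than in the library as a whole. The sketch below is not taken from GOLD or from the cited paper. The function names, the toy data, and the assumption that a higher score means a better-ranked compound are all illustrative.

```python
def rank_by_score(scored_compounds):
    """Return the activity labels ordered from best to worst score.

    scored_compounds: list of (score, is_active) pairs.
    Assumption: a higher score means a better predicted binder.
    """
    ordered = sorted(scored_compounds, key=lambda pair: pair[0], reverse=True)
    return [is_active for _, is_active in ordered]


def enrichment_factor(ranked_labels, fraction=0.01):
    """Enrichment of actives in the top `fraction` of a ranked library.

    ranked_labels: list of booleans, best-ranked compound first,
    where True marks a known active.
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")

    # Size of the top slice, never smaller than one compound.
    n_top = max(1, int(round(n_total * fraction)))
    hits_in_top = sum(ranked_labels[:n_top])

    # Ratio of the hit rate in the top slice to the hit rate overall.
    return (hits_in_top / n_top) / (n_actives / n_total)


# Toy library: ten compounds, two of which are known actives.
toy_library = [
    (9.1, True), (8.7, False), (8.2, True), (7.5, False), (7.0, False),
    (6.4, False), (5.9, False), (5.1, False), (4.3, False), (3.8, False),
]
toy_ranking = rank_by_score(toy_library)
toy_ef = enrichment_factor(toy_ranking, fraction=0.2)
```

An enrichment factor of 1 corresponds to random selection. A scoring function that ranks actives ahead of inactives pushes the value higher without ever needing to predict an absolute binding affinity.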
What would be your elevator pitch for your research?
Improved scoring functions will yield better enrichments in a virtual screen and increase the probability of finding a drug.
What will be the next big discovery in this field and will your name be on the paper?
The release of publicly available datasets in the last few years (ZINC, DUD, Astex Docking Set) is starting to transform how docking studies are carried out. It's not quite a discovery, but I hope that the next big thing in the field will be that research institutes and pharmaceutical companies will release more data, because improvements in docking software will almost certainly follow.
What is still missing from the chemical web that chemists really should have on there?
It would be great if publishers could take on board some of the work that the community has been doing. We have seen the beginning of this with Project Prospect from the RSC [Read our interview with RSC's Robert Parker]. Others in the Blue Obelisk community and I have developed user scripts for enhancing web pages with chemical and biological data from other sources. Some of these are simple, such as making all Protein Data Bank (PDB) codes clickable links; another one gives links to all blog posts that discuss a particular paper. These are the sort of enhancements that I would like to see publishers trying.
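The PDB-link enhancement mentioned above can be sketched in a few lines. The actual Blue Obelisk user scripts run in the browser and are not reproduced here. This is a Python illustration of the same idea, and the regular expression, the function name, and the link target on the RCSB site are assumptions chosen for the example. To avoid turning years such as 2008 into links, the sketch only matches a four-character code that directly follows the word "PDB".

```python
import re

# Assumption: a PDB code is a non-zero digit followed by three
# alphanumeric characters, and is only linked when introduced by "PDB".
PDB_PATTERN = re.compile(r"\b(PDB(?: code| ID)?:?\s+)([1-9][A-Za-z0-9]{3})\b")

# Assumption: structure pages on the RCSB site follow this URL pattern.
URL_TEMPLATE = "https://www.rcsb.org/structure/{code}"


def linkify_pdb_codes(text):
    """Wrap PDB codes found in `text` in HTML anchor tags."""

    def make_link(match):
        prefix, code = match.group(1), match.group(2)
        url = URL_TEMPLATE.format(code=code.upper())
        return f'{prefix}<a href="{url}">{code}</a>'

    return PDB_PATTERN.sub(make_link, text)


linked = linkify_pdb_codes("The ligand was docked into PDB code 1abc in 2008.")
```

Requiring the "PDB" prefix misses codes that appear without it, which is a deliberate trade-off in this sketch: a missed link is less harmful on a publisher's page than a wrong one.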
In what ways could blogs, social networking, and other web 2.0 systems and activities be an enabler for chemistry?
I think that blogs in particular have already increased interest in chemistry and allowed others to share in the excitement of discovery. Chemistry blogs are written by people who love chemistry, who actually find it fun, and that comes across. Many of these are written by PhD students, and, for example, they allow undergraduates to see the realities of PhD life. Also, because of the ability to comment, whole communities are being built around such chemistry blogs. Since these communities are composed of people with similar interests, it's possible to get involved in in-depth discussions on the best solvent for a cross-coupling reagent, to discuss which is the best grad school to attend or even to develop collaborations. A list of chemistry blogs can be found on the Chemical Blogspace site created by Egon Willighagen.
What do you think is the next step in the development of open source software for cheminformatics?
Projects such as the CDK, OpenBabel, Jmol, and PyMOL are all open source success stories. I believe that the next step is to focus on interoperability and ease of use. That is, the ability to mix and match different components, and to be able to use workflow packages and scripting languages to access these tools. I am currently working on a library called cinfony, which provides an interface to the CDK, OpenBabel, and the RDKit, the three main open source cheminformatics toolkits. cinfony allows the user to 1) use all of these from Python, 2) use them all with the same API (commands), and 3) interchange chemical data between the toolkits. Another initiative in this area is the CDK-Taverna work by Thomas Kuhn and others, which allows the CDK to be used with the workflow package Taverna.