Interview with Robert Parker

Interview by David Bradley

ISSUE #62
February 2007


Robert Parker is the recently appointed Managing Director of RSC Publishing, the section of the Royal Society of Chemistry responsible for its journals, books, databases, and other products for the chemical science community. Dr Parker has worked for the RSC for 22 years and now leads its 185 publishing staff, including those in the society's Library & Information Centre and IT departments. He is a graduate of King's College London, where he received his PhD in chemistry in 1985. David Bradley discussed the future of chemistry publishing with Dr Parker for Reactive Reports, with technical assistance from Richard Kidd.

What do you see as the major changes the electronic publishing revolution has brought for chemists?

Probably the biggest changes for chemistry itself are still to come, but as publishers we've seen our pool of potential reviewers open up worldwide, and our authorship has become even more international. There have obviously been infrastructure changes and a proliferation of formats, as well as increased, healthy competition among publishers.

Why did the physics and biomedical communities adopt more sophisticated online technologies, such as preprint servers and complex databases, before chemists?

I suppose chemists could get by with what they had, whereas the biomedical community had enormous amounts of data that required processing, and the physicists, as heavy users of TeX, just clicked with the technology and one particular application of it. However, all of science is becoming more data-rich, and publishers such as the RSC need to adapt to accommodate that shift and make the data easier to use.

Do you see the RSC embracing wikis, Web 2.0, blogs, podcasts, and so on?

In one sense, the RSC and other publishers have always used user content and had feedback loops in place, just as Web 2.0 sites do. But now we are trying these new applications where they are appropriate. Many Web 2.0 apps rely on users giving their time to a single site because they think they'll get something back. I doubt the RSC has the kind of dedicated user base needed to attempt something like Connotea, the social bookmarking site for scientists, but experimenting with the technology adds value to what we're already doing: for example, a specialized wiki (we have CrystEngWiki, where this focused community works out definitions of terms), or forum functionality added to Faraday Discussions. Similarly, a blog for Chemistry World lets the staff report from conferences in a different style while also allowing readers to comment on the magazine's news stories. But we're unlikely to build big user-centered apps, as frankly we don't think a large number of people would actually use them.

The wiki concept is essentially what Tim Berners-Lee, the inventor of the World Wide Web, was after. Why has the RSC not pioneered this kind of approach?

Well, once an idea succeeds then everyone takes notice. I think with a wiki it's just the technology that's made it easy to quickly do something new, and they're great to use. But again, it'll probably work only in specific circumstances or environments.

The blogosphere seems to be growing exponentially. How might chemists use those resources more effectively without succumbing to information overload?

I don't know. Maybe blogs will just die out after the initial enthusiasm, or they'll continue, become packed with Chemical Markup Language (CML), and be held together by semantic Really Simple Syndication (RSS) glue. It's hard enough already keeping track of the existing blogs.

The RSC is pioneering a new publishing model, Prospect, that will capture more of the latent metadata inherent in chemistry. How exciting is that?

Very exciting: take a look at the Prospect site! As a publisher, we're identifying compounds and ontology terms within our research papers. At the moment you need to scan the paper or a contents list to see what it is about, and new compounds often aren't named within a paper (they're given simply as "compound 13b", say). Google won't help you there, and neither will Chemical Abstracts Service (CAS) until the new compound has been indexed. By adding the necessary information and putting it out in RSS feeds, we've enabled machine discovery of relevant papers, specific identification of new compounds, and so on.

We're using specific ontologies at present, but the concept is applicable across all subject areas with their own special requirements. The semantic web for chemistry is a nice endpoint to aim for, and the RSC's Project Prospect demonstrates some of the possibilities. We recognize that everyone needs to join in for this to happen.

Why should chemists adopt the likes of InChI and CML?

We needed a meaningful, machine-readable way of identifying compounds uniquely, and InChI (the IUPAC International Chemical Identifier) fits the bill. Similarly, CML offers us a way of structuring much of the science within a paper, both preserving the original science and letting us do interesting things with it. By demonstrating some of these applications, we hope to encourage wider adoption.

How might this new approach tie in with other efforts, such as PubChem?

That depends on whether you mean formally or otherwise. We'll certainly link to PubChem in time; our view is that the more links a paper has, the higher its visibility and the more useful it becomes to the reader.

Can you give us an example of how your approach will benefit those readers?

In the short term, readers can see definitions of terms within the articles and link directly to other articles related by the same concept; if you try this feature, you'll find it incredibly useful. Similarly, they can see the structures of important compounds and link to other RSC papers that contain the same compound. CML is also available to download if they want to build up their own compound databases.

In the longer term, it should be easier to come in from outside and really hit the articles of interest, rather than trusting that the right combinations of words will work in a search engine. Linking to external databases and datasets will bring much more relevant and associated information alongside the article. And we'll be doing it for them!

Can you be more specific?

Take our journal Natural Product Reports (NPR), for instance. Many new compounds were previously published there only as static images of structures that a reader would have to study in great detail. Now the compounds are identified by InChI and the papers are tagged with biological activity terms, which will obviously be of interest to pharma companies, for instance.

How will all this benefit the wider chemistry community rather than simply RSC journal subscribers?

For example, by applying the Open Biomedical Ontologies (OBO), we're identifying relevant biological material across a whole range of RSC publications, and we expect this to make our authors' papers more easily accessible to readers who might not have looked at us before. The same will apply to other subject areas when we apply new ontologies to them, and, as you suggest, it breaks down a lot of the barriers between subjects and journal titles. Encouraging authors to preserve and submit more of their original data in structured form is good for the chemical sciences and for readers. Once the data are available, other researchers will find new applications or ways of visualizing them, and we would like to help promote all this.

Is this new RSC approach unique?

The big step we've taken is in scale. It's been built into our production system, so it won't be long before all papers are being enhanced (unless they really wouldn't benefit). Having 5000 papers a year rolling off the production line, with their InChIs and ontology terms exposed for storage and query, is a pretty big deal. Many of these concepts have been around for a while, but the really interesting part has been the RSC taking this leap of faith and building them into routine production for all its journals.

Why do you see it as a leap of faith?

CML has been around for a while, and, as you point out, the ideas behind all this aren't new. So far, publishers haven't used them on any scale, but we feel that by introducing these concepts to possibly the majority of chemical scientists, we can help convince them of their usefulness. We also hope to drag some other publishers along with us, so that these standards become more widely used and benefit the chemical sciences more broadly; an important part of the RSC's mission, after all, is to disseminate information.

Someone needed to be the first to take the plunge, and we thought we could do something really groundbreaking. It's a big idea to implement without breaking an already very efficient publication system, and it will require significant continuing investment.

So, you're not jumping on the bandwagon of Web 2.0 and the like?

We're only doing the Web 2.0 stuff where we think there are real applications. Project Prospect papers are enhanced by us for the readers, so active input (in terms of users having to do more than click) isn't an issue. By doing what we have, we've made the online articles more useful and informative to the general reader, packing in features that are incredibly useful if readers want to adopt them, but are still helpful if they just want to find out the context of a term or what a compound looks like.

We have already had some very positive feedback since the launch. Ed Pentz of CrossRef, for instance, reckoned "it's brilliant: I've just seen the future of the journal". I've even shown these enhanced articles to non-chemist friends and friends of friends, and they've been really impressed in a way I hadn't anticipated. Just adding these term lookups and associated compound information, such as structures, makes the articles instantly more accessible, so it's not just bells and whistles for the hardcore who are already familiar with these concepts. That readers are already using these features in ways we didn't expect is great, and just what we wanted to happen.

In terms of technology, how will you ensure viable archiving of electronic-only papers, especially given the transience of CDs, DVDs, and other storage media?

Well, all our text since 2000 has been in Extensible Markup Language (XML), so it's in a non-binary format that we can convert to pretty much anything. As an example of how we actively curate our data, when we built the RSC archive we reprocessed our 1997–2003 data to bring it up to our '2004' document type definition (DTD), so everything is pretty current. Then we have standard graphics formats and Portable Document Format (PDF) files, for which there will be a migration route of some kind. As you point out, one of the most important things is getting any data off CDs or DVDs and onto network storage, so we don't end up holding data on media we can no longer read.