An intriguing Twitter post from Imants Zudans tells us: “Apparently 2 companies are selling atomic hydrogen. I wonder what is the packaging and how it is shipped.”

Zudans provides a link to his MolPort site, where you can see for yourself that on the entry for atomic hydrogen (a single hydrogen atom, in other words), Apollo Scientific and Sigma-Aldrich are both offering this stuff for sale. Sigma offers no price, while Apollo asks visitors to enquire. Very odd, but they’re obviously just covering all bases for customers searching for hydrogen.

But it wasn’t the fact that atomic hydrogen is being offered for sale that was most curious; it was the incredible difference between the SMILES string and the InChI Key for the hydrogen atom that caught my eye. Both are abbreviated notations that allow a three-dimensional, or even a flat two-dimensional, structure to be represented as a one-dimensional text string. They are incredibly useful, but I’ve never used either for the hydrogen atom. Nevertheless, the difference between the two is rather amusing:

Hydrogen atom in SMILES = [H]


Did I say these were abbreviated forms? The InChI Key runs to 25 characters and the SMILES to three, where H by itself would just about do it for most people.
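
Why is the key so long even for a single atom? An InChI Key is derived from a hash of the full InChI string, so its length is fixed no matter how small or large the molecule is. Here is a minimal sketch of that idea in Python. Note that `toy_key` and its letter mapping are invented for illustration and are not the real InChI Key encoding; only the two InChI strings are genuine.

```python
import hashlib
import string


def toy_key(identifier, length=14):
    """Map any identifier to a fixed-length block of uppercase letters.

    Toy illustration only: this is NOT the real InChI Key algorithm,
    it just shows that a hash-based key has the same length for any input.
    """
    digest = hashlib.sha256(identifier.encode("ascii")).digest()
    letters = string.ascii_uppercase
    # Turn the first `length` bytes of the hash into letters A-Z.
    return "".join(letters[b % 26] for b in digest[:length])


hydrogen_atom = "InChI=1S/H"
ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

for inchi in (hydrogen_atom, ethanol):
    key = toy_key(inchi)
    print(f"{len(inchi):2d} chars in, {len(key)} chars out: {inchi} -> {key}")
```

The hydrogen atom and ethanol go in at very different lengths and come out at the same one, which is the price of having a key that is uniform and easy to search for.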

  1. An interesting observation, David. It is good that the InChI Key is long – it is unique and makes it easy to search the Internet. Information search would get a lot more efficient if everybody switched from CAS numbers to InChI Keys. If I want to find information on a structure for which I don’t know the CAS number, I will miss many good information sources. If everybody used InChI Keys, the situation would be a lot better: I don’t need to know the key, I can use a program (or website) to generate it. The same goes for information providers – they can simply include the key in their content, with no need to contact CAS, wait, pay the fee, wait…

    The “atomic hydrogen” issue is a very good example of the problems databases face with data coming from suppliers. The Sigma-Aldrich catalog number is indeed for hydrogen (albeit not atomic). The Apollo Scientific catalog record is for PEG 1000. The SD file just contains the wrong structure. Such issues are very common. Group abbreviations cause a lot of problems. The molecule with the ID MolPort-000-140-865 on the MolPort website is a good example of this (and we keep it for that purpose). See how the image does not match the name that is generated from the structure!? Look at the SMILES – CH3B(OH)2 is just a text “note” in the SD file. So chemical search algorithms think it is the methane molecule, while the user might be misled by the picture into thinking it is methylboronic acid. The Toronto Research Chemicals catalog is full of such structures. There is little that databases can do to work out what the real structure is. Please don’t use abbreviated groups and other software-vendor-specific features in SD files – it does not help you and it causes many others a lot of problems.
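
The kind of mismatch described in this comment can at least be flagged automatically. The following is a rough sketch, not a description of what MolPort or any other database actually does: it reads the atom block of a V2000 SD record and compares its elements with a formula left as free text in a data field. The record itself, the field name `NOTE`, and the function names are all hypothetical, and the element matching is a crude heuristic that would also pick up capitalised words in ordinary text.

```python
import re

# Hypothetical SD record: the atom block holds a single carbon, while the
# intended formula only survives as free text in a data field. The field
# name "NOTE" is an assumption made for this example.
SD_RECORD = """\
example
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <NOTE>
CH3B(OH)2

$$$$
"""


def atom_block_elements(record):
    """Return the set of element symbols found in the V2000 atom block."""
    lines = record.splitlines()
    n_atoms = int(lines[3][:3])  # counts line: first 3 columns = atom count
    # Each atom line is "x y z symbol ...", so the symbol is the 4th token.
    return {line.split()[3] for line in lines[4:4 + n_atoms]}


def note_elements(record, field="NOTE"):
    """Return element-like symbols in the first line of a data field."""
    lines = record.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(">") and f"<{field}>" in line:
            return set(re.findall(r"[A-Z][a-z]?", lines[i + 1]))
    return set()


def missing_from_structure(record):
    """Elements named in the note but absent from the drawn structure."""
    # Hydrogens are usually implicit in the atom block, so ignore them.
    return note_elements(record) - atom_block_elements(record) - {"H"}


print(sorted(missing_from_structure(SD_RECORD)))  # prints ['B', 'O']
```

A non-empty result does not tell you what the real structure is, only that the text and the drawn structure disagree and a human should take a look.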

