In 2016, Leah McEwen and I (1) wrote an article about how the IUPAC Gold Book (2) was an exemplar digital asset for IUPAC. As we come up on the centenary of IUPAC, here is a brief discussion of the new Gold Book website, how the Gold Book fits this ‘exemplar’ moniker even better now, and the evolution of the Gold Book over the next few years.
1 Toward a Chemical Ontology for the Next 100 Years of Chemistry
2 The History of the Gold Book
The Compendium of Chemical Terminology was published by IUPAC in hardcover (3) in 1987 and contains internationally accepted definitions for terms in chemistry. Victor Gold was the editor of the first edition, which was thus colloquially called ‘the Gold Book’. A second hardcopy edition (4), edited by A. D. McNaught and A. Wilkinson, was published in 1997.
An XML version of the Gold Book was created as part of the IUPAC project “Standard XML data dictionaries for chemistry” (5) by Miloslav Nic, Jiri Jirat, and Bedrich Kosata. The content was based on the online PDF version of the second edition of the IUPAC Gold Book. In the process of conversion to XML format, many errors and inconsistencies were fixed. The XML version of the Gold Book also contains more than 500 new entries added by Aubrey Jenkins. It was available online from 2005 until June 2019.
3 Impact of the Gold Book
The impact of the Gold Book on society is difficult to measure because little information on usage of the Gold Book website exists from before 2017. However, Crossref has recently made available a service, Crossref Events (6), through which publishers of Digital Object Identifiers (DOIs) can see where their DOIs are being reported. Data obtained for Wikipedia events (edits to pages where a DOI was added, in all languages) indicate that Gold Book DOIs (each entry has a DOI) have been added for 1952 entries (~30% of current entries), and that the Gold Book overall has been referenced 2485 times. Looking deeper, these entries are referenced in 7581 Wikipedia pages across 84 countries. In the same timeframe, there have been 271 tweets on Twitter (7) that reference a Gold Book entry.
As for the usage of the Gold Book, website access statistics captured using Google Analytics and DataStudio (8) since March of 2017 show interesting demographics (see Figure 2). The user age pie chart (bottom left) shows that the majority of users are 18-24 years old, and nearly 80% of users are under 35. These statistics likely indicate that the primary use of the Gold Book is for educational purposes at either undergraduate or graduate level. The majority of users (66.0%) come to only one page and then leave (this is considered a ‘bounce’); however, the other 34.0% look at an average of eight different pages. In the future these statistics will be examined in more depth in order to get a more detailed picture of how the community uses the Gold Book.
4 Current status of the Gold Book
Project 2016-046-1-024, “Backup, Maintenance, and Redevelopment of the IUPAC Gold Book Website,” has been completed and the new version of the website (Figure 1) is now available. The new version of the website does not alter any of the content of the definitions (except for corrections of very minor typographical errors); it only alters the way in which that content is delivered. This modernization brings the site’s technology up to date (a dynamic database-driven website rather than static HTML pages) and opens up access to the content for computers as well as humans. This is accomplished via the Application Programming Interface (API) endpoints (9) for the Gold Book terms and their original sources (recommendations published in the IUPAC scientific journal Pure and Applied Chemistry (PAC)). As usage of the API increases, the API will be improved based on requests for both the term data/metadata and the data sources.
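To give a flavor of what machine access to a term might look like, here is a minimal Python sketch. The URL pattern (`/terms/view/<code>/json`), the JSON field names (`term`, `title`, `doi`, `definitions`, `text`), and the sample payload are all assumptions made for illustration; the API page (9) is the authoritative description of the real endpoints and response format.

```python
import json
from urllib.request import urlopen

# Assumption: base URL and path pattern are illustrative only; see the API page (ref. 9).
BASE_URL = "https://goldbook.iupac.org"


def term_url(code: str, fmt: str = "json") -> str:
    """Build the (assumed) URL for one Gold Book term in the requested format."""
    return f"{BASE_URL}/terms/view/{code}/{fmt}"


def parse_term(payload: str) -> dict:
    """Pull a few fields out of a JSON payload; the field names are assumptions."""
    data = json.loads(payload)
    term = data.get("term", {})
    return {
        "title": term.get("title"),
        "doi": term.get("doi"),
        "definitions": [d.get("text") for d in term.get("definitions", [])],
    }


def fetch_term(code: str) -> dict:
    """Download and parse one term (needs network access; not called below)."""
    with urlopen(term_url(code)) as response:
        return parse_term(response.read().decode("utf-8"))


# Offline demonstration with a made-up payload (not real Gold Book content).
SAMPLE = json.dumps({
    "term": {
        "title": "example term",
        "doi": "10.1351/goldbook.X00000",
        "definitions": [{"text": "placeholder definition"}],
    }
})

if __name__ == "__main__":
    print(term_url("X00000"))
    print(parse_term(SAMPLE))
```

Keeping the parsing step separate from the download step means the field mapping can be adjusted to the real response format without touching the network code.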
In addition to the website itself, the project took a holistic view of the Gold Book as an IUPAC digital asset. As a consequence, issues around long-term support of the site are being addressed by hosting copies of the code base and database on the repository service GitHub (10). In this way, notes on the development of the Gold Book website, documentation of the code, and tracking of code/HTML issues and their resolution will be available long-term. In addition, hosting the project on GitHub allows collaboration on the future development of the website, and thus I would encourage anyone interested in being part of the development team to contact me ([email protected]).
5 The Gold Book as an Exemplar
In the article that Leah and I wrote, we discussed why the Gold Book is the exemplar digital asset for IUPAC – its contents are a representation of the authoritative chemical knowledge about concepts in the chemical sciences as delineated in the sources of the terms – PAC recommendations. Having gone through the redevelopment of the website and, as a consequence, having read many of the approximately sixty-five hundred current definitions, I can clearly attest to the quality and import of the content. This, combined with the complexity of the content, has led to new ideas about the scope of the Gold Book and how it might be repurposed – making the Gold Book even more of an exemplar.
Before I get into talking about the future, I would like to discuss the issues surrounding the complexity of the content. Coming into the project, the understanding of what constitutes a Gold Book term was drawn from the existing IUPAC guidelines for the development of technical reports and recommendations (11). The structural information of a terminology entry (section 6) is reproduced below (annotations omitted):
This outline (with the referenced discussion points) gives an excellent picture of how authors should craft revisions to existing terms or new terms. However, the complexity of the terms on the Gold Book website is significantly higher due to issues such as:
- The same term referenced by acronym or synonym and name (duplication)
- Multiple definitions of the same term in different sub-disciplines of chemistry
- Versioning of terms, needed when content is corrected or links are added/removed
- Terms including the definition of other terms
- Multiple concepts (or concept groups) defined in the same term
- Management of digital (non-text) content, i.e., graphics, equations/symbols (now done in LaTeX), and chemical structures
As a consequence of the issues above, the organization of the Gold Book’s terms in the new system has required that data be spread across many database tables in the following (unoptimized) general ‘schema’:
Much of the structure shown above was developed through analysis of the original Gold Book website pages (the HTML) and iteration of the code, such that taking the data out of the tables allowed accurate representation of the page content. Clearly, the data model for a Gold Book term is more complicated on the website than for an individual term in a PAC recommendation; however, the two are very much aligned, as one would expect.
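The sketch below is not the schema referred to above. It is a hypothetical, much-reduced illustration of why issues such as synonyms, multiple definitions per term, and versioning push the data into several related tables. All table names, column names, and sample rows are invented for the example.

```python
import sqlite3

# Hypothetical tables for illustration only; NOT the actual Gold Book schema.
SCHEMA = """
CREATE TABLE terms (
    id      INTEGER PRIMARY KEY,
    code    TEXT UNIQUE NOT NULL,
    title   TEXT NOT NULL,
    version INTEGER NOT NULL DEFAULT 1      -- bumped when content is corrected
);
CREATE TABLE definitions (                  -- one term, many definitions
    id      INTEGER PRIMARY KEY,
    term_id INTEGER NOT NULL REFERENCES terms(id),
    context TEXT,                           -- e.g. a sub-discipline
    body    TEXT NOT NULL
);
CREATE TABLE synonyms (                     -- acronyms and synonyms point at one term
    id      INTEGER PRIMARY KEY,
    term_id INTEGER NOT NULL REFERENCES terms(id),
    label   TEXT NOT NULL,
    kind    TEXT NOT NULL                   -- 'synonym', 'acronym', ...
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one made-up term."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO terms (id, code, title) VALUES (1, 'X00000', 'example term')"
    )
    conn.executemany(
        "INSERT INTO definitions (term_id, context, body) VALUES (?, ?, ?)",
        [
            (1, "sub-discipline A", "placeholder definition A"),
            (1, "sub-discipline B", "placeholder definition B"),
        ],
    )
    conn.execute(
        "INSERT INTO synonyms (term_id, label, kind) VALUES (1, 'ET', 'acronym')"
    )
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, label: str) -> list:
    """Resolve a title, synonym, or acronym to the term and all its definitions."""
    return conn.execute(
        """
        SELECT DISTINCT t.code, d.context, d.body
        FROM terms t
        JOIN definitions d ON d.term_id = t.id
        LEFT JOIN synonyms s ON s.term_id = t.id
        WHERE t.title = ? OR s.label = ?
        ORDER BY d.context
        """,
        (label, label),
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in lookup(db, "ET"):
        print(row)
```

Looking a term up by its acronym or by its title returns the same set of definitions, which is one way to avoid storing duplicate entries for the same concept.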
A consequence of the time spent ‘pulling apart’ Gold Book terms and organizing them in the database was the observation of how closely the data align with the ontological representation of a concept. This was alluded to in the first paper, where a mockup of a term entry in Protégé (12) was shown (Figure 3 in Ref. 1). Below (Figure 3) is an updated version of that graphic with better ontological annotation of the metadata associated with the term (absorbance).
The idea that any/all Gold Book terms can be represented as ontology entries brings a number of important questions/thoughts to the forefront.
- Are the current guidelines for creating terminology entries comprehensive and clear enough to make the conversion of a term to an ontology entry relatively easy?
- How does IUPAC put a process together to ensure that Divisions can create, update, or remove term definitions easily and in an expeditious manner?
- Given that a term defined by one Division may need to be referenced in a term from a different Division, how can that be enabled?
- How can the representation of chemical entities referenced in terms be made more searchable and more contextually-relevant?
- What criteria should govern entries being included in the Gold Book, and potentially in an ontology, moving forward?
6 Relevance to the Future of Chemistry: Chemical Concepts for the Next 100 Years
The digital organization of the content of the current Gold Book presents an opportunity to repurpose the Gold Book for the next 100 years of chemistry. Many times during the recent project, I have been amazed to find that a particular chemistry concept is not present in the Gold Book (e.g., boiling point and melting point). Gaps in the Gold Book coverage have also been noticed by members of the community, especially those interested in the semantic representation of chemical information – where formal definitions of concepts are a necessity.
The Compendium of Chemical Terminology was not intended to be a comprehensive resource of chemical concepts; rather, it emerged as an editorialized subset that at the time of its creation was a huge advance for communicating chemical concepts to the burgeoning Internet audience. However, with advances in technology, it is now possible to conceive that the Gold Book could/should be a complete set of defined concepts in chemistry. As a result, the next phase of the Gold Book redevelopment is a much bigger endeavor.
The combined set of terms from all currently published PAC recommendations may easily total thirty thousand, over four times the current content of the Gold Book. If all of these terms could be extracted from the PDFs they are currently locked in, the Gold Book would be “The” Compendium of Chemical Terminology. In order to accomplish this lofty goal, a new IUPAC Project has been proposed to develop an online recommended term management system (RTMS).
Such a system would allow Divisions to manage the creation, revision and deprecation of terms from all their PAC recommendations using a single online interface. It would allow committee members to comment on definitions (and keep a record of the comments), open terms for public review (as required), approve terms, view the current status of all terms, and eventually automatically create a PAC recommendation document for publication in PAC from a set of defined terms. Practically, the interface will manage the addition of synonyms, acronyms, and abbreviations, enter the context of terms, add equations and symbols, and link out to other terms, not only within the Division, but also across Divisions. Combined with this, once a term is approved it will appear in the Gold Book immediately, with a unique DOI. An added benefit of such a system is that it would change the timeframe for the development of new, or the updating of existing, recommendations as physical meetings of committee members would not be necessary – all edits and discussion could happen online.
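The term lifecycle described above can be pictured as a small state machine. The following Python sketch is one possible reading of it, not a design for the proposed RTMS. The status names, the allowed transitions, and the `ManagedTerm` class are all assumptions made for illustration.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    PUBLIC_REVIEW = "public review"
    APPROVED = "approved"
    DEPRECATED = "deprecated"


# Assumed transitions: public review is optional ("as required"), an approved
# term can be reopened for revision, and deprecation is final.
ALLOWED = {
    Status.DRAFT: {Status.PUBLIC_REVIEW, Status.APPROVED},
    Status.PUBLIC_REVIEW: {Status.DRAFT, Status.APPROVED},
    Status.APPROVED: {Status.DRAFT, Status.DEPRECATED},
    Status.DEPRECATED: set(),
}


class ManagedTerm:
    """A term with a status, a comment record, and a status history."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.status = Status.DRAFT
        self.comments = []          # (author, text) pairs, kept as a record
        self.history = [Status.DRAFT]

    def comment(self, author: str, text: str) -> None:
        """Record a committee member's comment on the definition."""
        self.comments.append((author, text))

    def move_to(self, new_status: Status) -> None:
        """Change status, rejecting transitions outside the assumed workflow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"cannot move from {self.status.value} to {new_status.value}"
            )
        self.status = new_status
        self.history.append(new_status)


if __name__ == "__main__":
    term = ManagedTerm("example term")
    term.comment("reviewer", "placeholder comment")
    term.move_to(Status.PUBLIC_REVIEW)
    term.move_to(Status.APPROVED)
    print([s.value for s in term.history])
```

Recording the history alongside the current status is what would let a Division view the state of all of its terms at a glance.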
However, the most important aspect of the system is behind the scenes. As a term is entered online, its data and metadata are automatically associated with the corresponding parts of the ontology entry described above. As a result, a chemical ontology containing the approved terms can be built almost automatically. The published IUPAC Chemistry Ontology will be a significant resource, built by and for the chemistry community, for the next one hundred years.
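As a rough sketch of how term data could map onto an ontology entry, the Python example below serializes a term as a SKOS concept in Turtle syntax. The choice of SKOS and Dublin Core properties, the `Term` fields, the example.org base IRI, and the sample term are all assumptions made for illustration. The article does not specify which vocabulary the IUPAC ontology would use.

```python
from dataclasses import dataclass, field
from typing import List

PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
)


@dataclass
class Term:
    """Hypothetical minimal record for one approved term."""
    code: str
    label: str
    definition: str
    synonyms: List[str] = field(default_factory=list)
    source: str = ""                 # e.g. a citation of the PAC recommendation


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(term: Term, base: str = "https://example.org/terms/") -> str:
    """Serialize one term as a skos:Concept (the base IRI is a placeholder)."""
    lines = [
        f"<{base}{term.code}> a skos:Concept ;",
        f'    skos:prefLabel "{escape(term.label)}"@en ;',
    ]
    for synonym in term.synonyms:
        lines.append(f'    skos:altLabel "{escape(synonym)}"@en ;')
    if term.source:
        lines.append(f'    dcterms:source "{escape(term.source)}" ;')
    lines.append(f'    skos:definition "{escape(term.definition)}"@en .')
    return "\n".join(lines)


if __name__ == "__main__":
    demo = Term("X00000", "example term", "placeholder definition", ["ET"])
    print(PREFIXES)
    print(to_turtle(demo))
```

Because each field of the record maps to exactly one property, a term that has been entered in a structured way can be converted without any manual annotation.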
7 A Call to Arms
Building the system described above is one thing; identifying, organizing, prioritizing, and adding these terms to the RTMS is quite another. So, a separate project, running in parallel with the new project above, is proposed to coordinate Divisions in the process of identifying the current PAC recommendations in each Division and organizing members of the Division to review each recommendation, assign a priority to each term, and decide whether a term is good as is, needs minor revision, needs a major rewrite, or is no longer needed (i.e., should be marked deprecated).
Once the review of existing terms is complete, Divisions can then start the process of adding the terms (based on priority) to the RTMS. Training on the new system will be provided to help Divisional “term curators” efficiently and accurately transfer the existing content into the system. Once in the system, these terms can be contextualized with additional metadata (e.g. context, synonym and abbreviation) that is not already present in the current form of the term.
The curation stage will require a significant investment of time by each Division (the nominal goal is to finish the curation within two years). As a result, I encourage anyone in chemistry who would like to help with the curation process to make themselves known to their aligned Division. The curation step is separate from the formal term definition process, and so anyone can participate, especially those who have not been part of a recommendation project. This is an opportunity to contribute to chemistry as part of a project that will have a significant impact on IUPAC over the next one hundred years.
8 Final Thoughts
I feel very fortunate to have been in the right place at the right time relative to the redevelopment of the Gold Book. It’s not often that you get the opportunity to make a significant and highly impactful contribution to the chemistry discipline. The repurposing of the PAC recommended terms into a chemical ontology, via a highly expanded Gold Book, is an opportunity for many others in chemistry to support the discipline and recognize the contributions of those that have developed recommended terms for chemistry. I hope you will all consider being part of the effort.
References
1. “The IUPAC Gold Book: An Exemplar for IUPAC Asset Digitization”, Stuart J. Chalk, Leah McEwen, Chemistry International, Volume 39, Issue 3, Pages 25–30, ISSN (Online) 1365-2192, ISSN (Print) 0193-6484 - https://doi.org/10.1515/ci-2017-0307
2. “The IUPAC Compendium of Chemical Terminology (Gold Book)” - https://goldbook.iupac.org/
3. The IUPAC Gold Book, Victor Gold, 1st Ed., 1987 (ISBN 0-63201-765-1)
4. The IUPAC Gold Book, A. D. McNaught and A. Wilkinson, 2nd Ed., 1997 (ISBN 0-63201-765-1)
5. Standard XML data dictionaries for chemistry, Steve Stein, IUPAC Project 2002-022-1-024 - https://iupac.org/project/2002-022-1-024
6. Event Data, Crossref.org - https://www.crossref.org/services/event-data/
7. Twitter - https://twitter.com
8. Google DataStudio - https://datastudio.google.com/
9. The Gold Book API - https://goldbook.iupac.org/pages/api
10. The International Union of Pure and Applied Chemistry @ GitHub - https://github.com/iupac
11. Guidelines for Drafting IUPAC Technical Reports and Recommendations - https://iupac.org/what-we-do/recommendations/guidelines-for-drafting-reports/
12. Protégé Ontology Editor - https://protege.stanford.edu/