Introduction of NIST 17—A Major Update of Mass Spectral Libraries and Software—at the 65th ASMS Conference on Mass Spectrometry and Allied Topics

At the 65th American Society for Mass Spectrometry Conference on Mass Spectrometry and Allied Topics, held June 4–8, 2017, in Indianapolis, IN, the National Institute of Standards and Technology’s (NIST’s) Mass Spectrometry Data Center introduced major updates of its Electron Ionization (EI) and Tandem (spectra obtained using MS/MS of LC/MS produced ions) Spectral Libraries and the GC Methods/Retention Index Library, along with new versions of several software programs used in conjunction with these libraries. Additionally, free downloadable spectral collections such as the three EI libraries of Annotated Recurrent Unidentified Spectra (ARUS), a Glycan Library, and a Glycopeptides Mass Spectral Library (HCD) of Human IgG1 mAb Drugs were also announced. This was the single largest increase in data and software ever introduced by NIST.

Before the details of this release are presented, a historical overview of the library’s development is in order. Shortly after the discovery in the 1940s that mass spectrometry could be used to identify volatile organic compounds, the building of EI mass spectral libraries began. Some of this early work took place at the National Bureau of Standards (NBS).1 In the United States, these libraries were maintained for many years by the National Institutes of Health (NIH) and the U.S. Environmental Protection Agency (EPA). As early as the 1970s, the NIH library of 12,500 spectra was made available to mass spectrometry instrument manufacturers through an annual rental from the NBS. The instrument manufacturers would provide this library with their dedicated minicomputer systems. These early search systems allowed library spectra to be retrieved only through the submission of an instrument-acquired spectrum of the compound.

The EPA and NIH libraries were merged ca. 1975, yielding a library of approximately 25,000 spectra that was leased by NBS to mass spectrometry instrument manufacturers in the same way as the NIH library had been distributed. Both the NIH and EPA libraries, as well as the combined EPA/NIH library, had only one spectrum per compound. This combined library was published as what is known as the Red Books.2 The Red Books contained full spectra, including the isotope peaks when available, along with a chemical structure for each compound. At the time the two libraries were combined, each had a separate time-share system that was accessible through terminals connected to modems using voice-grade telephone lines and a teletype for input/output. Each of these systems was maintained separately by its agency. The two systems were also combined and made accessible to subscribers of the Mass Spectral Search System managed by Fine Marquardt, a contractor.3,4 The EPA system only allowed spectra to be submitted for a search; the only way to review a spectrum from this system was to submit a spectrum for the compound. The returned spectra, in tabular format, contained the compound’s information (name, formula [elemental composition], CAS registry number, if available, and nominal mass). The NIH system enabled retrieval of spectra in tabular format through the submission of any of the various criteria associated with the spectrum. It also allowed for spectral retrieval through the submission of integer m/z values and relative intensity pairs. Multiple such pairs could be used as a search criterion to yield a reasonable number of matches meeting all the entered peak pairs. Neither the EPA system, the NIH system, nor the combined system offered chemical structures. The combined on-line system was much better than either of the separate systems; however, it was slow and out of the financial reach of many laboratories at that time.

In 1987, NBS was handed stewardship of the combined EPA/NIH Library. At that time, all the instrument manufacturers’ data systems allowed spectral retrieval only through the submission of an acquired spectrum for that compound. Even though many of these systems would display bar-graph spectra, the spectra had been abbreviated due to limited disk storage space, and none of these systems could display structures. Steve Stein, director of the newly created NBS Mass Spectrometry Data Center, knew that users of the Mass Spectral Libraries wanted more flexibility in the way spectra and other information from the library were retrieved, and that they wanted to see the full spectrum, not just a monoisotopic version, as well as structures. Based on the chemical name alone, a large fraction of the entries in the library were almost indecipherable to many users. He wrote a program for use with the IBM PC using the DOS operating system. The first release of this program (v.2.0) was in November 1988. By that time, the EPA/NIH Library had grown to ~49.5K compounds from spectra added by NBS (both acquired from third parties and tabulated from the scientific literature). In 1989, NBS changed its name to the National Institute of Standards and Technology, and the library became known as the NIST/EPA/NIH Mass Spectral Library. In 1990, the DOS program, NIST Mass Spectral (MS) Search Program (v.3.0), and the library (now including chemical structures and spectra for nearly 54K compounds) were sold directly to end users and were made available through distributors (mostly mass spectrometer manufacturers). The manufacturers continued to provide the library in the condensed spectra format for their proprietary search systems on a per-copy royalty for use on a single computer accessible by a single user at a time, as several still do today. Many also sold the NIST version of the data along with the NIST MS Search Program (MS Search) at no additional charge.

Perkin Elmer (PE) introduced a new GC/MS instrument at the 1990 Pittsburgh Conference. This instrument used Stein’s DOS version of MS Search as the only way to search mass spectra of unknowns. Spectra were exported to a text format, which could be read by MS Search. When the spectrum was imported, it was searched. The Hit List was a series of structures along with Match Factors and Reverse Match Factors. (A Reverse Match Factor is a Match Factor calculated ignoring any peaks that are in the sample spectrum but not in the library spectrum. This novel technique, introduced by Stein in MS Search, allowed for identification of coeluting compounds and was used to compensate for high backgrounds.) I was the new Saturn (a quadrupole ion trap-based GC-MS) product manager at Varian when I saw this display for the first time. I demanded to know how PE was accomplishing this display of such a Hit List. After some insistence on my part, I was told that this was a NIST development. I immediately went to the NIST stand, met Steve Stein, had a lengthy conversation, and by the end of the week had this ability on the Saturn. I was overwhelmed by this Hit List display using chemical structures. This was what I had imagined a library search display should be like. Clicking on a structure resulted in the display of the spectrum. This was the beginning of my now more than 25-year association with NIST as a contractor and colleague. As MS Search has evolved from a DOS program to a Windows program and many advances have been made, the ability to display the Hit List as an array of structures has been retained. Fortunately, the need to install the library from dozens of floppy disks has not.

Stein’s program allowed for several different ways to search for a spectrum, such as an Incremental Name Search. This search allows the name to be typed letter by letter, and the spectrum display is refreshed as each letter is typed. Another feature Stein added to the libraries was the inclusion of synonyms. Although the library at that time had only one spectrum per compound, many of the spectra had multiple associated names (synonyms), which included trade and common names. The Incremental Name Search worked not only with the primary name for the compound, but also with all of the synonyms.

Spectra could be retrieved by entering the elemental composition (Formula Search), the Chemical Abstracts Registry Number (CAS# Search), the compound’s nominal mass (MW Search), or the compound’s NIST number (formerly called the EPA Number), a unique number given to each spectrum considered for entry into the distributed library. The program also included ways to search submitted spectra in tabular format against the library. Stein used multiple search algorithms,5 such as the popular INCOS (Dot-Product) algorithm. His version included enhanced presearch features, and all the peaks in the sample spectrum were searched against all the peaks in the subset of spectra resulting from the presearch. At the time, the instrument manufacturers still used a reduced number of peaks per spectrum to limit storage space requirements. These condensed spectra had from 16 to a maximum of 40 peaks. Stein also introduced the use of a neutral-loss search and began increasing the indexing of the library to facilitate various ways of searching spectra of unknown compounds. The program included the Any Peaks Search, which enabled the retrieval of spectra from selected m/z and relative intensity pairs, a capability that had been very popular in the on-line search systems. An interactive and considerably expanded version of that search exists today.
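For readers who have not seen a dot-product comparison written out, the sketch below shows the general idea behind a Match Factor and a Reverse Match Factor. It is a simplified illustration, not the code used in MS Search: the function names, the peak weighting (m/z multiplied by the square root of intensity), and the plain cosine scaled to 0 to 999 are all assumptions chosen for clarity.

```python
import math


def _weighted(spectrum, mz_power=1.0, intensity_power=0.5):
    """Weight each peak as m/z**a * intensity**b (exponents are assumptions)."""
    return {mz: (mz ** mz_power) * (inten ** intensity_power)
            for mz, inten in spectrum.items()}


def match_factor(sample, library, reverse=False):
    """Cosine (dot-product) similarity of two spectra, scaled to 0..999.

    Spectra are dicts of integer m/z -> intensity. With reverse=True,
    sample peaks that are absent from the library spectrum are ignored,
    which is the idea behind a Reverse Match Factor.
    """
    s = _weighted(sample)
    lib = _weighted(library)
    if reverse:
        # Drop sample peaks that the library spectrum does not contain.
        s = {mz: w for mz, w in s.items() if mz in lib}
    dot = sum(w * lib[mz] for mz, w in s.items() if mz in lib)
    norm = math.sqrt(sum(w * w for w in s.values()) *
                     sum(w * w for w in lib.values()))
    return 0 if norm == 0 else round(999 * dot / norm)


# Hypothetical two-peak library spectrum and a sample with one extra
# peak at m/z 91 standing in for a coeluting compound or background.
library_spectrum = {43: 100.0, 58: 50.0}
sample_spectrum = {43: 100.0, 58: 50.0, 91: 80.0}

print(match_factor(sample_spectrum, library_spectrum))                # below 999
print(match_factor(sample_spectrum, library_spectrum, reverse=True))  # 999
```

In this toy case the extra peak lowers the forward score but leaves the reverse score at its maximum, which is why a reverse match is useful when coelution or a high background is suspected.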

As can be seen in Figure 1, the EI Library began a steady growth in 1990. This also coincided with the measurement of spectra at NIST as well as acquisition of several high-quality large mass spectral libraries from other organizations and the continued practice of digitizing spectra from the literature. With the exception of the period from 1993 to 2002 (only two releases in this nine-year period instead of three), there has been a release every three years. With the 1993 release, good-quality replicate spectra of “more often encountered compounds” were added to this EI Library. This was a deviation from the previous policy of having only a single spectrum per compound. The organization of spectra for the NIST MS Search Program involved the placing of the “best spectrum” (based on human evaluation) for each compound into what was called the mainlib (main library), which continued to have only one spectrum and name per compound. The replicate spectra were placed into a separate library called the replib. Both of these libraries can be searched at the same time, or the search can be limited to just the mainlib. Several instrument manufacturers did not follow this protocol and elected to have only a single library that contained both the mainlib and the replib spectra. When the Hit List obtained using MS Search or one of the instrument companies’ proprietary search programs contains multiple spectra of the same compound, the user has more confidence in the result.

Figure 1 – Growth of the Electron Ionization Library over time under the stewardship of NIST.

Not only has the EI Library grown, but the original MS Search has evolved. In 1998, a Microsoft Windows version of the program was released, with a substantial upgrade in 2003. Changes in MS Search continue, resulting from the need for detailed evaluation of the spectra to be included in the NIST/EPA/NIH Libraries. As the number of spectra being added per year increased, so did the need for more computer tools to evaluate spectra. The 1998 release was the first version in which all the spectra were examined by one or more mass spectrometrists.6 This human evaluation in addition to a computer evaluation continues today.

One of the unique features of MS Search is that it allows as many as 127 mass spectral libraries to be searched at the same time. Because virtually all mass spectral libraries are now available in this format, this feature permits the seamless integration of a large fraction of available reference mass spectra. Hits are arranged from best to worst Match Factor, regardless of which library the hit is from. This arrangement and the multitude of constraints that can be used to control which matches are displayed are just two of the outstanding features of MS Search. Another standout aspect of the NIST MS Search Program is the ability to customize the display, not only of the program, but individually for each user on the same computer, one at a time.
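The effect of searching several libraries at once and ranking the hits together can be pictured with a short sketch. The Hit record, the function name, and the library and compound names below are hypothetical and are not part of MS Search; the minimum Match Factor stands in for just one of the many constraints the program offers.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    library: str       # name of the library the hit came from
    name: str          # compound name
    match_factor: int  # 0..999, higher is better


def merge_hit_lists(hit_lists, min_match=0, max_hits=None):
    """Combine per-library hit lists into one list ranked by Match Factor."""
    merged = [hit for hits in hit_lists for hit in hits
              if hit.match_factor >= min_match]
    # Best to worst, regardless of which library a hit came from.
    merged.sort(key=lambda hit: hit.match_factor, reverse=True)
    return merged if max_hits is None else merged[:max_hits]


# Hypothetical results from two libraries searched with the same spectrum.
mainlib_hits = [Hit("mainlib", "compound A", 930), Hit("mainlib", "compound B", 700)]
user_hits = [Hit("user_lib", "compound A", 950)]

for hit in merge_hit_lists([mainlib_hits, user_hits], min_match=750):
    print(hit.match_factor, hit.library, hit.name)
```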

As the library has grown and the search algorithms have evolved, the need for more complexity in user libraries has arisen. The program has kept up with these needs, and the capabilities of user libraries, including the storage of accurate mass, have been greatly expanded. NIST 05 (the number following “NIST” signifies the year of release) was released with a collection of detailed GC Methods and numeric retention-index (RI) data taken from the literature. In NIST 14, the ability to use the RI data in searches of unknown spectra was added. RI values can be determined for spectra of unknowns using the NIST AMDIS (Automated Mass spectral Deconvolution and Identification System) Program,7,8 which allows spectra to be sent to MS Search with the RI values of the unknowns so that they can be compared against the RI data in the Library.
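One simple way to picture the use of RI data in a search is as a window applied to the Hit List. The sketch below is only an illustration under stated assumptions: the function name, the tuple layout, and the default tolerance of 20 index units are hypothetical, and the way MS Search actually weighs RI agreement may well differ.

```python
def ri_filter(hits, unknown_ri, tolerance=20.0):
    """Keep hits whose library RI lies within +/- tolerance of the unknown's RI.

    Each hit is a tuple (name, match_factor, library_ri). Hits with no
    library RI (None) are kept, because they cannot be ruled out.
    """
    kept = []
    for name, match, library_ri in hits:
        if library_ri is None or abs(library_ri - unknown_ri) <= tolerance:
            kept.append((name, match, library_ri))
    return kept


# Hypothetical hits for an unknown whose measured RI is 1210.
hits = [("isomer 1", 900, 1200.0), ("isomer 2", 880, 1350.0), ("no RI data", 860, None)]
print(ri_filter(hits, unknown_ri=1210.0))
```

A filter of this kind is most useful for isomers, whose spectra can be nearly identical while their retention indices differ.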

NIST 05 also included a library of product-ion mass spectra (the NIST Tandem Library). These spectra were acquired using both beam-type collision cells and ion traps. There were multiple precursor ions per compound and multiple spectra acquired using multiple collision energies per precursor ion. Both integer and accurate m/z value spectra are included in this library. The growth of this Tandem Library has been phenomenal, as can be seen in Figure 2.

Figure 2 – Growth statistics of the NIST Tandem Library.

MS Search has evolved to allow for more accurate searching of product-ion spectra against those in the NIST Tandem Library. New algorithms were introduced at that time to enable expanded use of the Tandem Library. In NIST 17, additional Tandem Library search features were introduced to allow for searching of MS2 spectra of fragment ions produced using in-source CAD and MSn spectra.9

An important aspect of MS Search that was developed with the NIST MS Search Program for Windows was the ability to link the application with the software of various manufacturers with a minimal amount of programming effort. In addition, an API (application programming interface) was made available in the form of a DLL (dynamic-link library) for MS Search formatted data. With it, manufacturers could customize their own data analysis programs with their own graphical interfaces while still retaining the features of the NIST MS Search Program algorithms, without having to reformat the data in the NIST Libraries.

The 1998 release also was accompanied by the first version of the AMDIS Program. Several releases back, another utility, Mass Spectrometry (MS) Interpreter, was introduced, which has a number of tools that can easily be used for extracting the identification of a compound from its mass spectral data even if a spectrum for the compound is not in the library. All of these features have been retained and enhanced in the current NIST 17 release.

As stated above, many of the features added to MS Search were initially developed to aid the process of evaluating spectra. One such need resulted in the MS Interpreter Program. A proposed structure for a compound and its mass spectrum (either EI or a product-ion spectrum obtained using MS/MS of either a precursor ion from EI or formed using an LC/MS technique) are submitted to the program. The spectrum is evaluated to see if it makes sense for the proposed structure. There are utilities within MS Interpreter to determine if the isotope peaks are within the expected intensity range for the specified ion (see Figure 3).

Figure 3 – Illustrating the ability to select a portion of a structure in MS Interpreter and show its exact mass along with possible formulas of ions having similar masses. High mass accuracy product-ion analysis has been added to the latest version. MS Interpreter shows the relevance of the spectrum to the associated structure.

The importance of the structure in an evaluation of a spectrum cannot be overemphasized, as indicated by MS Interpreter. Another tool developed to assist the evaluators was added to the 2014 release of the library: the ability for the operator to choose whether a full chemical structure is displayed for a derivatized compound or the display is that of the derivative’s precursor along with a symbol for the derivatizing agent (see Figure 4). The inclusion of this structure-display capability is just another example of NIST’s attention to the needs of its users.

Figure 4 – Illustration of different structural displays for derivatives.

MS Search’s Substructure Identification utility analyzes the result of a Neutral Loss Search of a product-ion mass spectrum against the EI Library to identify what substructures an unknown may have. Not only is the probability of the presence of various substructures identified, but the probability of the absence of substructures is calculated and reported.10

The Mass Spectrometry Data Center developed the International Chemical Identifier (InChI) under contract to the International Union of Pure and Applied Chemistry (IUPAC).11,12 According to Wikipedia, “InChI is a textual identifier for chemical substances, designed to provide a standard way to encode molecular information and to facilitate the search for such information in databases and on the web.” InChI is not “human understandable.” A 27-character “hash” of the InChI, the InChIKey, was developed to facilitate the use of the InChI concept. Unlike the InChI, the InChIKey is not guaranteed to be unique, and different structures can have the same InChIKey; however, such collisions are very rare. With the release of NIST 11, each spectrum with a structure was assigned an InChIKey. When spectrum/structure pairs are input into User Libraries, an InChIKey is automatically assigned. In the NIST 14 release, when the InChIKey was displayed with a spectrum, it became a hot-link to that compound in PubChem if the computer was connected to the Internet. In NIST 17, the first segment of an entered InChIKey can be used as a constraint when doing a search of an unknown spectrum. NIST 17 allows older User Libraries to be indexed to use this new feature. The InChIKey for cocaine can be seen in Figure 5 in the text below the bar-graph spectrum.

Another advantage of the InChIKey in MS Search is the ability to display isotopic variants (e.g., C6D6, deuterated benzene, is an isotopic variant of C6H6, benzene), stereoisomers (E/Z or R/S isomers of the same structure), and/or various derivatives of a compound as replicates of that compound. This is an optional display ability and is very useful because of the lack of CAS Registry numbers for some compounds.
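The reason the first segment of the InChIKey is so useful can be seen in a few lines of code. A standard InChIKey has three hyphen-separated blocks of 14, 10, and 1 characters; the first block encodes the molecular skeleton, and stereochemistry and isotopic labeling affect only the second block. The sketch below groups entries by that first block. The function names are hypothetical, the two benzene keys are quoted from memory and should be verified against a structure database before being relied on, and the third key is an invented placeholder.

```python
def first_block(inchikey):
    """Return the 14-character first block (skeleton hash) of an InChIKey."""
    key = inchikey.strip().upper()
    parts = key.split("-")
    if len(key) != 27 or len(parts) != 3 or len(parts[0]) != 14:
        raise ValueError(f"not a 27-character InChIKey: {inchikey!r}")
    return parts[0]


def group_by_skeleton(entries):
    """Group (name, inchikey) pairs that share the same first block."""
    groups = {}
    for name, key in entries:
        groups.setdefault(first_block(key), []).append(name)
    return groups


entries = [
    ("benzene", "UHOVQNZJYSORNB-UHFFFAOYSA-N"),      # believed correct, verify
    ("benzene-d6", "UHOVQNZJYSORNB-MZWXYZOWSA-N"),   # believed correct, verify
    ("placeholder", "AAAAAAAAAAAAAA-UHFFFAOYSA-N"),  # invented for illustration
]

for skeleton, names in group_by_skeleton(entries).items():
    print(skeleton, names)
```

Benzene and its deuterated variant fall into the same group because only the second block of their keys differs, which is the behavior a first-segment constraint or a replicate display of isotopic variants depends on.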

As interest began to grow in the information that could be extracted from product-ion mass spectra and new instrumentation such as the tandem quadrupole/time-of-flight (Q-TOF) and the Thermo Fisher Orbitrap in combination with several precursor ion isolation techniques became more widely used, there was a greater need for the ability to handle accurate mass data. The NIST MS Search Program expanded to meet these new needs. This is also true of the program’s ability to build and use accurate mass User Libraries. The User Library routine in the NIST MS Search Program was expanded to have more header topics in the Text Information part of the spectrum. Not only were fixed-definition fields added, but the ability to add user-defined fields was also incorporated (Figure 5).

Figure 5 – Example spectrum showing increases in header information.

The single most powerful tool introduced with NIST 17’s NIST MS Search Program is the Hybrid Search. This search relies on expanded neutral loss indexing. When the search is carried out with the aid of a specified precursor ion (provided in a product-ion mass spectrum of an unknown, or user-specified or program-estimated for an EI mass spectrum), it returns a unique Hit List containing the spectra of compounds that are similar in structure to the unknown, even when there is no spectrum or structure for the unknown in the searched libraries. From this and with the aid of MS Interpreter, a possible identification can be elucidated for compounds that are not represented in the Library. NIST provides a utility to index any User Library so that it can be used with Hybrid Search. This includes third-party libraries such as the Peter Rosner Designer Drugs EI Library published yearly by John Wiley and the Robert P. Adams Essential Oils Components EI Library.

The Hybrid Search is used not only with the EI Library, but a special version has been developed for use with the NIST 17 Tandem Library. This version uses the precursor ion specified in the header of the submitted spectrum and makes full use of the high mass accuracy from modern instruments.
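The idea of matching through neutral losses can be illustrated with a toy scoring function. This is a deliberately simplified sketch and should not be read as the Hybrid Search algorithm itself: it assumes integer m/z values, unweighted intensities, and a greedy one-to-one pairing of peaks, and the function name and example spectra are hypothetical. Each library peak may pair with a query peak at the same m/z or at the m/z shifted by the difference between the two precursor masses.

```python
import math


def hybrid_score(query, query_precursor, library, library_precursor):
    """Cosine-like score allowing direct and precursor-shifted peak matches.

    Spectra are dicts of integer m/z -> intensity. A library peak at m/z m
    may pair with a query peak at m (fragment without the modification) or
    at m + delta (fragment carrying it), where delta is the precursor mass
    difference. Each query peak is used at most once.
    """
    delta = query_precursor - library_precursor
    used = set()
    dot = 0.0
    # Let the most intense library peaks claim their partners first.
    for mz, inten in sorted(library.items(), key=lambda p: p[1], reverse=True):
        for candidate in (mz, mz + delta):
            if candidate in query and candidate not in used:
                used.add(candidate)
                dot += inten * query[candidate]
                break
    norm = math.sqrt(sum(i * i for i in query.values()) *
                     sum(i * i for i in library.values()))
    return dot / norm if norm else 0.0


# Hypothetical library compound (precursor 122) and an unknown analogue
# that is 14 mass units heavier; one fragment keeps the extra 14 units.
library_spectrum = {77: 100.0, 105: 60.0}
unknown_spectrum = {77: 100.0, 119: 60.0}

print(hybrid_score(unknown_spectrum, 136, library_spectrum, 122))  # shifted matching
print(hybrid_score(unknown_spectrum, 122, library_spectrum, 122))  # delta = 0, direct only
```

With the shift allowed, both peaks of the unknown find partners and the score reaches its maximum; with no shift, only the common fragment matches and the score drops, which is why a structurally related library compound can surface even though the unknown itself is absent from the library.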

To demonstrate the potential of Hybrid Search, a urine sample was examined using the NIST 17 Tandem Library and Hybrid Search. This resulted in the identification of 437 metabolites. As an indication of the coverage of the NIST 17 Tandem Library, the search was repeated using both the NIST 17 Tandem Library and the Metlin Library in the MS Search Program’s format. This resulted in the identification of 475 metabolites, which was an increase of 9% when the two libraries were used together. When the search was done using the Hybrid Search, 1524 hits were obtained—a 3.5× increase over the conventional search. The Hybrid Search is just as powerful for the identification of new designer drugs as it is in identifying previously unknown metabolites.
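The percentages quoted above can be checked with a few lines of arithmetic. The variable names are illustrative, and the calculation assumes that the 3.5× figure is measured against the 437 identifications from the conventional search of the NIST 17 Tandem Library alone.

```python
conventional = 437  # NIST 17 Tandem Library, conventional search
combined = 475      # NIST 17 Tandem Library plus Metlin, conventional search
hybrid = 1524       # hits obtained with the Hybrid Search

gain_percent = (combined - conventional) / conventional * 100
fold_increase = hybrid / conventional

print(f"Two libraries together: +{gain_percent:.1f}%")  # +8.7%, i.e. about 9%
print(f"Hybrid Search: {fold_increase:.2f}x")           # 3.49x, i.e. about 3.5x
```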

The NIST 17 EI Library has 306,623 spectra for 267,376 compounds and over 404K literature and experimentally determined GC Methods and Retention Indexes for 99,400 compounds, 72,361 of which have EI mass spectra in the library. NIST now records the RI values of all measured spectra, including the GC/MS Method with the spectrum when it is added to the library. Even though quality has always been paramount with the NIST Libraries, more emphasis is being placed on this with the development of new software, measurement of spectra by NIST, and human examination of each and every spectrum.13

It should be noted that the NIST 17 Tandem Library is a general-purpose database with a wide variety of compounds, as seen in Figure 6. This library is designed to provide assistance to a large number of fields of study. Previous versions of this NIST Library have been reviewed.14,15 The Hybrid Search has been developed for product-ion mass spectra as well as EI spectra, and increases the likelihood of identifying unknowns from either type of spectra.

Figure 6 – Types of compounds found in the NIST Tandem Mass Spectral Library.

There are a number of “no-charge downloads” available by selecting the NIST 17 “New for Users of NIST 17 Libraries and Software” link, which is accessed from the Mass Spectral Library and Others Tools link at http://chemdata.nist.gov/. These include a demo version of the EI Library with a little less than 2.4K spectra. It is provided with a full-function version of the NIST MS Search Program for Windows v.2.3, the AMDIS Program, and MS Interpreter. Neither GC Method/RI data nor a version of the Tandem Library is included. The Lib2NIST Program, a utility for interconverting NIST Library formats, can be downloaded using a separate link on this Web page. This page also includes a free download of the Glycan and Glycopeptide LC/MS Library and three Annotated Recurrent Unidentified Spectra (ARUS) EI Libraries from GC/MS data: Pediatric Urine, 200+ spectra from pediatric urine samples; Dried Food Material, 650+ spectra from sets of dried food material; and Essential Oil, 1000+ spectra from a large set of essential oils (both commercial and laboratory distilled). These libraries will be continually updated as new spectra are identified.

The NIST 17 distribution includes the NIST MS Search Program for Windows v.2.3, NIST/EPA/NIH EI Mass Spectral Library, NIST Tandem Spectra Library, GC Method RI Library, MS Interpreter, AMDIS Program, and Lib2NIST conversion program. The NIST 17 Tandem Library can be purchased as a standalone product and includes a full-function version of the NIST MS Search Program for Windows v.2.3, MS Interpreter, and the Lib2NIST Program. The GC Method/Retention Index Library can also be purchased as a standalone product with an RI Search Program that has functions not available in the NIST MS Search Program, such as searching RI windows. This GC Method/Retention Index Library is of significance to those using gas chromatographs without mass spectrometers as well as GC/MS users.

The NIST Mass Spectrometry Data Center’s Mass Spectral Library Products are available from most mass spectrometry manufacturers and third-party dealers. A complete list of NIST distributors can be found at https://www.nist.gov/srd/nistepanih-mass-spectral-library-distributors. Links to pages with more information about the NIST Tandem Library and the NIST GC Method/Retention Index Library can be found at http://chemdata.nist.gov/dokuwiki/doku.php?id=chemdata:start. This page also has links to a free download of an EPA Tandem Mass Spectrometry Library and a DART Forensics Library, as well as a Glycan MS/MS Library. The same page has links to free downloads of MS Interpreter, the AMDIS Program, a Mass Spectrum Digitizer (a program that will digitize graphic spectra to the NIST MS Search Program’s digital format), and the NIST Glyco Mass Calculator.

References

  1. Brewer, A.K. and Dibeler, V.H. Mass spectrometric analyses of hydrocarbon and gas mixtures. J. Res. Nat. Bureau of Standards 1945, 35, 125–39; http://dx.doi.org/10.6028/jres.035.003.
  2. Heller, S.R. and Milne, G.W.A. EPA/NIH Mass Spectral Database (4 vol.) U.S. Department of Commerce, Washington, DC, 1978; https://doi.org/10.6028/NBS.NSRDS.63v1, https://doi.org/10.6028/NBS.NSRDS.63v2, https://doi.org/10.6028/NBS.NSRDS.63v3, https://doi.org/10.6028/NBS.NSRDS.63v4.
  3. Heller, S.R.; Milne, G.W.A. et al. An International Mass Spectral Search System (MSSS), V. A status report. J. Chem. Inf. Comput. Sci. 1976, 16(3), 176–8.
  4. Heller, S.R. The history of the NIST/EPA/NIH mass spectral database. Today’s Chemist at Work 1999, 8(2), 45–6, 49–50.
  5. Stein, S.E. Estimating probabilities of correct identification from results of mass spectral library searches. J. Am. Soc. Mass Spectrom. 1994, 5(4), 316–23.
  6. Ausloos, P.; Clifton, C. et al. The critical evaluation of a comprehensive mass spectral library. J. Am. Soc. Mass Spectrom. 1999, 10(5), 287–99.
  7. Mallard, W.G.; Andriamaharavo, N.R. et al. Creation of libraries of recurring mass spectra from large data sets assisted by a dual-column workflow. Anal. Chem. 2014, 86(20), 10,231–8.
  8. Stein, S.E. An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data. J. Am. Soc. Mass Spectrom. 1999, 10, 770–81.
  9. Yang, X.; Neta, P. et al. Extending a tandem mass spectral library to include MS2 spectra of fragment ions produced in-source and MSn spectra. J. Am. Soc. Mass Spectrom.; first online: 18 July 2017, 1–8; https://link.springer.com/article/10.1007/s13361-017-1748-2.
  10. Stein, S.E. Chemical substructure identification by mass spectral library searching. J. Am. Soc. Mass Spectrom. 1995, 6, 644–55.
  11. Heller, S.; McNaught, A. et al. InChI—the worldwide chemical structure identifier standard. J. Cheminformatics 2013, 5(1), 7.
  12. Freeman, P. InChI: advancing discovery in chemistry. Springer, The Source, July 31, 2017; http://www.springersource.com/inchi-advancing-discovery-chemistry/?wt_mc=SocialMedia.Facebook.1.AUT514.Source%20Post%20&utm_medium=socialmedia&utm_source=facebook&utm_content=7312017&utm_campaign=1_san2600_source%20post%20.
  13. Wallace, W.E.; Ji, W. et al. Mass spectral library quality assurance by inter-library comparison. J. Am. Soc. Mass Spectrom. 2017, 28(4), 733–8.
  14. Kind, T.; Tsugawa, H. et al. Identification of small molecules using accurate mass MS/MS search. Mass Spectrom. Rev. 2017, 24; doi: 10.1002/mas.21535.
  15. Vinaixa, M.; Schymanski, E.L. et al. Mass spectral databases for LC/MS- and GC/MS-based metabolomics: state of the field and future prospects. Trends Anal. Chem. Apr 2016, 78, 23–35.

O. David Sparkman is director of Pacific Mass Spectrometry Facility, University of the Pacific, Department of Chemistry, 3601 Pacific Ave., Stockton, CA 95211, U.S.A.; e-mail: [email protected]; www.pacific.edu