Volume 18, Issue 18 p. 2162-2168
Research Article

MS1, MS2, and SQT—three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications

W. Hayes McDonald

Department of Cell Biology, The Scripps Research Institute, La Jolla, CA, USA

David L. Tabb

Department of Cell Biology, The Scripps Research Institute, La Jolla, CA, USA

Department of Genome Sciences, The University of Washington, Seattle, WA, USA

Rovshan G. Sadygov

Department of Cell Biology, The Scripps Research Institute, La Jolla, CA, USA

Michael J. MacCoss

Department of Cell Biology, The Scripps Research Institute, La Jolla, CA, USA

John Venable

Department of Cell Biology, The Scripps Research Institute, La Jolla, CA, USA

Johannes Graumann

Department of Biology, California Institute of Technology, Pasadena, CA, USA

Jeff R. Johnson

Department of Cell Biology, The Scripps Research Institute, La Jolla, CA, USA

Daniel Cociorva

Department of Cell Biology, The Scripps Research Institute, La Jolla, CA, USA

John R. Yates III

Corresponding Author

Department of Cell Biology, The Scripps Research Institute, La Jolla, CA, USA

Department of Cell Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, USA.
First published: 13 August 2004

Abstract

As proteomic labs generate data ever faster and take on larger projects, the resulting data storage and data processing problems will continue to challenge computational resources. This is especially true for shotgun proteomic techniques, which can generate tens of thousands of spectra per instrument each day. Many of these problems stem from one design decision: storing each spectrum, and the database identifications for that spectrum, as individual files. While these problems can be addressed by storing all of the spectra and search results in large relational databases, the infrastructure needed to implement such a strategy can be beyond the means of academic labs. We report here a series of unified text file formats for storing spectral data (MS1 and MS2) and search results (SQT) that are compact, easily parsed by both machines and humans, and yet flexible enough to be coupled with new algorithms and data-mining strategies. Copyright © 2004 John Wiley & Sons, Ltd.