Monitoring recombinant protein expression in bacteria by rapid evaporative ionisation mass spectrometry

Rationale: There is increasing interest in methods of direct analysis mass spectrometry that bypass complex sample preparation steps.
Methods: One of the most interesting new ionisation methods is rapid evaporative ionisation mass spectrometry (REIMS), in which samples are vaporised and the combustion products are subsequently ionised and analysed by mass spectrometry (Synapt G2si). The only sample preparation required is the recovery of a cell pellet from a culture, which can then be analysed immediately.
Results: We demonstrate that REIMS can be used to monitor the expression of heterologous recombinant proteins in Escherichia coli. Clear segregation was achievable between bacteria harbouring plasmids that were strongly expressed and other cultures in which the plasmid did not result in the expression of large amounts of recombinant product.
Conclusions: REIMS has considerable potential as a near-instantaneous monitoring tool for protein production in a biotechnology environment.


| INTRODUCTION
Rapid evaporative ionisation mass spectrometry (REIMS) is an emerging technique for the real-time analysis of biomolecules. [1] REIMS is an ionisation technique based on sudden heating of a sample, predominantly using diathermy, generating a plume of aerosol containing combustion products. The molecular species in the smoke can be ionised and analysed by mass spectrometry. Most commonly, negative ions derived from fatty acids and phosphoglycerolipids dominate the ensuing mass spectrum. [1,2] REIMS lends itself extremely well to real-time surgical analysis owing to the widespread use of electrosurgical instruments, in which an electrical current is applied to the sample, resulting in simultaneous cutting and cauterisation. The by-product is an aerosol, and the ions and molecules present in the smoke are indicative of the physiological state and type of tissue. Successful applications include the distinction between healthy and cancerous breast tissue [3] and malignant gynaecological tissue. [4] Outside of surgical environments, REIMS has been used for the authentication of food products such as pistachios, [5] fish, [6] pork [7] and beef. [8] However, the REIMS ionisation process is not limited to tissue samples. REIMS analyses of bacterial and fungal colonies and cultures show promise for the identification of genus and species, [9-13] correctly identifying up to 28 clinically relevant bacterial and fungal species with an accuracy of up to 99% when predicting the Gram stain result and 88% when identifying specific species. Recent developments include robotic colony identification and sampling to enable automated high-throughput methods for bacterial analysis. [10,14] REIMS generates complex spectra that are not readily deconstructed into the identities of the constituent molecules. Thus, a REIMS data file is analysed as a 'binned' mass spectrum over a broad mass range that is treated as multivariate data.
To gain biological meaning from REIMS data, statistical methods such as principal component analysis (PCA), linear discriminant analysis (LDA) and random forests are regularly employed. [15] Each of these approaches assesses the entire binned data file to discover patterns and similarities among common samples and differences between unrelated samples. Accuracy and precision are determined via test sets and leave-one-out cross-validation.
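Leave-one-out cross-validation, mentioned above, can be illustrated with a minimal sketch. A nearest-centroid classifier is used here as a simple stand-in for LDA or random forests, and the two-bin "spectra" and class labels are hypothetical values chosen only for illustration; none of this reproduces the software used in the study.

```python
from math import dist

def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def nearest_centroid_predict(train_x, train_y, sample):
    """Assign `sample` to the class whose centroid is closest (Euclidean)."""
    by_class = {}
    for x, y in zip(train_x, train_y):
        by_class.setdefault(y, []).append(x)
    return min(by_class, key=lambda label: dist(centroid(by_class[label]), sample))

def leave_one_out_accuracy(xs, ys):
    """Hold each sample out once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# Hypothetical binned intensities (two bins per sample) for two culture types.
spectra = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
           [0.2, 0.8], [0.1, 0.9], [0.15, 0.85]]
labels = ["induced"] * 3 + ["uninduced"] * 3

accuracy = leave_one_out_accuracy(spectra, labels)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Because every sample is predicted by a model that never saw it, the resulting accuracy is a less optimistic estimate than one computed on the training data itself.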
To date, all REIMS applications have emphasised the identification of a sample, based on prior learning of known samples in a teaching set. In this study, we assessed the potential of REIMS analysis to track the expression and production of recombinant proteins in Escherichia coli. Mass spectrometry has been used extensively to monitor recombinant protein expression, but these approaches have all been based on analysis of the protein itself, whether as an intact protein [16] or after proteolytic fragmentation. [17] We have used REIMS to monitor the expression of a series of QconCATs, artificial 'designer' proteins that are concatenations of tryptic peptides used in the absolute quantification of proteins.
Typically, QconCATs, coded within an expression plasmid such as pET21a, are produced as 70-90 kDa proteins and are assemblies of tryptic peptides from around 25 target proteins. Because each peptide is released in stoichiometrically identical amounts after tryptic digestion, QconCATs are highly efficient ways to create a pool of approximately 50 peptides for quantification of, typically, 25 proteins. [18-22] We have observed that QconCAT expression is variable, such that some are expressed at extremely high levels but others are not generated in detectable or useable amounts, a consequence of the design and expression of these highly atypical artificial proteins. This variable degree of expression, and the fact that the QconCATs have no intrinsic biological activity that could directly impact the host cell metabolism, means that they are ideal candidates to explore the use of REIMS to monitor recombinant protein expression.

| REIMS analysis
An overview of the REIMS workflow is provided in Figure 1. Spectra were recorded in negative ion mode from m/z 50 to 1200 at a scan rate of 1 Hz. Burn events lasted from 10 to 30 s. Spectra were averaged over the entire burn event.
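At a scan rate of 1 Hz, a burn event of 10 to 30 s corresponds to roughly 10 to 30 scans. The averaging step can be sketched as below, assuming every scan has already been placed on a shared m/z axis; the function name and the intensity values are hypothetical.

```python
def average_burn(scans):
    """Average a list of scans (each a list of intensities on a shared m/z axis)."""
    if not scans:
        raise ValueError("a burn event needs at least one scan")
    n = len(scans)
    # zip(*scans) groups the intensities recorded at the same m/z position.
    return [sum(values) / n for values in zip(*scans)]

# Hypothetical burn event: three scans shown for brevity, three m/z positions each.
burn = [[10.0, 0.0, 4.0],
        [20.0, 2.0, 4.0],
        [30.0, 4.0, 4.0]]
averaged = average_burn(burn)
print(averaged)
```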

| Data processing
Raw data were pre-processed using Progenesis Bridge, a feature within the Waters MassLynx software, with a threshold of 0, to normalise the TIC trace and combine the burn events, enabling the comparison of a single spectrum from each pellet. These data were processed in LiveID (Waters) for linear discriminant analysis (LDA), using lockmass correction and background subtraction. Data were collected from m/z 50 to 1200 and subsequently binned with an m/z window of 0.01 units. LDA components were calculated using 20 principal components for the data set shown in Figure 4, or 25 principal components for all other data sets. In all cases, the number of LD components was set to one fewer than the number of groups. The pre-processed data were also imported into Progenesis QI, where the raw data were aligned, normalised and peak picked. For peak picking, the maximum charge state was set to 1. Statistical analysis was performed to extract the most influential features within the data set that contributed to the separation. This information was combined with the processed data from LiveID, exported as a matrix of sample vs m/z bin, and populated with normalised ion intensities. Ion-specific data were visualised in R, using RStudio.

FIGURE 2 Analysis of recombinant protein expression by REIMS of total cell pellets. Eleven different bacterial strains, each harbouring a plasmid encoding a different QconCAT, were grown in liquid culture. At a cellular density equivalent to an OD600 of 0.6, the culture was split into two; in one half expression was induced by addition of IPTG, in the other half growth continued in the uninduced condition to an OD600 of 1.5 (A). Expression of QconCAT was assessed by SDS-PAGE before and after induction (B). Cell pellets were recovered and analysed by REIMS. The aligned, normalised and binned data were then used to drive LDA, and the different conditions are mapped to the first two LDA components (C).

FIGURE 3 Comparative intensities of informative ions. For the bacterial cultures analysed in Figure 2, the top 10 most informative ions/bins were assessed and displayed as box plots (median, 25th and 75th percentiles, maximum and minimum, with outliers). Symbols and samples are the same as those in Figure 2.

[...] heterologous proteins leads to altered metabolic states that can be resolved by REIMS.
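The binning and TIC normalisation described under Data processing can be sketched as follows. This is only an illustration of the general idea, not the Progenesis Bridge or LiveID implementation, whose internals are not described here; the function names, the sparse-dictionary representation and the peak list are all assumptions, while the m/z range and bin width are the values quoted in the text.

```python
import math

MZ_MIN, MZ_MAX, BIN_WIDTH = 50.0, 1200.0, 0.01  # values quoted in the text

def bin_spectrum(peaks, mz_min=MZ_MIN, mz_max=MZ_MAX, width=BIN_WIDTH):
    """Sum intensities of (m/z, intensity) peaks into fixed-width bins.

    Returns a sparse dict {bin_index: summed_intensity}; peaks outside
    the m/z range are dropped.
    """
    binned = {}
    for mz, intensity in peaks:
        if not (mz_min <= mz < mz_max):
            continue
        # Small epsilon guards against floating-point error at bin edges.
        index = math.floor((mz - mz_min) / width + 1e-9)
        binned[index] = binned.get(index, 0.0) + intensity
    return binned

def tic_normalise(binned):
    """Scale bin intensities so that they sum to 1 (total ion current = 1)."""
    tic = sum(binned.values())
    if tic == 0:
        raise ValueError("empty spectrum: total ion current is zero")
    return {index: value / tic for index, value in binned.items()}

# Hypothetical peak list; the m/z values are illustrative, not from the study.
peaks = [(255.233, 300.0), (255.236, 100.0), (281.248, 600.0), (1500.0, 50.0)]
normalised = tic_normalise(bin_spectrum(peaks))
for index, value in sorted(normalised.items()):
    print(f"bin starting at m/z {MZ_MIN + index * BIN_WIDTH:.2f}: {value:.2f}")
```

Stacking one such normalised vector per pellet gives the sample vs m/z bin matrix that multivariate methods such as PCA and LDA take as input.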
To establish the validity of the data-processing approach, we repeated the analyses after randomising the sample assignment and re-running the processing methods. After randomisation, the different cultures gave little evidence of segregation (supporting information).
The inability of LDA to create clusters when the input data had been randomised lends confidence to the clustering of the correctly allocated data. It is therefore likely that REIMS can discriminate stage of growth, induction and expression. Although the expressed proteins were non-physiological and distinct for each culture, the co-clustering of cultures with similar expression levels was also encouraging.
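The logic of this randomisation control can be sketched in a few lines. The sketch below does not re-run LDA; instead it assumes a simple two-class separation score (distance between class centroids divided by the mean within-class spread) and compares the score for the true labels against scores obtained after shuffling the labels. The data, function names and score are hypothetical.

```python
import random
from math import dist

def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def separation(xs, ys):
    """Between-centroid distance over mean within-class spread (exactly two classes)."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    (_, rows_a), (_, rows_b) = sorted(groups.items())
    ca, cb = centroid(rows_a), centroid(rows_b)
    spread = (sum(dist(x, ca) for x in rows_a)
              + sum(dist(x, cb) for x in rows_b)) / len(xs)
    return dist(ca, cb) / spread

def permutation_control(xs, ys, n_permutations=200, seed=0):
    """Score the true labelling, then re-score after shuffling the labels."""
    rng = random.Random(seed)
    observed = separation(xs, ys)
    null_scores = []
    for _ in range(n_permutations):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        null_scores.append(separation(xs, shuffled))
    # Fraction of shuffles that separate at least as well as the true labels.
    exceed = sum(score >= observed - 1e-12 for score in null_scores)
    return observed, null_scores, (exceed + 1) / (n_permutations + 1)

# Hypothetical two-bin spectra for strongly and weakly expressing cultures.
spectra = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.88, 0.12],
           [0.1, 0.9], [0.2, 0.8], [0.15, 0.85], [0.12, 0.88]]
labels = ["high"] * 4 + ["low"] * 4

observed, null_scores, p_value = permutation_control(spectra, labels)
print(f"observed separation: {observed:.1f}, permutation p-value: {p_value:.3f}")
```

If the separation seen with the true labels were an artefact of the processing, shuffled labels would be expected to separate about as well; a score that collapses on shuffling supports the interpretation given above.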
Importing the data into Progenesis QI through Progenesis Bridge allowed us to interrogate the data at a feature level (Figure 3) [...] provide new insights into the molecular responses to the added demand of over-expression of a single protein, but this was beyond the scope of the present study, and would be best explored in a formal metabolomics context.

FIGURE 5 Use of REIMS to track the time course of recombinant protein expression. Six bacterial cultures, three of which contained strongly expressing QconCAT-encoding plasmids and three of which contained QconCAT plasmids that failed to express, were grown to an OD600 of 0.6, at which point induction was initiated (A). Samples were removed from each culture at hourly intervals for analysis by REIMS. The paths traced in the three-component LDA plots differed for expressors (B, left, red to purple) and non-expressors (B, right, green to blue); note that all data are present in each plot, but that high and low expressors are differentially highlighted in each sub-panel. The intensities of five informative m/z bins that change during induction are plotted in C.

| CONCLUSIONS
REIMS analysis of cells recovered from liquid bacterial cultures generates informative negative ion mass spectra. The spectra can be used to monitor bacterial growth but, more specifically, can track and monitor the induced expression of recombinant, heterologously expressed proteins. The speed of the analysis suggests that REIMS has value for routine monitoring in biotechnological applications.