Volume 43, Issue 5 p. 1066-1090
REVIEW ARTICLE
Open Access

Characterization of mRNA therapeutics

Guilherme J. Guimaraes

Department of Pharmaceutical and Biomedical Sciences, College of Pharmacy, University of Georgia, Athens, Georgia, USA

Jaeah Kim

Department of Pharmaceutical and Biomedical Sciences, College of Pharmacy, University of Georgia, Athens, Georgia, USA

Michael G. Bartlett

Corresponding Author

Department of Pharmaceutical and Biomedical Sciences, College of Pharmacy, University of Georgia, Athens, Georgia, USA

Correspondence Michael G. Bartlett, Department of Pharmaceutical and Biomedical Sciences, College of Pharmacy, University of Georgia, Athens, GA 30602, USA.

Email: [email protected]

First published: 04 July 2023
Citations: 9

Guilherme J. Guimaraes and Jaeah Kim contributed equally to this study.

Abstract

Therapeutic messenger RNAs (mRNAs) have emerged as powerful tools in the treatment of complex diseases, especially for conditions that lack efficacious treatments. The successful application of this modality can be attributed to its ability to encode entire proteins. While the large size of these molecules has supported their success as therapeutics, it also creates several analytical challenges. To further support therapeutic mRNA development and deployment in clinical trials, appropriate methods for their characterization must be developed. In this review, we describe current analytical methods that have been used in the characterization of RNA quality, identity, and integrity. Advantages and limitations of several analytical techniques, ranging from gel electrophoresis to liquid chromatography–mass spectrometry and from shotgun sequencing to intact mass measurements, are discussed. We comprehensively describe the application of analytical methods to the measurement of capping efficiency and poly A tail analysis, as well as their applicability in stability studies.

1 INTRODUCTION

1.1 The structure and function of messenger RNA (mRNA)

Oligonucleotides have been finding increasing use as therapeutics. Their use generally began by exploiting ways to interfere with mRNA transcription using small highly modified strands of RNA to achieve therapeutic goals. However, it has been their use as mRNA-based vaccines that has foreshadowed a paradigm shift in the treatment of many diseases.

There are significant differences in the challenges involved in the characterization of mRNA-based vaccines relative to smaller oligonucleotide therapeutics. It is important to realize that the open reading frame of an mRNA alone is roughly nine times the molecular weight of the protein that it encodes, since each amino acid (roughly 100 Daltons) corresponds to a trinucleotide codon (approximately 900 Daltons). In addition to the challenges of size, mRNA therapeutics are polar, highly charged, contain secondary structure, and have many specialized regions within their sequence. As shown in Figure 1, an mRNA therapeutic has five major regions: (1) the 5′-cap; (2) the 5′-untranslated region (UTR); (3) the open reading frame; (4) a 3′-UTR; and (5) a 3′-poly A tail.
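The size argument above can be made concrete with a short calculation. The following is a minimal sketch that uses only the rounded masses quoted in the text (about 100 Da per amino acid and about 900 Da per codon); the function name and the 500-residue protein are hypothetical values chosen for illustration.

```python
# Rounded average masses quoted in the text (approximations, not exact values).
AMINO_ACID_DA = 100.0  # approximate mass of one amino acid residue
CODON_DA = 900.0       # approximate mass of one trinucleotide codon


def orf_mass_estimate(protein_length_aa: int) -> dict:
    """Estimate protein mass and open reading frame (ORF) mass in Daltons."""
    protein_da = protein_length_aa * AMINO_ACID_DA
    # One codon per residue, plus one stop codon that is not translated.
    orf_da = (protein_length_aa + 1) * CODON_DA
    return {
        "protein_da": protein_da,
        "orf_da": orf_da,
        "ratio": orf_da / protein_da,
    }


# Hypothetical 500-residue protein, used only as an example.
estimate = orf_mass_estimate(500)
print(estimate)
```

The estimate covers the open reading frame only; the cap, both UTRs, and the poly A tail add further mass to the intact molecule.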

Figure 1. General structure and function of the major domains of an in vitro transcribed messenger RNA (IVT-mRNA) therapeutic.

In mRNA, the 5′-cap has several important biological roles that are essential for the creation of protein. In eukaryotes, the 5′-cap terminates in a 7-methyl guanosine (m7G) that is connected to the remaining RNA strand through a 5′ to 5′ triphosphate linkage (m7GpppN). One immediate feature of this architecture is that it promotes nuclease resistance by presenting an atypical structure, which dramatically increases the residence time of mRNAs (Ramanathan et al., 2016). The 5′-cap also plays a role in promoting the assembly of the ribosome and facilitating gene expression (Ramanathan et al., 2016). There have been several modifications to the 5′-cap that have been shown to improve translational efficiency (Kwon et al., 2018). These include methylating either the first or the first and second nucleotide following the triphosphate linkage resulting in either an m7GpppNmN or m7GpppNmNm structure.

The 5′-UTR plays a vital role in forming a complex between the mRNA and the ribosome. The 5′-UTR is generally 100–200 nucleotides in length, concluding with a short nucleotide sequence known as the Kozak sequence directly before the start codon and the open reading frame. In 1978, Marilyn Kozak identified this feature of mRNA (Kozak & Shatkin, 1978). The optimized Kozak sequence is GCCRCCAUGG, where R is either A or G and AUG is the start codon. The nucleotides before the Kozak sequence are the main point of contact between the ribosome and the mRNA during the initiation phase of protein translation (Kozak, 1987).
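The optimized Kozak motif described above can be located in a sequence with a simple pattern search. This is a minimal sketch, assuming the strict GCCRCCAUGG motif from the text; the function name and the test sequence are hypothetical, and real 5′-UTRs often carry variants of the motif that this strict pattern would miss.

```python
import re

# Optimized Kozak sequence from the text: GCCRCCAUGG, where R is A or G.
# The start codon AUG is captured as group 1.
KOZAK = re.compile(r"GCC[AG]CC(AUG)G")


def find_kozak(mrna: str):
    """Return (motif start index, AUG start index), or None if no motif is found.

    Indices are 0-based. DNA-style input (T) is converted to RNA (U) first.
    """
    match = KOZAK.search(mrna.upper().replace("T", "U"))
    if match is None:
        return None
    return match.start(), match.start(1)


# Hypothetical sequence used only for illustration.
print(find_kozak("GGGAAAGCCACCAUGGCU"))
```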

The open reading frame contains the nucleotide sequence that corresponds to the primary sequence of the protein produced by the mRNA. There are a few important features of the open reading frame that contribute to its function. The frequency of codon usage and RNA secondary structure within this region both contribute to the rate and regulation of protein expression (Xia, 2021). Both mRNA stability and protein expression are adversely impacted by the presence of UU or UA sequences within the open reading frame. This is believed to be due to the increased susceptibility of these sequences to endonucleases (Al-Saif & Khabar, 2012). Additionally, replacement of U with 1mΨ, Ψ, or methoxyΨ all appear to stabilize secondary structure and 1mΨ also has been shown to extend the half-life of mRNAs (Boros et al., 2013; Karikó et al., 2008).

The 3′-UTR is about 200–800 bases in length and serves a different function from the 5′-UTR. The 3′-UTR also has an optimal length with shorter ones having poorer translation and longer ones having reduced half-lives. For therapeutic uses, the 3′-UTR from a human globin protein is often used since these pyrimidine-rich sequences have been found to stabilize the poly A tail and confer greater stability to the entire mRNA (Jiang et al., 2006; Kwon et al., 2018).

The poly A tail is typically 60–150 nucleotides in length. It has several roles, including increasing mRNA stability and translation (Goldstrohm & Wickens, 2008; Jeeva et al., 2021). It is also the binding site for the poly A binding protein, which is critical for bringing the mRNA into proper alignment within the ribosome (Weissman, 2015).

These five elements all play crucial roles in the proper function of mRNA therapeutics and therefore, it is essential to be able to reproducibly manufacture these complex molecules. However, it is currently not possible for a single technique to determine all of the attributes of these molecules. Therefore, it is necessary to employ a wide range of techniques with each having advantages in looking at specific elements of the therapeutic mRNA, to completely characterize these molecules.

1.2 Current mRNA therapeutics

There are only two currently approved mRNA-based therapeutics. These are the SARS-CoV-2 vaccines BNT 162b2 (BioNTech/Pfizer) and mRNA 1273 (Moderna) (Jeeva et al., 2021; Teo, 2022). However, there are already several more in human clinical trials against other diseases such as cardiovascular disease, cancer, rabies, and cystic fibrosis (Qin et al., 2022; Zogg et al., 2022).

For these two mRNA vaccines, it is important to understand the decisions that were made in the design of the five major structural domains and how these may lead to downstream challenges in their characterization. BNT 162b2 uses a 5′-capping strategy developed by TriLink Biotechnologies (CleanCap Kit) (Henderson et al., 2021). This approach co-transcribes a cap composed of m7GpppN2′OMe on the end of the mRNA. In contrast, mRNA 1273 uses a posttranscriptional enzymatic reaction system based on vaccinia capping (Rosa et al., 2021; Yisraeli & Melton, 1989). This approach improves the initiation of translation by improving the recruitment of translation initiation factors and increasing nuclease resistance. A challenge of using this posttranscriptional enzymatic capping process is that it is only 88%–98% efficient (Beverly et al., 2016). Uncapped species constitute the balance of this enzymatic reaction. There is also a chance that the pm7Gp cap will be installed in the reverse orientation, although to date this has not been detected (Beverly et al., 2016).

Both vaccines use the same Kozak sequence to initiate protein translation (GCCACCAUG). This sequence differs in the last nucleotide from the consensus mammalian sequence. This change is made because the spike protein does not undergo removal of the N-terminal methionine that is seen in the majority of proteins. Proteins that undergo this posttranslational modification generally also have an alanine or glycine as the second amino acid to promote the removal of the methionine (Nesterchuk et al., 2011). However, the spike protein has a phenylalanine at this position, which makes it impossible to have a G directly following the AUG start codon (Nesterchuk et al., 2011; Xia, 2021).

While both vaccines have identical Kozak sequences, the rest of their 5′-UTRs are quite different. BNT 162b2 uses 35 nucleotides from the 5′-UTR of the highly translated α-globin, while mRNA 1273 uses a GC-rich region derived from the oncogene V1-UTR (Xia, 2021). The UTR in the BNT 162b2 vaccine has almost no secondary structure, which is believed to be ideal for facilitating rapid translation of the open reading frame (Xia, 2021). Interestingly, the mRNA 1273 5′-UTR has some secondary structure (Xia, 2021). It has been speculated that this may slow the production of the spike protein and be related to the higher dose of this vaccine needed to achieve similar efficacy (Xia, 2021).

The open reading frames of the two vaccines encode the same protein. Since in many cases multiple codons encode the same amino acid, the two vaccines differ in some of their codon choices. A few examples of these differences are the amino acids arginine (mRNA 1273 predominantly uses AGA, while BNT 162b2 uses CGG about 2/3 of the time and AGA about 1/3 of the time) and glutamate (BNT 162b2 favors GAG over GAA, while mRNA 1273 uses GAA exclusively) (Xia, 2021). Choices in codons impact the translational efficiency of the mRNA (Hanson & Coller, 2018). In the end, both vaccines achieve mRNA translation efficiencies that are roughly the same and about 50% better than the wild type (Kim et al., 2022). The most significant difference between the two vaccines is in their choices at the end of the open reading frame. There are three naturally occurring stop codons. BNT 162b2 places two consecutive UGA stop codons at the end of the open reading frame, while mRNA 1273 uses a sequence containing all three stop codons (UGAUAAUAG) (Xia, 2021). It is not clear that either of these has an advantage over the other, but there have been studies showing that UAA is the most efficient stop codon for prokaryotes (Belinky et al., 2018).
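Synonymous codon preferences of this kind can be tallied directly from an open reading frame. The following is a minimal sketch; the function name and the short test sequence are hypothetical, and the six arginine codons are those of the standard genetic code.

```python
from collections import Counter

# The six arginine codons of the standard genetic code (RNA alphabet).
ARG_CODONS = {"AGA", "AGG", "CGA", "CGC", "CGG", "CGU"}


def codon_usage(orf: str, codons: set) -> dict:
    """Return the fraction of each listed codon among all listed codons in the ORF.

    The ORF is read in frame from its first nucleotide; trailing nucleotides
    that do not form a complete codon are ignored.
    """
    orf = orf.upper().replace("T", "U")
    usable = len(orf) - len(orf) % 3
    triplets = [orf[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(t for t in triplets if t in codons)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical toy ORF: AUG CGG AGA CGG UGA
print(codon_usage("AUGCGGAGACGGUGA", ARG_CODONS))
```

The same function can be applied to any amino acid by passing its set of synonymous codons.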

The 3′-UTR regions of the two vaccines are quite different. While BNT 162b2 used a human globin for its 5′-UTR, mRNA 1273 uses 110 nucleotides from α-globin for its 3′-UTR (Xia, 2021). BNT 162b2, however, uses a significantly different approach, beginning with two separate highly effective sequences and continuing to optimize them using systematic evolution of ligands by exponential enrichment (SELEX) to make them more effective. The BNT 162b2 vaccine first places 136 nucleotides derived from the human AES/TLE5 gene, followed by 139 nucleotides from the human mitochondrial 12S rRNA (Xia, 2021). Both vaccines have 3′-UTRs on the shorter end of the size range, which means their manufacturers have chosen to prioritize half-life over translational efficiency.

The poly A tail for BNT 162b2 uses an approach involving segmented poly A sequences. In this case, A30 and A70 are linked by a short spacer of GCATATGACT (Xia, 2021). This approach is considered to be more reproducible and homogeneous than simply allowing poly A polymerase to create the tail (Trepotec et al., 2019). There is currently no information available on the exact nature of the poly A tail from mRNA 1273. Researchers at Stanford sequenced this vaccine but were only able to confirm the first 8 adenosines from the poly A tail (Jeong et al., 2021).

2 SPECIFICATIONS AND TECHNIQUES FOR THE CHARACTERIZATION OF MRNA

2.1 RNA integrity

The production of reproducible high-quality RNA is critical to the overall success of any therapeutic mRNA. Determinations of RNA integrity have been performed using many different approaches; however, only over the past 16 years have more quantitative metrics been applied. The RNA Integrity Number (RIN) was first proposed in 2006 to have a uniform standard for reporting the quality of RNA (Schroeder et al., 2006). The initial need was to ensure that only RNA of sufficient quality was advanced for sequencing. The traditional method for calculating RNA integrity numbers involved the ratio of the gel electrophoresis bands from 28S and 18S rRNA (Imbeaud et al., 2005). RIN values range between 10 and 1 with 10 being the highest. The values are calculated from the ratio of the area of the main RNA peak relative to the total area of the electropherogram.
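The area ratio mentioned above can be illustrated with a simple integration of an electropherogram trace. This is only a sketch of a main-peak area fraction under stated assumptions: the trace is baseline-corrected, the main-peak window is chosen by the analyst, and the function names and data points are hypothetical. It is not the RIN algorithm itself, which is computed by instrument software.

```python
def trapezoid_area(x, y):
    """Integrate y over x with the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i + 1] + y[i]) / 2 for i in range(len(x) - 1))


def main_peak_fraction(times, signal, window):
    """Fraction of the total trace area that falls inside the main-peak window.

    times  -- migration times (ascending)
    signal -- baseline-corrected detector response at each time
    window -- (start, end) migration times bounding the main peak (analyst-defined)
    """
    lo, hi = window
    total = trapezoid_area(times, signal)
    inside = [i for i, t in enumerate(times) if lo <= t <= hi]
    peak = trapezoid_area([times[i] for i in inside], [signal[i] for i in inside])
    return peak / total if total > 0 else float("nan")


# Hypothetical five-point trace used only for illustration.
times = [0, 1, 2, 3, 4]
signal = [0, 1, 4, 1, 0]
print(main_peak_fraction(times, signal, window=(1, 3)))
```

A lower fraction indicates that more of the signal lies outside the main peak, for example in earlier-migrating degradation products.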

Another metric that can be used to measure RNA integrity is the RNA Quality Score (RQS). The RQS is calculated using peak heights, peak areas, and concentrations to generate values between 1 and 10 (Lifescience, 2009). RQS values are highly correlated to RIN values (Lifescience, 2009; Technologies, 2020).

2.1.1 Agarose gel electrophoresis

One of the most common techniques for the analysis of RNA is gel electrophoresis. Due to the highly charged phosphate backbone, RNA will readily migrate toward an anode under the influence of an electric field. Agarose gels have pores that act to impede the migration of RNA based on its size and shape (Rio et al., 2010). To simplify the results from this experiment, denaturing gels are used to ensure that separations depend only on the size of the RNA. Agarose gels are generally used for oligonucleotides containing 600 nucleotides or more, which makes them well suited for mRNA therapeutic analysis.

RNA levels are quantitated using Northern Blot analysis from the agarose gel following separation. This uses a fluorescently-labeled (or radiolabeled) complementary DNA (usually 25–45 nucleotides). Following hybridization, the target RNA is detected. One complication to this approach for determining integrity is that the hybridization probe may not detect all degradation products, especially if it is no longer fully complementary. There are suggestions that this may begin to be problematic with as few as three nucleotides missing (Kim et al., 2019).

2.1.2 Ultraviolet (UV) spectroscopy

UV spectroscopy is widely used to determine concentrations of oligonucleotides using absorbance at 260 nm. The purity of an oligonucleotide can be quickly determined by calculating the ratio of the absorbance at 260 nm to the absorbance at 280 nm. High-quality oligonucleotides will have a ratio between 1.8 and 2.1. These measurements should be made using solutions that provide absorbance readings between 0.1 and 1.0 to maximize the accuracy of the 260/280 ratio measurement. It is also important to use consistent pH and temperature since these will alter UV absorbance (Wilfinger et al., 1997). Controlling these conditions results in higher-quality data, which can be used to trend manufacturing batches over time.
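The acceptance checks described above are straightforward to script. The following is a minimal sketch; the function name is hypothetical, the 1.8–2.1 ratio range and the 0.1–1.0 absorbance window come from the text, and the conversion factor of about 40 µg/mL per absorbance unit at 260 nm (1 cm path length) is a commonly used approximation for single-stranded RNA that is assumed here rather than taken from the source.

```python
# Assumed approximation: A260 of 1.0 corresponds to ~40 ug/mL single-stranded RNA (1 cm path).
RNA_UG_PER_ML_PER_A260 = 40.0


def assess_uv(a260: float, a280: float, dilution_factor: float = 1.0) -> dict:
    """Check A260/A280 purity and estimate RNA concentration of the undiluted sample."""
    ratio = a260 / a280
    return {
        "ratio_260_280": ratio,
        # Ratio range for high-quality oligonucleotides quoted in the text.
        "ratio_in_range": 1.8 <= ratio <= 2.1,
        # Both readings should fall between 0.1 and 1.0 for an accurate ratio.
        "readings_in_window": all(0.1 <= a <= 1.0 for a in (a260, a280)),
        "conc_ug_per_ml": a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor,
    }


# Hypothetical readings for a sample diluted 10-fold before measurement.
print(assess_uv(a260=0.5, a280=0.25, dilution_factor=10))
```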

While this approach is simple and fast, it does have limitations that must be managed. First, this method does not discriminate between DNA and RNA. Therefore, it is necessary to treat samples with DNase before analysis to guarantee that only RNA is being measured and no residual complementary DNA (cDNA) from the manufacturing process remains. Second, since UV absorbance is sensitive to the materials used in the cuvettes, temperature, and solution pH, it is imperative to actively manage these parameters to avoid triggering unnecessary investigations of products.

2.1.3 Capillary gel electrophoresis (CGE)

CGE is an excellent technique for providing high separation efficiency for oligonucleotides greater than 2000 nucleotides in length (Lu et al., 2020; Skeidsvoll & Ueland, 1996; Sumitomo et al., 2009). However, the separation times are long, which currently limits this approach for widespread support of mRNA therapeutic development. This challenge was recently addressed by Rustandi and co-workers with their development of a microchip capillary electrophoresis apparatus for the determination of purity and integrity of a 2000-mer mRNA (Raffaele et al., 2022). They also showed that the method could be applied to mass ladders from 200 to 6000 nucleotides in length, with the separation being completed in approximately one minute (Raffaele et al., 2022).

In CGE, RNAs are run under denaturing conditions, which eliminates secondary structure in the oligonucleotides. This allows the separation to occur based completely on size. Operating under denaturing conditions also provides sharper peaks, higher resolution, and a greater ability to assess the integrity of the mRNA (Lu et al., 2020). This approach is used to generate the RIN and RQS values discussed above.

Two recent advances that have led to the ability to look at large oligonucleotides like mRNAs at high resolution are (1) using more dilute polymers to form gels in the capillaries and (2) using nonaqueous background electrolytes like formamide (De Scheerder et al., 2018; Han et al., 1999; Lu et al., 2020; Rocheleau et al., 1992; Todorov et al., 2001). Formamide has several properties that make it useful for CGE of oligonucleotides: (1) it is an outstanding solvent for many ideal CGE polymers; (2) it strongly denatures oligonucleotides; and (3) it is an excellent solvent for highly charged molecules.

2.1.4 Ion-pair reversed-phase liquid chromatography (IP-RP)

IP-RP has been widely applied for the determination of oligonucleotides (Basiri & Bartlett, 2014; Sutton et al., 2021). However, most of these applications have been reserved for smaller oligonucleotides or enzymatic digests from larger oligonucleotides. Moderna has discussed some of their uses of IP-RP of mRNAs in their patent for analysis of mRNA heterogeneity and stability (Spivak et al., 2016). The separation uses either Waters XBridge or Phenomenex Clarity columns with a mobile phase consisting of TEA acetate (pH 7.0) using acetonitrile to elute the oligonucleotide. Ultraviolet detection at 260 nm is used with increases in peak width and greater tailing being used to indicate increases in mRNA heterogeneity.

More recently, researchers from AstraZeneca proposed a similar method using the same TEA acetate/acetonitrile mobile phase, but at pH 8.5 (Currie et al., 2021). They used a ThermoFisher DNAPac polymeric column with a large pore size for their separation, with ultraviolet detection at 260 nm. To determine degradation, they monitored decreases in the peak area of the main mRNA peak as well as increases in the percentage of peak area eluting before the main peak. Notable features of this approach were the significantly shorter run time (15 min vs. 60 min) and dramatically narrower peak widths. This method was inspired by earlier work by Kanavarioti, who demonstrated the separation of oligonucleotides of various lengths, including an mRNA, using the DNAPac column (Kanavarioti, 2019).

There have been a few other approaches to the chromatographic analysis of large RNAs. Dickman and coworkers used superficially porous particles, and Isobe and coworkers used polystyrene-divinylbenzene polymeric columns. Each approach demonstrated the ability to successfully separate large RNAs of up to 6000–8000 bases in length (Close et al., 2016; Yamauchi et al., 2013). It would be interesting to see how successfully these stationary phases could be applied to the determination of mRNA integrity.

2.2 Identity

As a critical aspect of the quality system, mRNA identity including the determination of the mRNA sequence should be confirmed during drug development and subsequent regulatory filing.

2.2.1 RT-PCR followed by Sanger sequencing

Based on the specification sections of the COVID-19 vaccine assessment reports, Moderna and Pfizer/BioNTech used RT-Sanger sequencing to confirm the identity of the encoded sequence (2021a, 2021b). The Sanger sequencing method, developed in 1977 (Sanger et al., 1977), is a useful conventional technology for DNA sequencing, offering reliability, cost-effectiveness, and rapid turnaround time when high throughput is not needed (Jiang et al., 2019; Slatko et al., 2018). Before Sanger sequencing, a reverse transcription polymerase chain reaction (RT-PCR) is used to synthesize a cDNA using the mRNA as a starting template. This synthesized cDNA is used as a template along with a complementary DNA primer for DNA synthesis. In four polymerase solutions, four types of deoxynucleotide triphosphates (dNTPs: A, T, G, and C) are mixed with only one type of dideoxynucleotide triphosphate (ddNTP) for each solution. Each ddNTP, labeled with a distinct fluorescent dye, is a specific chain-terminating nucleotide lacking a 3′-OH group, so DNA synthesis terminates at that point because no further phosphodiester bond can be formed (Slatko et al., 2018). This allows for a readout of the sequence following a length-based separation by looking at the terminating fluorophore.

2.2.2 Next-generation sequencing (NGS)

NGS is a newer technology developed for high-throughput sequencing of large genomes in the biotechnology field. In general, the workflow for sample preparation begins with the fragmentation of RNAs, followed by synthesis of short cDNA fragments using random hexamer primers, end repair, adenylation, ligation of sequencing adapters, and library amplification (He et al., 2013; Van Dijk et al., 2014). As sequencing platforms for NGS, commercial assays from several companies, including Illumina, ThermoFisher, ArcherDX, and Qiagen, are available on the market and use different detection chemistries (Qu et al., 2020). Although NGS offers high sensitivity and accuracy, fast turnaround time, and cost-effectiveness for RNA sequencing, methods that rely on reverse transcription, such as NGS and Sanger sequencing, have some limitations when base modifications are present in IVT mRNAs because the RNA must be converted to cDNA (Morreel et al., 2022). For example, the mRNA vaccines made by Moderna and Pfizer/BioNTech incorporate the chemically modified nucleoside N1-methylpseudouridine to prevent the immune response resulting from the introduced mRNA (Vanhinsbergh et al., 2022). More generally, modified nucleotides such as 5-methoxyuridine, 5-methylcytidine, and N6-methyladenosine have been used for the optimization of mRNA structure and reduction of immunogenicity (Vanhinsbergh et al., 2022).

2.2.3 IP-RP LC-MS: T1 digestion for fingerprint and shotgun sequencing

The first major advance in mRNA sequence mapping using LC-MS was the method introduced by Jiang and coworkers at Moderna (Jiang et al., 2019). As shown in Figure 2A, this method applied multiple orthogonal endonucleases in parallel to digest and provide complementary sequence coverage for large mRNAs (human erythropoietin [745 nt], firefly luciferase [1816 nt], and α-catenin [2884 nt]). Since RNase T1, a high-frequency cleaver at the 3′-end of G, can generate isomeric or identical oligonucleotides, they also used colicin E5 and mazF, which have distinct RNA digestion specificities (Figure 2B) and which help to achieve near-total sequence coverage in oligonucleotide mapping. After parallel digestions by multiple endonucleases, with each digestion optimized for specificity and efficiency, samples were analyzed using a C18 column with a diisopropylethylamine/hexafluoroisopropanol mobile phase. LC-MS data were analyzed by an in-house C++ program generating output including sequences, masses, retention times, and abundances. As a result, greater than 70% sequence coverage of mRNAs (~3000 nucleotides) was achieved using the combination of multiple endonucleases (Figure 2C). Additionally, they applied this method to detect low-level impurities, such as single nucleotide polymorphisms (SNPs), with a sensitivity of <1%. Compared to Sanger sequencing and NGS, LC-MS provides direct analysis without the conversion of mRNA to cDNA, resulting in improved fidelity and speed of analysis (Jiang et al., 2019).
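The reason a single RNase T1 digest gives incomplete coverage can be seen with an in silico digestion. The following is a minimal sketch and not the authors' in-house program: it cleaves after every G, then counts a fragment toward coverage only if no other fragment shares its base composition (a simplification in which identical and isomeric fragments are both treated as ambiguous by mass). The function names, the minimum fragment length, and the test sequence are hypothetical.

```python
from collections import Counter


def rnase_t1_digest(rna: str):
    """Cleave on the 3' side of every G; return a list of (start index, fragment)."""
    fragments, start = [], 0
    for i, base in enumerate(rna):
        if base == "G":
            fragments.append((start, rna[start:i + 1]))
            start = i + 1
    if start < len(rna):
        # Remaining 3'-terminal fragment that does not end in G.
        fragments.append((start, rna[start:]))
    return fragments


def unique_coverage(rna: str, min_len: int = 4) -> float:
    """Fraction of nucleotides in fragments with a unique base composition.

    Fragments shorter than min_len are ignored, since very short products
    are rarely informative for mapping.
    """
    fragments = rnase_t1_digest(rna)
    # Sorting the letters gives a composition key shared by isomeric fragments.
    compositions = Counter("".join(sorted(seq)) for _, seq in fragments)
    covered = sum(
        len(seq)
        for _, seq in fragments
        if len(seq) >= min_len and compositions["".join(sorted(seq))] == 1
    )
    return covered / len(rna)


# Hypothetical toy sequence used only for illustration.
toy = "AUGCCGAUGUUACG"
print([seq for _, seq in rnase_t1_digest(toy)])
print(unique_coverage(toy, min_len=3))
```

Running the same analysis with a second cleavage rule and merging the uniquely assigned positions is one way to picture how parallel digests raise overall coverage.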

Figure 2. (A) A novel bottom-up oligonucleotide sequence mapping workflow combining multiple endonucleases. (B) LC-MS total ion chromatograms of Epo mRNA digested by RNase T1 (top, red), colicin E5 (middle, black), and mazF (bottom, blue). Note that colicin E5 and mazF tend to produce larger, later-eluting oligonucleotides. (C) Sequence coverage maps obtained from individual digestions of Epo, Luc, and α-catenin mRNA by RNase T1, colicin E5, and mazF. Reprinted with permission from Jiang et al. (2019), Analytical Chemistry. LC-MS, liquid chromatography–mass spectrometry; mRNA, messenger RNA.

A similar approach was taken by Vanhinsbergh et al. (2022), who performed sequence mapping of several mRNAs, including a modified version of SARS-CoV-2 spike protein mRNA (~3900 nt) and Fluc mRNA (1929 nt), using partial RNase T1 digestion followed by LC-MS analysis. They optimized the amount of RNase T1 immobilized on magnetic particles and the reaction time for high sequence coverage. The separation was achieved using a poly(styrene-divinylbenzene) column with a triethylamine/hexafluoroisopropanol mobile phase. As a result, >80% sequence coverage of large mRNAs was obtained using this approach. Additionally, they demonstrated that this workflow provides high sequence coverage (>90%) of both 5-methoxyuridine-modified and unmodified CleanCap Fluc mRNA. They also showed that low-level impurities of a known rRNA sequence could be detected in a mixture of mRNA and rRNA at a ratio of 10:1 using LC-MS/MS (Vanhinsbergh et al., 2022).

Recently, Correa and coworkers demonstrated the use of RNase 4 as an alternative to T1 digestion for the characterization of therapeutic mRNAs (Wolf et al., 2022). RNase 4 cleaves after uridine residues before a purine (e.g., UA or UG). This selectivity results in larger digestion products than typically obtained from T1 digests. They also found that RNase 4 was able to cleave after modified uridine nucleotides such as 1-methylpseudouridine and 5-methoxyuridine. However, it was not able to cleave after 2′-OMe modified uridines. The authors showed several examples, including BNT 162b2, where LC-MS/MS sequencing following RNase 4 digestion showed greater coverage than using RNase T1.
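The cleavage rule described above can be sketched as an in silico digestion. This is a simplified illustration, assuming cleavage occurs after every uridine that is followed by A or G and never at positions flagged as 2′-OMe uridines; the function name, the `blocked` parameter, and the test sequence are hypothetical, and real digests may be incomplete.

```python
def rnase4_digest(rna: str, blocked=frozenset()):
    """Cleave after each U that is followed by a purine (A or G).

    blocked -- 0-based indices of uridines that resist cleavage
               (for example, 2'-OMe modified uridines).
    Returns the list of fragments in 5' to 3' order.
    """
    fragments, start = [], 0
    for i in range(len(rna) - 1):
        if rna[i] == "U" and rna[i + 1] in "AG" and i not in blocked:
            fragments.append(rna[start:i + 1])
            start = i + 1
    # Whatever remains after the last cleavage site is the 3'-terminal fragment.
    fragments.append(rna[start:])
    return fragments


# Hypothetical toy sequence; it contains eight G residues but only one UG site.
toy = "GGCAUGCGGCUCGGA"
print(rnase4_digest(toy))
print(rnase4_digest(toy, blocked={4}))
```

For this toy sequence the rule yields two fragments, whereas cleavage after every G would yield eight, which illustrates why RNase 4 products tend to be larger.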

2.2.4 Nanopore direct RNA sequencing (DRS)

Even though the commercialization of the technology remains relatively new, recent advances in nanopore DRS allow direct sequencing of full-length native RNA molecules without the need for reverse transcription or amplification (Brown & Clarke, 2016; Kono & Arakawa, 2019; Leger et al., 2021). The ability to directly sequence RNA without amplification is important as it preserves RNA modifications in the input material (Garalde et al., 2018; Smith et al., 2017; Viehweger et al., 2019). These advances have turned the technique into a powerful characterization tool for RNA analysis, including cellular mRNA, noncoding RNA, and RNA viruses (Jain et al., 2022; Kim et al., 2020; Viehweger et al., 2019; Wongsurawat et al., 2019). Several studies have applied the technique to the detection of common RNA modifications such as m6A, inosine, and pseudouridine (Bailey et al., 2022; Fleming & Mathewson, 2021; Gao et al., 2021; Huang et al., 2021; Nguyen et al., 2022; Parker et al., 2020; Tavakoli et al., 2022). While DRS continues to show promising results in RNA sequencing, a thorough review by Jain and coworkers highlights the need for several technical improvements that would broaden the use of nanopore DRS: (1) higher basecall accuracy; (2) decreased RNA input; (3) routine validation of RNA modification calls; (4) full-length reads; (5) validation of newly discovered mRNA isoforms; and (6) straightforward implementation of software developed by the academic community (Jain et al., 2022).

2.3 Purity and impurities

The in vitro, cell-free transcription of mRNA therapeutics prevents the presence of cell-derived impurities. However, other process-related impurities such as enzymes, residual nucleotide triphosphates (NTPs), DNA templates, or mRNA fragments may be present as a result of the manufacturing process (Rosa et al., 2021). Additionally, during storage and due to the use of lipid nanoparticles (LNPs), smaller RNA fragments and other impurities may be formed as a result of oxidation, hydrolysis, transesterification, or formation of lipid-mRNA adducts (Packer et al., 2021; Pogocki & Schöneich, 2000). Proper characterization of mRNA process-related impurities is necessary to not only minimize batch-to-batch variability by optimizing production, but also to achieve better product quality as it has been shown that process-related impurities may significantly decrease protein translation (Karikó et al., 2011; Rosa et al., 2021). Several analytical techniques including chromatography, qPCR, and immunoblot can be used to assess mRNA purity and potentially characterize process-related impurities. Each technique presents its strengths and weaknesses. The abundance of published analytical methods for separation of RNAs longer than 100 bases is much lower than methods analyzing shorter RNA strands. The difference is likely due to the complexity of separations as a function of RNA size as well as the novelty of the field since mRNA therapeutics are relatively new (Demelenne et al., 2021). A variety of analytical methods ranging from intact mRNA analysis, to digested fragments and even mRNA raw material analysis have been recently reported (Jiang et al., 2019; Kitamura et al., 2022; Vanhinsbergh et al., 2022).

2.3.1 IP-RP chromatography

The polyanionic nature of mRNAs makes traditional reversed-phase chromatographic separations challenging, since there is not enough interaction between analyte and stationary phase. A common approach to improve retention of oligonucleotides is to add ion-pairing agents to the mobile phase. Within IP-RP chromatography, the choice of the counter ion is determined by the detection method. Nonvolatile ions such as acetate or highly concentrated salt solutions make MS detection challenging; thus, UV detection is used. As an alternative, the use of alkylamines as ion-pairing agents and fluoroalcohols as counter-ions facilitates MS integration. IP-RP chromatography provides desirable selectivity for RNA analysis in terms of size, but most importantly in terms of sequence specificity, regardless of denaturing conditions, including high column temperatures, presence of organic solvents, and mobile phase additives (Azarani & Hecker, 2001; Georgopoulos & Leibowitz, 2000; Huber et al., 1995). Azarani and Hecker report optimal resolution in the chromatography of an RNA ladder ranging from 155 to 1770 nt to be a consequence of high temperature (Azarani & Hecker, 2001). Resolution improves substantially from 45°C to 75°C, likely due to decreased RNA intramolecular and intermolecular interactions. The authors report their separations using a 0.1 M triethylammonium acetate (TEAA) buffer. Currie and co-workers also report the optimal TEAA concentration for the analysis of an RNA ladder (ranging from 100 to 1000 nt) to be 0.1 M (Currie et al., 2021). The study further evaluates more hydrophobic ion-pairing agents, including hexylamine acetate (HAA) and dibutylamine acetate (DBAA). With TEAA, higher resolution was achieved for fragments ranging between 750 and 1000 nucleotides, while shorter fragments (100–200 nt) were better resolved with HAA and DBAA. The selectivity of the chromatographic method under different ion-pairing agents can be seen in the analysis of a single-stranded RNA ladder ranging from 100 to 1000 nucleotides (Figure 3).

Comparison of different ion-pair agents for the separation of a single-stranded RNA ladder 100–1000 nucleotides using the polymer-based DNAPac RP column. (A) 25 mM DBA (34.6%−46.6% MeCN) (B) 15 mM HA (32.3%−40.3% MeCN) (C) 100 mM TEA (8.6%−12.6% MeCN). Reprinted with permission (Currie et al., 2021). [Color figure can be viewed at wileyonlinelibrary.com]

In comparison to TEAA, DBAA and HAA were used at much lower concentrations without substantial retention loss. To maintain a similar retention time window, 25 mM DBAA and 15 mM HAA required starting acetonitrile concentrations of 34.6% and 32.3%, respectively. TEAA required a much lower starting acetonitrile concentration of 8.6% (Currie et al., 2021). Donegan and coworkers report similar observations for smaller oligonucleotides: more hydrophobic ion-pairing agents result in increased retention times (Donegan et al., 2022). Given the length of the analytes and the high acetate concentration, UV was chosen as the detection method.

Current IP-RP methods are not able to resolve traditional impurities such as N-1 species from RNAs larger than ~90 bases (Strezsak et al., 2022). In these cases, high-resolution mass spectrometry can be used to differentiate charged species, although ionization of RNAs over 100 bases long becomes more difficult. Additionally, due to the high number of charge states, MS analysis of longer RNAs may require quite high MS resolution. Strezsak and coworkers report a new mobile phase combining TEA/HFIP and TEAA while incorporating ethanol to reduce charge states and applied this mobile phase to mRNA characterization assays (Strezsak et al., 2022). The authors report the use of TEAA to reduce the charge-state distribution, but because acetate is poorly compatible with mass spectrometry of oligonucleotides, ion intensity was 50% lower when compared to traditional HFIP/TEA mixtures. The authors mitigated the ion suppression caused by TEAA by using ethanol in place of traditional organic modifiers such as acetonitrile or methanol. Advantages of using ethanol in oligonucleotide LC-MS mobile phases have been previously reported (Chen & Bartlett, 2013).

Using ion-pairing agents in an innovative way, a recent method developed by Fekete and coworkers uses the ion-pairing agent tetramethylammonium as an additive to reduce the number of accessible charges in an mRNA during a NaCl salt gradient. In the reported method, sharper peaks were observed when tetramethylammonium chloride was incorporated into a NaCl gradient and applied to an ion-pairing anion exchange (IPAX) method (Fekete, Yang, et al., 2022).

2.3.2 Size-exclusion chromatography

The final mRNA product should be substantially different in size when compared to small or medium molecular weight impurities. The large difference between the analytes makes size-exclusion chromatography (SEC) an appealing alternative for separations. Lukavsky and Puglisi report a >99% pure RNA product, purified from its transcription mixture by nondenaturing SEC (Lukavsky & Puglisi, 2004). Wang and Chen report SEC separations of two mRNAs with UV detection, while online light scattering is used to measure biophysical properties of the analytes, including the radius of gyration and hydrodynamic radius (Wang & Chen, 2022). SEC coupled with light scattering may provide insightful information about RNA conformation. The SEC-light scattering configuration has also been applied in the study of long noncoding RNAs (D'Souza et al., 2022). Hybridized nucleic acids may be formed during the manufacturing process and include double-stranded RNA and RNA:DNA hybrids (Spivak et al., 2016). The use of denaturing conditions or addition of chaotropic agents to the SEC mobile phase can facilitate the dissociation of hybridized impurities and allow better separations between the RNA and hybridized nucleic acid impurities (Spivak et al., 2016). As an example, Spivak and coworkers demonstrate SEC under denaturing conditions by raising the column temperature from 25°C to 75°C. Furthermore, the study adds 1 mM ethylenediaminetetraacetic acid (EDTA) to the mobile phase to minimize divalent cation-induced self-association of RNAs, which can lead to peak broadening and short column lifetime (clogging) (Spivak et al., 2016).

2.3.3 Capillary gel electrophoresis

In addition to the chromatographic methods previously discussed, capillary gel electrophoresis provides several advantages in the analysis of larger nucleic acids including low sample consumption, low waste generation, and faster analysis. Furthermore, capillary electrophoresis is well suited for the analysis of charged analytes (Ewing et al., 1989; Lu et al., 2020). As with the chromatographic methods, mRNA secondary structure contributes to peak broadening and poor resolution. Therefore, sharp peak shapes and accurate size determination are obtained when larger RNAs are run under strongly denaturing conditions (Lu et al., 2020). Traditional denaturing additives include urea, acetic acid, formamide, and 1,2,5-thiadiazole, although large RNA peak broadening under these conditions suggests the need for stronger denaturants (Lu et al., 2020). Sumitomo and coworkers report 2.0 M acetic acid to have higher RNA denaturing ability than 2.5 M formaldehyde or 7.0 M urea (Sumitomo et al., 2009). Lu and co-workers report optimal resolution of large RNAs (up to 6000 nt) by replacing water in the background electrolyte and gel with 100% formaldehyde (Lu et al., 2020).

2.3.4 Separation of double-stranded RNAs and residual DNA templates

Major by-products of the in vitro transcription process such as double-stranded RNA (dsRNA) must be detected and eliminated from the final mRNA product, as studies have shown that dsRNA may be a trigger of cellular immune responses (Kato et al., 2006; Pichlmair et al., 2006). Several studies report chromatographic methods to purify therapeutic RNA products from dsRNA by-products and other related impurities (Baiersdörfer et al., 2019; Karikó et al., 2011; Weissman et al., 2013). An IP-RP chromatography method implementing a 0.1 M TEAA mobile phase system is shared among all of these studies. Baiersdörfer and coworkers raise concerns over this popular IP-RP chromatography purification strategy in terms of scalability, cost, and toxicity of materials (Baiersdörfer et al., 2019). As an alternative, the authors report the application of cellulose chromatography to isolate dsRNA. The authors further report higher mRNA translational capacity in vivo for mRNA purified through cellulose chromatography when compared to HPLC-purified mRNAs (Baiersdörfer et al., 2019). Alternatively, and with a focus on synthesis strategies, Wu and co-workers report a high-temperature IVT process that reduced immunogenicity without the need for post-synthesis purification steps (Wu et al., 2020).

From a quality control standpoint, qPCR can be used to detect residual DNA templates, while immunoblot assays may be used to detect residual dsRNA strands (Karikó et al., 2011). Ultimately, it is important to ensure that synthetic RNA therapeutics are functional, free from process-related impurities, and of low immunogenicity.

2.4 Stability (storage conditions or shelf life)

One of the biggest limitations in mRNA vaccine candidates is their limited stability (Schoenmaker et al., 2021). Along with poor clinical outcomes, poor stability limits vaccine distribution to regions of the world that lack adequate facilities. Currently, mRNA vaccines manufactured by Moderna and Pfizer-BioNTech have shelf-lives of 30 and 5 days respectively when stored at 2–8°C, and up to 6 months when frozen at −20°C (Moderna) and −60°C to −80°C (Pfizer-BioNTech) (Crommelin et al., 2021). To overcome stability challenges, substantial effort has been put into the development of modified mRNAs and lipid carriers (Uddin & Roni, 2021). Enzymatic and chemical mRNA degradation may result in poor efficacy and immunogenicity, as there is a direct correlation between intact mRNA and therapeutic potency. mRNA modifications and degradation pathways have been previously discussed in detail (Crommelin et al., 2021; Sahin et al., 2014). Modifications to therapeutic mRNAs include 5′-capping, addition of the poly(A) tail, incorporation of 5′ and 3′-UTRs, and coding region optimization (Sahin et al., 2014) while traditional chemical degradation pathways include hydrolysis of N-glycosidic bonds and phosphodiester bonds, cytosine deamination, and oxidation of nucleobases or sugar moieties (Crommelin et al., 2021; Pogocki & Schöneich, 2000).

Recent development of mRNA therapeutics has focused on thermostable candidates and alternative formulations, such as freeze-dried formulations that may increase shelf-life (Uddin & Roni, 2021). As described by Crommelin and coworkers, stability data of mRNA vaccines in the literature are scarce (Crommelin et al., 2021). Published stability studies have adopted techniques such as chromatographic methods, gel electrophoresis, and stability through activity studies.

2.4.1 IP-RP and SEC chromatography

Currie and coworkers applied an IP-RP method for stability testing of an eGFP-modified mRNA under different stress conditions such as heat, hydrolytic conditions, and addition of ribonucleases. TEAA at 100 mM was chosen as the mobile phase since it showed superior separation capabilities for larger RNA fragments when compared to the more hydrophobic ion-pairing agents HAA and DBAA (Currie et al., 2021). Under hydrolytic conditions (at low and high pH), the main peak height decreased by over 50% in only 30 min. A more prolonged degradation process was observed when the analyte was exposed to RNase A. Significant sample degradation was observed when the mRNA was exposed to heat at 85°C for 90 min. However, samples incubated at 85°C for 15 min showed no induced degradation. Stability at 85°C for 15 min is important because it simulates mRNA stability in LC columns during chromatographic analysis, since the column temperature was set at 80°C and analyte retention time was just under 15 min.

In another stability study, nondenaturing SEC showed the superior stability of liquid formulations buffered with PBS or citrate over formulations with water or sucrose. Liquid formulations were stored at 37°C for 7 days. Similar conclusions were obtained when samples were characterized through IP-RP. It is hypothesized that chelating characteristics from PBS and citrate buffers prolonged mRNA stability, since mRNA is known to degrade in the presence of divalent ions such as magnesium (Spivak et al., 2016).

2.4.2 Agarose gel electrophoresis

In a patent published in 2016, Ketterer and coworkers present a set of RNA stability studies after lyophilization and storage at temperatures including −80°C, 5°C, 25°C, and 40°C (Ketterer et al., 2016). The cryoprotectant trehalose was used in all tested conditions. Gel electrophoresis was used to determine mRNA relative integrity. Overall, mRNA integrity was preserved for 12 months at −80°C, 5°C, and 25°C. However, a significant loss of relative integrity was seen in lyophilized mRNAs stored at 40°C.

The effects of cryoprotectants in post-freeze-drying storage of RNAs have been further evaluated by Jones and co-workers (Jones et al., 2007). Two different freeze-drying conditions were analyzed: purified RNA diluted 1:1 in water or in 20% trehalose. Final concentrations consisted of either 25 µg/mL RNA in water or 25 µg/mL RNA in 10% trehalose. Upon freeze-drying, samples were stored at −70°C, −20°C, or 4°C. RNA integrity was evaluated after 1 week and after 1, 3, 6, and 10 months. The initial detection method was UV spectroscopy; however, inaccuracies arising from high absorption of degraded RNAs and nucleotides were observed. As an alternative, agarose gel electrophoresis was used for quantitation of relative RNA integrity. The stability of RNA freeze-dried with trehalose was superior. The study showed approximately 80% recovery of RNA freeze-dried with trehalose at all three temperatures. Recovery was consistent among all time points. RNA freeze-dried without trehalose showed no recovery at 4°C and less than 10% recovery after 10 months of storage at both −20°C and −70°C.

Over a longer stability period, Gerhardt and coworkers recently reported a 21-month stability study for an RNA vaccine platform (Gerhardt et al., 2022). Lyophilized materials were stored at 4°C, 25°C, and 40°C while frozen materials were stored at −80°C and −20°C. For comparison, RNA in a liquid formulation kept at 4°C or 25°C was also included in the stability assessment. All conditions were compared to freshly prepared materials using RNA integrity measured by gel electrophoresis. Both liquid formulations (kept at 4°C or 25°C) and lyophilized product stored at 40°C showed more than 80% degradation after 6 months. Lyophilized samples stored at 4°C and 25°C, as well as frozen samples stored at −20°C and −80°C, showed comparable RNA integrity, with RNA levels being within 30% of the control RNA.

2.5 Quality

Capping efficiency and 3′-polyadenosine (poly A) tail length are critical quality attributes of mRNA therapeutics and vaccines; both should be quantified, and their consistency should be controlled.

2.5.1 IP-RP LC-MS: Capping efficiency (%)

Most of our understanding of mRNA capping efficiency analysis by IP-RP liquid chromatography with UV and mass spectrometric detection (IP-RP LC-UV-MS) comes from the study by Beverly et al. (2016). IP-RP LC-UV-MS was used to determine the extent of capping, mainly by characterizing the 5′-capped and uncapped structures, during the development of manufacturing processes for mRNA vaccines (European Medicines Agency, 2021a, 2021b).

For capping efficiency analysis, short mRNA fragments (~20–30-mers) containing the 5′-cap were generated enzymatically from intact mRNAs. This step was necessary, as MS is limited in analyzing full-length mRNAs with high molecular weights (approximately 2000–4000-mers). As shown in Figure 4A, the synthesized IVT mRNA was hybridized with a biotinylated capture probe complementary to the 5′-end of the mRNA. RNase H, which specifically hydrolyzes the phosphodiester bonds in RNA–DNA hybrids, was used to generate small 5′-mRNA fragments (Beverly et al., 2016). The cleaved 5′-mRNA fragments were isolated by streptavidin-coated magnetic beads for LC-MS analysis. The separation was performed using a C18 column at 75°C with an 8.15 mM triethylamine/200 mM hexafluoroisopropanol mobile phase (pH 7.9). Digestion by RNase H produced minor cleavage species in addition to the major cleavage species, which affected both data complexity and reproducibility (Figure 4B). Moreover, unmethylated G cap, di-, and triphosphates were observed as the major uncapped species. The authors also reported that equal amounts of capped and uncapped samples showed similar UV responses at 260 nm and ion counts in MS. To determine capping efficiency, the peak area of all the identified uncapped species was divided by the combined peak area of the uncapped and capped species (Beverly et al., 2016); this ratio is the uncapped fraction, and capping efficiency is its complement. The challenges of this approach are the low recovery and the generation of minor cleavage species after RNase H cleavage. These observations suggest that the selection of an enzyme with high specificity and the optimization of assay conditions are crucial factors affecting reproducibility.
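The peak-area calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name and the peak areas are hypothetical, and the calculation assumes, as reported in the text, that capped and uncapped fragments give similar detector responses so that areas can be compared directly.

```python
def capping_efficiency(capped_areas, uncapped_areas):
    """Return (uncapped %, capped %) from integrated peak areas.

    Assumes capped and uncapped species have similar response factors,
    so raw peak areas can be summed and compared without correction.
    """
    capped = sum(capped_areas)
    uncapped = sum(uncapped_areas)
    total = capped + uncapped
    if total <= 0:
        raise ValueError("total peak area must be positive")
    uncapped_pct = 100.0 * uncapped / total
    return uncapped_pct, 100.0 - uncapped_pct


# Hypothetical peak areas in arbitrary units (not data from the cited study).
capped = {"Cap 1": 9200.0}
uncapped = {
    "triphosphate": 450.0,
    "diphosphate": 250.0,
    "unmethylated G cap": 100.0,
}

uncapped_pct, capped_pct = capping_efficiency(capped.values(), uncapped.values())
print(f"uncapped fraction: {uncapped_pct:.1f}%")
print(f"capping efficiency: {capped_pct:.1f}%")
```

Summing every identified uncapped species before dividing keeps capping intermediates from being counted as product.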

A similar approach was taken by Liau at Agilent Technologies who evaluated site-directed cleavage using non-thermostable and thermostable RNase H (Liau, 2021b). The separation was carried out using a C18 column at 50°C with a 15 mM dibutylamine/25 mM hexafluoroisopropanol mobile phase. The use of thermostable RNase H without an additional annealing step and sample cleanup using silica-based spin columns led to less sample preparation time and greater sample recovery. Uncapped, capping intermediates, and capped species were observed. Slippage sequence variants with an additional G inserted were also observed due to T7 transcriptional slippage frequently seen in the presence of repeated G nucleotides at the start of a transcribed sequence (Liau, 2021b).

2.5.2 IP-RP LC-MS: Poly A tail length distribution

The first major advance in the determination of the poly A tail length distribution using IP-RP LC-MS was the method developed by Beverly and co-workers at Novartis (Beverly et al., 2018). They used a 2100-mer mRNA with poly A tails of 27, 64, 100, and 117 nucleotides in length produced either enzymatically or by plasmid-encoding. Their method included ribonuclease (RNase) T1 digestion to cleave phosphodiester bonds at the 3′-side of guanine, followed by oligo dT-coated magnetic beads to isolate the poly A tails. The separation was completed using a C18 column at 75°C with an 8.15 mM triethylamine/200 mM hexafluoroisopropanol mobile phase (pH 7.9). They observed significantly broader distributions of poly A tails in enzymatically tailed mRNA than in plasmid-encoded tails. Notably, they discovered that complete chromatographic separation of each distinct tail length was unnecessary for identification since the deconvolution software in mass spectrometry can distinguish each poly A species in coeluting samples (Figure 4C). As shown in Figure 4D, they observed the expected mass of the 100-mer poly A tail and a series of masses separated by 329 ± 2 amu (adenosine mass) using deconvolution software (Beverly et al., 2018).

(a) The procedure for RNase H cleavage of the mRNA and isolation of the cleavage fragment by magnetic beads. The paired arrows indicate the two cleavage sites observed by LC-MS. (b) The total ion chromatogram (top) and electrospray mass spectra (bottom) of a 125-pmol mixture of uncapped and Cap 1 mRNA 5′ cleavage fragments. (c) UV 250 nm chromatograms from the LC-MS analysis of mRNA with poly A tails of 27, 64, and 100. (d) The resulting deconvoluted electrospray mass spectrum of the single peak in the chromatogram for the 100-mer tail length. Reprinted with permission from Beverly et al. (2016, 2018), Analytical and Bioanalytical Chemistry. [Color figure can be viewed at wileyonlinelibrary.com]

Similarly, Liau at Agilent Technologies analyzed poly A tail sequence variants formed by poly A polymerase (PAP). The analysis revealed that PAP is not fully selective for ATP under standard IVT conditions, resulting in CTP and UTP additions to the poly A tails (Liau, 2021a). Samples were prepared by RNase T1 digestion followed by oligo-dT extraction. A 50% MeOH + 0.1% formic acid solution was used to clean the LC system and column, reducing alkali metal adducts. The separation was completed using a C18 column and a styrene/divinylbenzene column at 50°C and 80°C, respectively, with the 15 mM dibutylamine and 25 mM hexafluoroisopropanol mobile phase. A heterogeneous poly A distribution in the deconvoluted mass spectra was observed, separated by 329.2 ± 1 Da (Liau, 2021a), which was similar to that seen by Beverly et al. (2018).
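The roughly 329 Da spacing reported in both studies can be used to assign relative tail lengths within a deconvoluted spectrum. The following Python sketch illustrates the idea under stated assumptions: the function name, the tolerance, and all masses in the example are hypothetical, and 329.21 Da is taken as the average mass of one adenosine monophosphate residue.

```python
AMP_RESIDUE_DA = 329.21  # average mass of one adenosine monophosphate residue


def tail_length_offsets(masses, reference_mass, tol_da=2.0):
    """Map each deconvoluted mass to its adenosine-count offset.

    The offset is the number of A residues gained (+) or lost (-) relative
    to the reference species. Masses that do not fall on the ~329 Da ladder
    within tol_da are left out, since they are not simple length variants.
    """
    offsets = {}
    for mass in masses:
        delta = mass - reference_mass
        n = round(delta / AMP_RESIDUE_DA)
        residual = delta - n * AMP_RESIDUE_DA
        if abs(residual) <= tol_da:
            offsets[mass] = n
    return offsets


# Hypothetical deconvoluted masses (Da); 33150.0 is deliberately off-ladder.
reference = 33000.0
observed = [32670.8, 33000.0, 33150.0, 33329.3, 33658.4]

offsets = tail_length_offsets(observed, reference)
for mass in sorted(offsets):
    print(f"{mass:10.1f} Da  ->  {offsets[mass]:+d} A relative to reference")
```

A species that misses the ladder, such as the off-ladder mass in the example, would be a candidate for a non-adenosine addition of the kind Liau describes, although confirming that would require checking the mass difference against the other nucleotide residue masses.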

2.5.3 RT-PCR-based methods: Poly A tail verification

Several RT-PCR-based methods have been developed to study poly A tails using amplification of 3′-ends. As classic PCR-based assays, both rapid amplification of cDNA ends-poly A test (RACE-PAT) and ligase-mediated poly A test (LM-PAT) use an oligo dT anchor for the RT reaction. In RACE-PAT, the oligo can be hybridized with any location in the poly A tail, resulting in heterogeneous cDNA populations after reverse transcription. LM-PAT is a more sensitive method for detecting subtle changes in poly A tail length because the oligo can be specifically targeted to the 3′-end of the poly A tail, resulting in full coverage of the poly A tail. Both methods suffer from internal priming during cDNA synthesis (Minasaki et al., 2014).

Other modifications of classic PAT methods have been developed using different primers anchored to the 3′-end (Bilska et al., 2022). The extension poly A test (ePAT) relies on the Klenow polymerase to extend the 3′-end of RNA with dNTPs on a DNA template. This approach provides higher resolution and more accurate measurement of poly A tail length than LM-PAT (Jänicke et al., 2012). The splint-mediated PAT (sPAT) uses a single-stranded DNA splint to ligate an RNA anchor to the 3′-end of the mRNA. This method requires less RNA sample and fewer amplification cycles, and provides a more accurate reflection of poly A tail populations with less bias toward shorter poly A tails than previous methods (Minasaki et al., 2014). These PCR-based methods require analysis of the PCR products by gel electrophoresis (Bilska et al., 2022; Jänicke et al., 2012; Sallés et al., 1999) and provide indirect measurements because they depend on conversion to cDNA (Beverly et al., 2018).

Recently, droplet digital PCR (ddPCR) was used for poly A tail verification according to the specification section of the COVID-19 vaccine assessment report (Pfizer-BioNTech) generated by the European Medicines Agency (EMA) (European Medicines Agency, 2021a). However, there is no publicly available information or published study demonstrating the specific application of ddPCR to poly A tail verification. In general, ddPCR provides direct absolute quantitation without standard curves and tolerates the presence of sample contaminants, resulting in more precise and reproducible data than qPCR (Taylor et al., 2017).
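The reason ddPCR needs no standard curve is that the target concentration follows from Poisson statistics on the fraction of positive droplets. The Python sketch below shows that general calculation; it is not a poly A tail assay, the function name and droplet counts are hypothetical, and the 0.85 nL droplet volume is an assumed typical value that should be replaced with the figure for the instrument in use.

```python
import math


def ddpcr_copies_per_ul(positive, total, droplet_volume_nl=0.85):
    """Estimate target copies per microliter of reaction from droplet counts.

    If targets partition randomly, the number of copies per droplet is
    Poisson distributed, so the fraction of negative droplets is exp(-lam)
    and lam = -ln(1 - positive/total). No calibration curve is needed.
    droplet_volume_nl is an assumed value, not taken from the source.
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total and total > 0")
    lam = -math.log(1.0 - positive / total)  # mean copies per droplet
    return lam / (droplet_volume_nl * 1e-3)  # convert nL to microliters


# Hypothetical run: 5000 positive droplets out of 20000 accepted droplets.
concentration = ddpcr_copies_per_ul(positive=5000, total=20000)
print(f"{concentration:.0f} copies per microliter of reaction")
```

The logarithm corrects for droplets that received more than one copy, which is why the estimate stays valid as the positive fraction rises, up to the point where nearly all droplets are positive.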

2.5.4 IP-RP LC-UV: Poly A tail on/off (%)

According to the specification section of the COVID-19 vaccine assessment report (Moderna), RP-LC has been used to determine % poly A tailed RNA (European Medicines Agency, 2021b). Moderna has applied for a patent that includes the related method to resolve and quantify tailless (T0) mRNA populations from tailed mRNA variants of multiple lengths (T0, T40, T60, T80, T95, T100, T105, T120, T140) (Issa & Packer, 2019). The researchers observed that mRNAs with tail length variants were well resolved using a divinylbenzene (DVB) polymer column at 80°C with a 100 mM tris acetate/2.5 mM EDTA mobile phase (pH 7.0 ± 1). The intrinsic hydrophobicity of adenosine (C<G<T<A) (Huber et al., 1995) resulted in increased retention time in a reversed-phase system (Issa & Packer, 2019). Therefore, polyadenylated mRNAs are retained more strongly on the RP column than non-polyadenylated RNAs (Azarani & Hecker, 2001).

3 RECENT MS-BASED APPLICATIONS TO SUPPORT TOP-DOWN AND BOTTOM-UP ANALYSIS OF MRNA

3.1 Intact mass measurement of mRNA and larger oligonucleotides

Of the many quality attributes of mRNAs that can be measured, perhaps the most difficult to determine is the intact molecular weight. Nevertheless, it is an outstanding way to quickly ascertain the homogeneity of the product and to determine if all of the major features of the mRNA are present. Due to the high molecular weight of mRNA therapeutics, most traditional approaches are not possible.

3.1.1 Charge detection mass spectrometry (CDMS)

The analysis of megadalton ions by mass spectrometry began in 1992 with the determination of 5 MDa ions from polyethylene glycol (Nohmi & Fenn, 1992). The Smith group was the first to measure individual ions using FT-MS starting with polyethylene glycol and quickly adapting this approach to large oligonucleotides up to 110 MDa (Bruce et al., 1994; Chen et al., 1995; Cheng et al., 1996). However, this approach used flow injection and took several hundred seconds to accomplish, which meant it was not compatible with LC.

Fuerstenau and Benner coined the term CDMS in 1995 (Fuerstenau & Benner, 1995). In their experiment, a time-of-flight mass analyzer was used to determine the velocity of an individual ion with a known amount of energy to enable measurement of the mass-to-charge ratio. This approach evolved to determining the current generated on an ion tube as single ions, held in a linear ion trap, were repeatedly passed through the tube (Benner, 1997). More recently, this has been accomplished inside orbitrap mass analyzers, providing significant increases in the mass resolution of the approach (Kafader et al., 2020; Wörner et al., 2020). However, the most significant challenge for CDMS remains the time it takes to make a measurement. Recent advances have allowed up to 13 ions to be measured in parallel (Harper et al., 2019). This advance increases the measurement speed by about 90%; however, it remains an infusion experiment and is therefore limited to looking at the final API (Jarrold, 2022). Several examples using CDMS to look at intact mRNAs were reported at the most recent meeting of the American Society for Mass Spectrometry (Brophy et al., 2022; Foreman et al., 2022; Jarrold, 2022).

3.1.2 LC-MS

The ideal approach would be to determine intact mRNA molecular weights from chromatographic runs. This would potentially allow for characterization of a wide range of potential impurities within the API in a manner more consistent with other drug products. To date, there are no published papers using this type of approach. A broad distribution of charge states in highly charged large molecules spreads the signal intensity and requires greater mass resolution. The heterogeneity of IVT mRNA resulting from the poly A tail length distribution adds further technical challenges, increasing MS data complexity and lowering signal intensity as the signal is distributed across species. However, at the 2021 meeting of the American Society for Mass Spectrometry, there was a presentation from Moderna showing the analysis of a poly A tailed intact ~0.28 MDa mRNA (751 nucleotides + poly A tail) with the observation of a series of masses separated by ~329 Da (adenosine) and ~135 Da (neutral base loss of adenine). Additionally, they analyzed a large and tailless ~0.75 MDa mRNA (2228 nucleotides) with mass errors less than 50 ppm using HILIC LC-MS with an unusual mobile phase (Schneeberger & Jiang, 2021). They tested several conditions (e.g., ammonium salts, organic phases, and alkylamines) in mobile phases to reduce the overall charge state distribution and increase the signal-to-noise ratio. They finally used octylamine and nonafluorotertbutyl alcohol (NFTBA) in their mobile phase. The use of NFTBA was first shown by Basiri et al. in 2017, where they demonstrated that it possessed an unusual ability to dramatically reduce the charge state envelope of oligonucleotides, which likely contributes to its success in this application (Basiri et al., 2017). Continued advancement of this approach would be significant for quality control testing of mRNA therapeutics.
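To put the reported figures in perspective, the short Python sketch below converts a ppm mass error into daltons and computes the m/z of a deprotonated ion at a given charge. It is an illustration only: the function names are hypothetical, the 750,000 Da mass is a round stand-in for the ~0.75 MDa mRNA, and the charge state of 500 is an arbitrary example, not a value from the presentation.

```python
PROTON_DA = 1.007276  # mass of a proton in daltons


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return 1e6 * (measured - theoretical) / theoretical


def ppm_to_da(ppm, mass):
    """Absolute mass window in daltons corresponding to a ppm tolerance."""
    return ppm * mass / 1e6


def mz_negative(neutral_mass, z):
    """m/z of the [M - zH]^(z-) ion formed by removing z protons."""
    if z <= 0:
        raise ValueError("z must be a positive integer")
    return (neutral_mass - z * PROTON_DA) / z


mass = 750_000.0  # hypothetical round value for a ~0.75 MDa mRNA
window = ppm_to_da(50.0, mass)
print(f"50 ppm at {mass:.0f} Da corresponds to {window:.1f} Da")

# Spacing between neighbouring charge states near an arbitrary example charge.
z = 500
spacing = mz_negative(mass, z) - mz_negative(mass, z + 1)
print(f"m/z at z={z}: {mz_negative(mass, z):.3f}; gap to z={z + 1}: {spacing:.3f}")
```

Under these assumptions a 50 ppm error is far smaller than the ~329 Da mass of one nucleotide, so single-nucleotide length variants remain distinguishable in principle, while the narrow gap between adjacent charge states shows why a reduced charge state envelope eases the resolution requirement.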

With the advancement of CRISPR-Cas9 genome editing, the popularity of single guide RNAs (sgRNAs), as well as the need to characterize them, has increased substantially, as sgRNAs allow for sequence-specific cutting of DNA at the target region (Jinek et al., 2012). Solid phase synthesis of these RNAs means that process-related impurities such as shortmers and longmers may be present in the final product (El Zahar et al., 2018). To characterize these products, recent analytical methods have focused on the development of top-down LC-MS approaches. Although sgRNAs are much smaller in size (100–150 nt) than mRNAs, the strategies implemented to analyze them may be useful in the analysis of larger RNAs. Best experimental practices for MS detection of sgRNAs consist of reducing the amount of adducts to simplify the mass spectrum. Different approaches have been adopted in the literature to decrease metal adduction of sgRNA. Labor-intensive approaches include the adoption of plastic mobile phase bottles over glass and overnight flushing of the instrument with 0.1% formic acid (Wei et al., 2022). Another strategy to decrease adduction has been to add an online, column-based clean-up method to consistently remove salts and metal adducts from larger RNAs (Crittenden et al., 2023). The method uses a hydrophobic stationary phase with small pore sizes (100 Å) under low flow rates, with a mobile phase of 20 mM piperidine in 80:20 ACN/H2O, to function as a size exclusion column. Under these conditions, the oligonucleotide elutes before the formulation salts, which can be directed to waste (Crittenden et al., 2023). Piperidine competitively replaces metal adducts on the backbone, resulting in a much cleaner mass spectrum, as can be seen by comparing Figure 5A (no sample clean-up, with charge states heavily adducted) and Figure 5B (with the online sample clean-up).

Comparison between sgRNA spectra showing charge states 41- to 43- without sample clean-up (A) and with sample clean-up (B). Without sample clean-up, the 100 mer sgRNAs are heavily adducted by alkali metals such as Na+ and K+. Reprinted with permission from (Crittenden et al., 2023). sgRNA, single guide RNA. [Color figure can be viewed at wileyonlinelibrary.com]

3.2 Bottom-up analysis of mRNA

3.2.1 LC-MS/MS considerations

Nucleic acid LC-MS/MS is an important tool in the characterization and identification of modifications and unknown sequences. However, due to size, the relatively low number of modifications (with the exception of N1-methyl-pseudouridine), and complex data analysis, mRNA LC-MS/MS is more challenging when compared to smaller nucleic acids or other biopolymers such as polypeptides and polysaccharides. In regard to size, mRNAs are substantially larger. The average molecular weight for amino acids is close to 110 Da, and monosaccharides typically weigh less than 200 Da. For nucleotides, the average molecular weight is 330 Da. Since each amino acid corresponds to a trinucleotide codon (approximately 1000 Da), the rate at which the molecular weight of mRNAs increases is substantially higher than that of other biopolymers. In addition to its extensive size, and with the exception of the N1-methyl-pseudouridine modification, IVT mRNA presents fewer modifications than other types of RNAs, such as tRNA. The combination of larger size and lower frequency of modification creates an elevated number of isomeric or identical oligonucleotide fragments following mRNA digestion (Jiang et al., 2019). With this in mind, successful application of LC-MS/MS relies on sample preparation and the ability to use RNases, such as T1, in conjunction with other RNases such as MazF and Colicin E5 to produce unique fragments that can be sequenced. Alternative approaches such as the use of partial RNase digestions have also been implemented to decrease the number of isobaric fragments and increase sequence coverage in LC-MS/MS of longer RNAs (Vanhinsbergh et al., 2022). Full mRNA product digestion may also be used alongside LC-MS/MS to identify impurities that may lead to loss of mRNA expression. By using full mRNA digestion before LC-MS/MS detection, Packer and co-workers identified the formation of lipid-mRNA adducts that led to losses in therapeutic activity (Packer et al., 2021).
While innovative sample preparation methods have improved LC-MS/MS coverage of longer RNAs, data analysis is still complex and requires the use of sophisticated bioinformatic tools that will be further discussed in a later section.
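The problem of isomeric and identical digestion fragments can be illustrated with a small in silico digestion. The Python sketch below is a simplified model under stated assumptions: it applies only the rule that RNase T1 cleaves on the 3′-side of guanosine, ignores terminal phosphate chemistry and missed cleavages, and uses a hypothetical 24-nt sequence; the function names are illustrative and do not correspond to any published tool.

```python
from collections import Counter


def rnase_t1_digest(seq):
    """Split an RNA sequence after every G (complete digestion assumed)."""
    fragments, start = [], 0
    for i, base in enumerate(seq):
        if base == "G":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])  # 3'-terminal fragment without a G
    return fragments


def composition(fragment):
    """Base composition key; fragments sharing it are isomers of equal mass."""
    return "".join(sorted(fragment))


def unique_fragments(fragments):
    """Fragments whose sequence and composition both occur exactly once."""
    seq_counts = Counter(fragments)
    comp_counts = Counter(composition(f) for f in fragments)
    return [
        f for f in fragments
        if seq_counts[f] == 1 and comp_counts[composition(f)] == 1
    ]


# Hypothetical sequence chosen to produce repeated and isomeric fragments.
rna = "AUGGCUAGCAUGACUGCAUGUCAG"
frags = rnase_t1_digest(rna)
unique = unique_fragments(frags)

print("fragments:", frags)
print("unambiguous by mass alone:", unique)
```

Even in this short example most fragments share a composition with at least one other fragment, so mass alone cannot place them in the sequence; MS/MS can distinguish isomers but not identical repeats, which is the motivation for the complementary or partial digestions described above.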

3.2.2 MS/MS fragmentation and sequencing

Full characterization of an oligonucleotide generally requires MS/MS measurements to identify and confirm the placement of all chemical modifications in a sequence. Oligonucleotide MS/MS spectra are complex, with signals arising from a wide variety of fragmentation mechanisms. The low mass region contains several signals related to pieces of the oligonucleotide, but these signals can arise from many places in the sequence. They provide qualitative information about the sequence but do not provide any confirmation of the actual sequence of the oligonucleotide. These low-mass signals include phosphate (m/z 79 and 95) or the corresponding phosphorothioate (m/z 95 and 111). These signals are quite intense (often the base peak of the MS/MS spectrum) since they arise from every linkage in the molecule. They are often used as the product ion in selected reaction monitoring experiments due to their high relative abundance.

The nucleobase ions are also present (m/z 110 for cytosine, 111 for uracil, 125 for thymine, 134 for adenine, and 150 for guanine). These can be used to provide information on the base composition of the oligonucleotide, in a similar manner to the immonium ions in peptide sequencing, but since oligonucleotides contain far fewer distinct nucleobases than peptides contain distinct amino acids, they are less informative. However, they are useful in highlighting and providing confirming evidence for any non-standard base in the sequence. The remaining low-mass ions are non-sequence-bearing ions such as ribophosphate (m/z 195) and diphosphate (m/z 159), the latter of which can arise through intramolecular rearrangements.
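As a minimal sketch of how the nucleobase ions could be used to flag base composition, the snippet below checks a centroided negative-ion peak list against the nominal m/z values listed above. The function name, the peak-list format, and the 0.3 m/z tolerance are assumptions made for illustration only.

```python
# Nominal m/z of the deprotonated nucleobases, as listed in the text.
BASE_ANIONS = {"C": 110, "U": 111, "T": 125, "A": 134, "G": 150}


def bases_observed(peaks, tol=0.3, min_intensity=0.0):
    """Return the set of bases whose anion is seen in the low-mass region.

    peaks: list of (m/z, intensity) tuples from a centroided MS/MS spectrum.
    tol and min_intensity are illustrative defaults, not recommended values.
    """
    found = set()
    for mz, intensity in peaks:
        if intensity < min_intensity:
            continue
        for base, ref in BASE_ANIONS.items():
            if abs(mz - ref) <= tol:
                found.add(base)
    return found


if __name__ == "__main__":
    # Hypothetical peak list: cytosine, adenine, and guanine anions plus m/z 79.
    spectrum = [(78.96, 100.0), (110.04, 5.0), (134.05, 12.0), (150.04, 8.0)]
    print(sorted(bases_observed(spectrum)))
```

A signal in this region that matches none of the standard bases would be a prompt to look for a non-standard base, in line with the use described above.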

The nomenclature for the sequencing ions for oligonucleotides was developed by McLuckey and coworkers and follows a similar convention to peptides (McLuckey & Habibi-Goudarzi, 1993; McLuckey et al., 1992). The sequence ions arise from fragmentation along the phosphate backbone of the oligonucleotide. These fragments produce a series of ions indicative of charge retention on either the 3′ or 5′ end of the molecule. Sequence ions from the 5′-end of the molecule are a, b, c, and d, while fragment ions from the 3′-end are w, x, y, and z. The a-ion tends to be low in abundance and is typically observed as the a-base ion. Many of the proposed fragmentation mechanisms start with loss of a base followed by generation of the other fragment ions. Bartlett proposed that a-base ions arise when there is a charge on the phosphate linkage while a-ions form when the phosphate is neutral (Bartlett et al., 1996). This is supported by the higher probability of observing an a-ion when a low charge state precursor ion is selected.
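To make the 3′-end ion series concrete, the following sketch computes neutral monoisotopic masses of y- and w-type fragments for an unmodified RNA. It assumes 5′-OH and 3′-OH termini, assigns y fragments a 5′-hydroxyl and w fragments a 5′-phosphate (so that w = y + HPO3), and ignores modifications and hydrogen rearrangements; the function and variable names are illustrative and do not come from any of the tools discussed in this review.

```python
# Monoisotopic atomic masses (Da).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
        "O": 15.9949146, "P": 30.9737620}

# Elemental formulas of the free nucleoside 5'-monophosphates.
NMP_FORMULA = {
    "A": {"C": 10, "H": 14, "N": 5, "O": 7, "P": 1},
    "C": {"C": 9, "H": 14, "N": 3, "O": 8, "P": 1},
    "G": {"C": 10, "H": 14, "N": 5, "O": 8, "P": 1},
    "U": {"C": 9, "H": 13, "N": 2, "O": 9, "P": 1},
}


def formula_mass(formula):
    """Monoisotopic mass of an elemental formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


WATER = formula_mass({"H": 2, "O": 1})
HPO3 = formula_mass({"H": 1, "P": 1, "O": 3})

# In-chain residue mass = nucleoside monophosphate minus water.
RESIDUE = {base: formula_mass(f) - WATER for base, f in NMP_FORMULA.items()}


def three_prime_ladder(seq):
    """Return [(n, y_n, w_n)] neutral masses for n = 1 .. len(seq) - 1.

    n counts nucleotides from the 3' end of the sequence (written 5'->3').
    """
    ladder = []
    running = 0.0
    for n, base in enumerate(reversed(seq[1:]), start=1):
        running += RESIDUE[base]
        w = running + WATER   # 5'-phosphate, 3'-OH fragment
        y = w - HPO3          # 5'-OH, 3'-OH fragment
        ladder.append((n, y, w))
    return ladder


if __name__ == "__main__":
    for n, y, w in three_prime_ladder("ACGU"):
        print(f"y{n} = {y:.4f}   w{n} = {w:.4f}")
```

For a sequence ending in U, y1 is the mass of uridine and w1 is the mass of uridine monophosphate, which offers a quick sanity check on the bookkeeping.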

Deoxyribonucleic acids tend to have a-base and w-ions as their most abundant sequencing ions, whereas ribonucleic acids show a higher abundance of c- and y-ions. It is not unusual to see a wide variety of sequence ions at each phosphate linkage, although the other sequence ions will be of lower abundance. It is also quite possible to see the same ion in multiple charge states, for example, a w3-ion carrying one charge and the same ion carrying two charges. In general, the number of charges on the sequence ions tends to increase by one for every three nucleotides in the ion. This results in multiple signals for many sequence ions. While this improves confidence in the assignment of the sequence, it also distributes the signal over more ions.
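The charge-state behaviour described above can be sketched as follows. The conversion from neutral mass to negative-ion m/z is standard; the upper bound on the charge (one charge plus one more for every three nucleotides) is only a reading of the rule of thumb in the text, chosen so that a three-nucleotide ion is allowed one or two charges, and should be treated as an assumption.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz_negative(neutral_mass: float, z: int) -> float:
    """m/z of an [M - zH]z- ion formed by removing z protons."""
    if z < 1:
        raise ValueError("z must be a positive integer")
    return (neutral_mass - z * PROTON) / z


def plausible_charges(n_nucleotides: int) -> list:
    """Charge states to consider for a fragment of n_nucleotides.

    Assumption: maximum charge = 1 + n // 3 (rule of thumb, not a hard limit).
    """
    z_max = 1 + n_nucleotides // 3
    return list(range(1, z_max + 1))


if __name__ == "__main__":
    neutral = 1000.0  # hypothetical neutral mass of a 3-nt fragment
    for z in plausible_charges(3):
        print(f"z = {z}: m/z = {mz_negative(neutral, z):.4f}")
```

Enumerating the candidate charge states in this way shows why a single sequence ion can contribute several peaks to the spectrum.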

The remaining signals in the MS/MS spectra arise from double cleavages. These are difficult to use for sequencing since they are internal ions missing both ends of the oligonucleotide, and, to date, all sequencing methods depend on starting at either the 3′- or 5′-end of the molecule. In principle, internal cleavages could be used to confirm known sequences, since they can be predicted, but this has not yet been done.

3.2.3 Ion-mobility MS (IMS-MS)

Even though reports on the application of IMS-MS to mRNA analysis are scarce in the literature, this technique has been effective in differentiating isomeric/isobaric RNA variants (Kenderdine et al., 2020; Quinn et al., 2013). IMS-MS has been successfully deployed in the analysis of heavily modified RNA, such as tRNA (Lauman et al., 2023; Rose et al., 2015), and has allowed for the development of analytical platforms that can explore the relationship between RNA modifications and biological significance (Rose et al., 2015). Further applications of IMS-MS in the analysis of ribonucleotide components have recently been reviewed in detail (Deng et al., 2022). Overall, the ability to identify the components of complex mononucleotide mixtures may be valuable in the analysis of digested therapeutic RNAs.

4 THE PROGRESSION OF OPEN-ACCESS BIOINFORMATICS TOOLS FOR OLIGONUCLEOTIDE LC-MS/MS SEQUENCE MAPPING

Historically, NGS and Sanger sequencing have been the predominant techniques used to determine oligonucleotide sequence identity and purity. However, bottom-up LC-MS/MS analysis provides several advantages including simultaneous analysis of several modifications, relative stoichiometric information, and increased speed of analysis (D'Ascenzo et al., 2022; Jiang et al., 2019). One of the biggest advantages of mass-based sequencing is the ability to characterize synthetic and natural modifications, a known limitation of conventional sequencing methods (Rozenski & McCloskey, 2002). To date, mass spectrometry is the only analytical tool that allows direct identification of nearly all possible RNA chemical modifications (Limbach & Paulines, 2017). Given the complexity of nucleotide MS fragmentation, the popularity of bottom-up LC-MS/MS for oligonucleotide sequencing relies on the development of bioinformatics tools that can be used to facilitate data analysis and increase throughput.

Bioinformatics tools may be used in two distinct scenarios: de novo sequencing or confirmation of a known sequence. De novo sequencing is attractive for exploratory work, but sequence determination is more challenging. The first sequencing of a completely unknown oligonucleotide by mass spectrometry was reported by Ni et al. (1996). The sequence-construction algorithm developed in this study rests on three major elements: (1) correct recognition of 3′- and 5′-terminus residues; (2) alignment of overlapping chains constructed from each terminus; and (3) the use of accurate molecular mass and compositional constraints to reject incorrect sequence candidates (Limbach, 1996; Ni et al., 1996). Strengths and weaknesses of this algorithm have been previously reviewed in detail (Limbach, 1996).

Rozenski and McCloskey reported several variables that may lead to incorrect interpretation of fragment ion data (Rozenski & McCloskey, 2002). The five variables highlighted by the authors are: (1) unexpected effects on fragmentation that may result from modifications or unusual sequence context; (2) the difficulty of ruling out alternate sequences because of small m/z differences between fragment ions, especially for RNAs containing uracil and cytosine; (3) increased ambiguity arising from low-intensity spectra or from longer oligonucleotide chains of 10 to 20 bases; (4) multiple rational assignments to the same fragment ion; and (5) input of incorrect initial parameters. The Simple Oligonucleotide Sequencer (SOS), a user-interactive computer program, provides a platform that is well suited to address these challenges. The program supports certain modifications of the base, the sugar, or the backbone, and user-directed data manipulation facilitates the creation of sub-sequences that can be used for comparison. The interactive program enables fast and simple ab initio oligonucleotide sequencing by MS; however, the platform requires manual analysis of the data. Together, these studies highlight the potential of MS in oligonucleotide sequencing and in the characterization of RNA posttranscriptional modifications (PTMs). Furthermore, they highlight the need for automated software to support high-throughput data analysis in MS-based RNA sequencing.

A common limitation of the algorithms discussed above is their restriction to oligomers that are approximately 15 bases long. By using an ion trap mass spectrometer under selective CID conditions, Premstaller and Huber were able to generate a-B and w-type fragment ions and obtain good sequence coverage from a 20-mer oligodeoxynucleotide (Premstaller & Huber, 2001). On the basis of this finding, Oberacher and co-workers developed an algorithm that correlates experimental MS/MS data with a set of predicted fragment ions from a known reference sequence for longer oligonucleotides (20–51 nt) (Oberacher et al., 2002). The algorithm can be used for confirmatory purposes, but also for the detection of sequence alterations including insertions, deletions, or point mutations. Compared with previous algorithms, this comparative sequencing algorithm has a higher tolerance for missing fragment ions, which allows the analysis of longer sequences.
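The general idea of comparing experimental peaks with fragment ions predicted from a reference sequence can be sketched as below. This is a generic tolerance-based matcher written for illustration; it is not the algorithm of Oberacher and co-workers, and the function names, the 20 ppm window, and the example m/z values are all assumptions.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_peaks(observed_mz, predicted, tol_ppm=20.0):
    """Assign each predicted ion the closest observed peak within tol_ppm.

    observed_mz: list of experimental m/z values.
    predicted:   dict mapping an ion label (e.g. "w3") to its theoretical m/z.
    Predicted ions with no peak inside the window are left unmatched, so
    missing fragments lower the score without aborting the comparison.
    """
    matches = {}
    for label, theo in predicted.items():
        best = None
        for mz in observed_mz:
            err = abs(ppm_error(mz, theo))
            if err <= tol_ppm and (best is None or err < best[1]):
                best = (mz, err)
        if best is not None:
            matches[label] = best[0]
    return matches


def fraction_explained(matches, predicted) -> float:
    """Share of predicted ions that found an experimental counterpart."""
    return len(matches) / len(predicted) if predicted else 0.0


if __name__ == "__main__":
    # Hypothetical predicted ions for a reference sequence and observed peaks.
    predicted = {"w1": 346.055, "w2": 651.096, "w3": 980.149}
    observed = [346.056, 651.200, 980.150]
    hits = match_peaks(observed, predicted)
    print(hits, fraction_explained(hits, predicted))
```

Scoring the reference sequence and each candidate variant in this way, and keeping the best-scoring one, is one simple route to detecting a point mutation when some fragment ions are missing.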

As the use of bioinformatics tools evolved, sophisticated databases began to be developed and used to match fragmentation data. Correlating MS/MS data with sequence databases has long been practiced in proteomics, where it has become an indispensable approach to protein identification. In comparison to peptide fragments, oligoribonucleotide fragments are much less variable in composition and molecular mass. For this reason, developing software algorithms for DNA/RNA that are equivalent to those used in proteomics is challenging. Nakayama and coworkers overcame this limitation by developing a program (Ariadne) that analyzes RNA fragmentation data in two steps (Nakayama et al., 2009). The first step involves nucleotide fragment identification based on an MS/MS ion search, while the second step maps the identified nucleotides onto all RNA entries in the database.

A common factor that complicates the identification of oligomer fragment ions is adduction to different metals, and some algorithms offer limited support for the identification of adducted species. OMA and OPA, a software application developed by Nyakas and coworkers, introduced the ability to match fragment ion data of modified oligonucleotides and their adducts to a list of reference peaks generated from the input sequence (Nyakas et al., 2013). Sodium, potassium, and cisplatin adducts can be incorporated into the list of reference peaks. OMA and OPA are written in Java, which allows for compatibility across different operating systems.
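To illustrate how alkali-metal adducts expand a list of reference peaks, the sketch below adds the mass shift for replacing one or more acidic hydrogens with sodium or potassium. It is a simplified illustration rather than a description of how OMA and OPA work internally; cisplatin adducts are omitted, and the function name and label format are assumptions.

```python
H = 1.00782503  # monoisotopic mass of hydrogen (Da)
# Monoisotopic masses of the adducting metals (Da).
ADDUCT_METAL = {"Na": 22.98976928, "K": 38.96370649}


def adducted_masses(neutral_mass: float, max_adducts: int = 2) -> dict:
    """Neutral masses of a species in which k hydrogens are replaced by a metal.

    Each replacement shifts the mass by (metal - H), e.g. about +21.982 Da for Na.
    Mixed Na/K species are not enumerated in this sketch.
    """
    out = {"none": neutral_mass}
    for metal, metal_mass in ADDUCT_METAL.items():
        for k in range(1, max_adducts + 1):
            out[f"{k}{metal}"] = neutral_mass + k * (metal_mass - H)
    return out


if __name__ == "__main__":
    for label, mass in adducted_masses(1000.0).items():
        print(f"{label:>4}: {mass:.4f}")
```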

RoboOligo, a search strategy for de novo sequence analysis, further supports automated searching of complex modified RNAs based on CID MS/MS data. RoboOligo offers three main functions: automated de novo sequencing, manual sequencing, and a hybrid function (variable sequencing). Through the automated function, 16 unique oligomer sequences from an RNase T1-digested tRNA were analyzed, and 15 of the 16 were annotated with the correct modified nucleosides. Across the analysis of various RNA samples, the automated de novo algorithm correctly annotated 73 out of 77 oligomers. The manual sequencing function is conceptually similar to SOS in that it enables ab initio oligonucleotide sequencing, and it is a valuable tool in the analysis of longer oligomers. Lastly, variable sequencing facilitates the evaluation of nucleotide positioning and simplifies manual sequencing analysis.

With a focus on high-throughput RNA modification mapping by LC-MS/MS, Yu and coworkers developed RNAModMapper (RAMM) (Yu et al., 2017). The application is designed to interpret CID data and map MS/MS data onto RNA sequences. A unique advantage of this workflow is the ability to handle more than 100 posttranscriptionally modified nucleosides. RAMM accounts for the traditional oligomer fragment ions (c-, y-, w-, and a-B-type ions) as well as neutral base loss ions. MS/MS data analysis can be done in two modes: fixed position mapping (used for targeted analysis) and variable position mapping (used when limited information is known about the location of modifications). A further application of this method compared four data acquisition methods: ion trap with CID; orbitrap with CID; orbitrap with higher-energy collisional dissociation (HCD); and time-of-flight with beam-type CID. The impact of each acquisition method was evaluated based on the number of correct and incorrect interpretations. For both fixed and variable position mapping, low-resolution acquisition (ion trap) showed the highest number of incorrect interpretations. These results highlight the need for mass analyzers that provide high resolution and high mass accuracy, which reduce the number of incorrect interpretations (Lobue et al., 2019).

Several open-access data processing tools have been recently released (D'Ascenzo et al., 2022; Ortiz et al., 2020; Wein et al., 2020). NucleicAcidSearchEngine (NASE) was designed to address three major shortcomings of the previous tools: (1) inefficient handling of complex samples that involve many modifications; (2) lack of statistical validation strategies; and (3) poor integration with the larger analytical workflow (Wein et al., 2020). Unique features, such as accounting for precursor mass defects, enable identification of longer oligonucleotides. In terms of efficiency, complex searches such as a 23-modification search of a tRNA data set took less than 30 min per file, whereas similar searches on other platforms such as RNAModMapper had an estimated running time of 1 month. The program is well integrated within OpenMS. Along with efficient searches and integration with larger analytical frameworks, an innovative feature is the use of the false-discovery rate (FDR). This statistical validation strategy guards against unreliable sequence assignments and reduces the need for validation through manual assessment of the spectra.
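The target-decoy logic behind FDR estimation can be sketched in a few lines. The code below is a generic illustration, not the NASE implementation: the estimator (decoy hits divided by target hits above a score threshold), the function names, and the toy scores are all assumptions.

```python
def fdr_at_threshold(hits, threshold: float) -> float:
    """Estimate FDR as decoys / targets among hits scoring >= threshold.

    hits: list of (score, is_decoy) tuples, one per spectrum match.
    """
    targets = sum(1 for score, decoy in hits if score >= threshold and not decoy)
    decoys = sum(1 for score, decoy in hits if score >= threshold and decoy)
    if targets == 0:
        return 1.0 if decoys else 0.0
    return decoys / targets


def score_cutoff_for_fdr(hits, max_fdr: float = 0.01):
    """Lowest score threshold whose estimated FDR does not exceed max_fdr.

    Returns None when no threshold satisfies the requested FDR.
    """
    best = None
    for score in sorted({s for s, _ in hits}, reverse=True):
        if fdr_at_threshold(hits, score) <= max_fdr:
            best = score
    return best


if __name__ == "__main__":
    # Hypothetical search results: (score, matched a decoy sequence?)
    results = [(0.9, False), (0.8, False), (0.7, False),
               (0.6, True), (0.5, False), (0.4, True)]
    print(fdr_at_threshold(results, 0.5))
    print(score_cutoff_for_fdr(results, max_fdr=0.1))
```

Matches scoring below the returned cutoff would be discarded, which replaces spectrum-by-spectrum manual inspection with a single, statistically motivated filter.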

FDR continues to be used in recently released search engines. D'Ascenzo and coworkers show the applicability of Pytheas, a software package, in the automated analysis of RNA sequences and modifications (D'Ascenzo et al., 2022). As shown in Figure 6, the data analysis workflow in Pytheas consists of five steps: in silico digestion, spectra matching/scoring against a target-decoy library, annotated spectra visualization, statistical analysis, and sequence mapping.

Figure 6. RNA LC-MS/MS and Pytheas data analysis workflows. (A) Flowchart highlighting the main components of the Pytheas package. (B) Graphical overview of the database matching process. In both panels, blue shading refers to the experimental data acquisition, green shading refers to the in silico generated theoretical library, and red shading refers to the matching and scoring steps. Reprinted with permission from D'Ascenzo et al. (2022), Nature Communications. LC-MS, liquid chromatography–mass spectrometry.

The ability of Pytheas to support shotgun MS sequence characterization of therapeutic mRNAs was shown through the analysis of an mRNA vaccine mimic containing the full coding sequence of the SARS-CoV-2 spike protein, with complete substitution of uridine by N1-methylpseudouridine (m1Ψ). When two nuclease treatments (RNase T1 and RNase A) were combined, Pytheas showed approximately 80% sequence and modification coverage. The incomplete coverage is not a limitation of the algorithm; it originates from sample digestion. Enzyme availability is limited, and, in many cases, cleavage specificities and efficiencies are poorly characterized.
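Why combining nucleases improves coverage can be illustrated with a small in silico digestion. The sketch below assumes ideal, complete cleavage 3′ of G for RNase T1 and 3′ of C and U for RNase A, treats a fragment as informative only if its sequence occurs once in a digest and it is at least three nucleotides long, and ignores modified nucleotides such as m1Ψ; it is not the Pytheas digestion module, and all names and thresholds are illustrative.

```python
from collections import Counter

# Assumed ideal specificities: cleavage on the 3' side of the listed bases.
CLEAVE_AFTER = {"T1": "G", "A": "CU"}


def digest(seq: str, enzyme: str):
    """Return [(start_index, fragment)] for a complete digest of seq."""
    sites = CLEAVE_AFTER[enzyme]
    fragments, start = [], 0
    for i, base in enumerate(seq):
        if base in sites:
            fragments.append((start, seq[start:i + 1]))
            start = i + 1
    if start < len(seq):
        fragments.append((start, seq[start:]))
    return fragments


def unique_coverage(seq: str, enzymes, min_len: int = 3) -> float:
    """Fraction of positions covered by a sequence-unique fragment >= min_len.

    Note: sequence-unique fragments can still be isomeric in mass; this
    sketch does not attempt to model that.
    """
    covered = [False] * len(seq)
    for enzyme in enzymes:
        fragments = digest(seq, enzyme)
        counts = Counter(frag for _, frag in fragments)
        for start, frag in fragments:
            if len(frag) >= min_len and counts[frag] == 1:
                for i in range(start, start + len(frag)):
                    covered[i] = True
    return sum(covered) / len(seq) if seq else 0.0


if __name__ == "__main__":
    rna = "AAGCUGGA"  # hypothetical toy sequence
    print(digest(rna, "T1"))
    print(unique_coverage(rna, ["T1"]), unique_coverage(rna, ["T1", "A"]))
```

Short or repeated digestion products leave gaps that a second enzyme with a different specificity can fill, which is consistent with the observation that coverage is limited by digestion rather than by the search algorithm.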

Overall, sophisticated algorithms that facilitate high-throughput data analysis are a key element in mass-based sequencing of nucleic acids. The development of these tools continues to increase the speed of analysis and sequence-matching confidence. Further application of these tools relies on the access to new enzymes with improved cleavage specificities.

5 CHALLENGES AND FUTURE PERSPECTIVES

There are many challenges that are specific, or at least heightened, in the analysis of oligonucleotides. These include obvious ones such as the need to prevent the inadvertent introduction of nucleases to processes since these would cause degradation of mRNA products. Every reagent used in a process should be certified as nuclease-free or tested to ensure that it will not cause artifacts. The most significant reagents that need to be tested are any (1) water or water-based solutions, (2) tubes or containers, and (3) filters. It is critical that personnel handling reagents and supplies used in the quality testing of mRNA maintain nuclease-free lab spaces and use gloves at all times.

Highly charged molecules such as oligonucleotides are particularly susceptible to nonspecific adsorptive losses during sample preparation and analysis. Sample tubes, filters, and chromatographic systems have been found to be significant sources of these losses. In chromatographic methods, the main cause of nonspecific adsorption is the interaction of these negatively charged analytes with positively charged metallic surfaces, so analyte adsorption to the chromatographic system must be monitored. Loss of early eluting analytes due to nonspecific adsorption has been previously shown in SEC analysis of monoclonal antibodies and in IP-RP chromatography analysis of smaller oligonucleotides (Guimaraes et al., 2022; Murisier et al., 2021), while a recent study by Fekete and coworkers has shown the propensity of mRNAs to adsorb to stainless steel LC hardware (Fekete, DeLano, et al., 2022). Although loss of early eluting impurities has not been demonstrated for impurity analysis of mRNAs, there is no reason to expect that it will be any less problematic than it is for smaller oligonucleotides. To minimize nonspecific adsorption, it is critical to find materials with low surface activity and charge for use in the analysis of mRNAs.

The mRNA vaccines themselves may be heterogeneous, most likely in the poly-A tail. This should be one of the major differences between the two existing mRNA vaccines. The use of the A30 and A70 segments in the BNT162b2 vaccine is designed to greatly reduce or even eliminate this source of heterogeneity. However, any approach that relies solely on enzymatic tailing with an adenylyltransferase (poly(A) polymerase) will produce significant heterogeneity.

The lipid nanoparticle delivery technology used with the currently marketed therapeutics introduces two additional challenges to the analysis of these mRNAs. The first is simply matrix effects during electrospray ionization mass spectrometry. Lipids are the greatest source of matrix effects, so analysis of formulated products, as opposed to the active pharmaceutical ingredient, may require method adjustments to isolate the mRNA from the large abundance of lipids. The second challenge has only recently been reported by Hua and co-workers from Moderna, who showed lipid modifications of the mRNA arising from the formulation at the 2022 ASMS Conference. At this point, it is not known whether these modifications significantly alter the safety and efficacy of mRNA products, but this is an area that needs more attention moving forward.

Biographies

    Guilherme J. Guimaraes is a fourth-year doctoral student at the University of Georgia. His research focuses on sample extraction, pharmacokinetic studies, and method development for the analysis of small molecules and oligonucleotides.

    Jaeah Kim obtained her PhD in 2019 from the University of Georgia with a focus on metabolite identification of therapeutic oligonucleotides. She previously worked at Biogen on ADME of oligonucleotide therapeutics and at GC Pharma on the characterization and analysis of IVT mRNA therapeutics.

    Michael G. Bartlett is the University Professor in Pharmacy and Associate Dean for Science Education, Research and Technology at the University of Georgia, College of Pharmacy. Over the last ten years, his laboratory has had a broad interest in the development and application of LC-MS methods for oligonucleotide therapeutics and for endogenous oligonucleotides as biomarkers of disease.