Reproducibility of human cardiac phosphorus MRS (31P‐MRS) at 7 T

Purpose We test the reproducibility of human cardiac phosphorus MRS (31P‐MRS) at ultra‐high field strength (7 T) for the first time. The primary motivation of this work was to assess the reproducibility of a ‘rapid’ 6½ min 31P three‐dimensional chemical shift imaging (3D‐CSI) sequence, which if sufficiently reproducible would allow the study of stress‐response processes. We compare this with an established 28 min protocol, designed to record high‐quality spectra in a clinically feasible scan time. Finally, we use this opportunity to compare the effect of per‐subject B 0 shimming on data quality and reproducibility in the 6½ min protocol. Methods 10 healthy subjects were scanned on two occasions: one to test the 28 min 3D‐CSI protocol, and one to test the 6½ min protocol. Spectra were fitted using the OXSA MATLAB toolbox. The phosphocreatine to adenosine triphosphate concentration ratio (PCr/ATP) from each scan was analysed for intra‐ and intersubject variability. The impact of different strategies for voxel selection was assessed. Results There were no significant differences between repeated measurements in the same subject. For the 28 min protocol, PCr/ATP in the midseptal voxel across all scans was 1.91 ± 0.36 (mean ± intersubject SD). For the 6½ min protocol, PCr/ATP in the midseptal voxel was 1.76 ± 0.40. The coefficients of reproducibility (CRs) were 0.49 (28 min) and 0.67 (6½ min). Per‐subject B 0 shimming improved the fitted PCr/ATP precision (for 6½ min scans), but had negligible effect on the CR (0.67 versus 0.66). Conclusions Both 7 T protocols show improved reproducibility compared with a previous 3 T study by Tyler et al. Our results will enable informed power calculations and protocol selection for future clinical research studies.


| INTRODUCTION
Phosphorus MRS ( 31 P-MRS) is a non-invasive technique used to measure the concentrations and chemical kinetics of high-energy phosphorus-containing metabolites in the human heart, often collectively referred to as 'cardiac energetics'. 31 P-MRS has provided a unique insight into our understanding of cardiac metabolism. 1,2 The application of 31 P-MRS to cardiovascular research is of interest, since in most major heart diseases the ratio of phosphocreatine (PCr) to adenosine triphosphate (ATP) concentrations (PCr/ATP) changes, making it a useful indicator of the altered energetic state of the heart. Examples of diseases where PCr/ATP decreases include Type I and Type II diabetes, 3,4 hypertensive heart disease, 5 coronary artery disease 6,7 and heart failure. 8,9 However, 31 P-MRS is yet to be translated into routine use in a clinical setting, owing mainly to its intrinsically low signal-to-noise ratio (SNR).
According to theory, the quality of the raw 31 P-MRS signal ($\mathrm{SNR}/\sqrt{T_\mathrm{A}}$, where $T_\mathrm{A}$ is scan duration) increases approximately linearly with the scanner's magnetic field strength B 0 . 10 Accordingly, 31 P-MRS benefits significantly from a move to ultra-high field strengths, ie B 0 ≥ 7 T. At 7 T, an increase in SNR of 2.8-fold in the human heart compared with 3 T has recently been demonstrated, 11 allowing the acquisition in 6 min of spectra of a quality comparable to that of spectra that took 31 min at 3 T. Acquiring usable 31 P spectra in shorter scan times is highly desirable; a finer temporal resolution would allow study of the response of cardiac energetics to stressors (eg dobutamine infusion or exercise).
It is, however, questionable whether such a short protocol would be sufficiently robust and reproducible to detect changes in cardiac metabolism with sufficient power. Therefore, the primary motivation of this work was to assess reproducibility values of PCr/ATP 31 P-MRS measurements at 7 T in the human heart for a 'rapid' 6½ min three-dimensional chemical shift imaging (3D-CSI) sequence. If found to be sufficiently reproducible, this would allow the study of multiple steady states within a single scanning session. We compare this with an established 28 min protocol that was designed to record high-quality spectra in a scan time that is tolerable to a majority of patients. 11 This will allow informed decisions when selecting protocols in the design of future studies. We also evaluate the effect of per-subject B 0 shimming on spectral quality and reproducibility. Finally, we illustrate the impact of these technical improvements by comparing the sample sizes and approximate scan costs of studies using these optimized 3 T and 7 T protocols.

| EXPERIMENT
Ten healthy volunteers (three female, age 29 ± 6 years, BMI 23 ± 4 kg/m 2 ) were recruited according to local ethics regulations. Volunteers made two separate visits. On each visit they underwent two scanning sessions as shown in Figure 1. On the first visit, two 28 min 31 P-MRS spectra were acquired (one during each scan session) using the manufacturer's default shim settings. On the second visit, three 6½ min 31 P-MRS spectra were acquired in each scanning session. During the second visit, we also tested a customized B 0 shimming algorithm against the manufacturer's default shim settings.
B 0 maps were measured starting from the scanner manufacturer's default shim settings (the 'tune-up' shim settings). Per-subject shim solutions were obtained by recording a B 0 map using a dual-echo GRE shim work-in-progress package (WIP 452B 'GRE SHIM', Siemens). This field map was altered in MATLAB (MathWorks, Natick, MA, USA) by zeroing all pixels with an intensity less than that in the inferior myocardium and setting the magnitude of all remaining pixels to 1. The phases of the pixels were left unaltered. This modified B 0 map was then used as input to the scanner's standard shim calculation routines. These modifications prevent the very high-intensity pixels near the surface from dominating the computed shim solution, which might otherwise degrade the shim quality in the inferior cardiac segments.
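The field-map modification described above can be sketched in a few lines. This is a minimal illustration and not the MATLAB code used in the study: it assumes the B 0 map is available as a complex-valued image (magnitude and phase per pixel), and the function name `mask_field_map`, the toy image and the threshold value are all hypothetical.

```python
import cmath


def mask_field_map(pixels, threshold):
    """Zero pixels dimmer than `threshold`; set the rest to unit magnitude.

    `pixels` is a list of rows of complex numbers. The phase of every
    retained pixel is left unaltered.
    """
    masked = []
    for row in pixels:
        new_row = []
        for z in row:
            if abs(z) < threshold:
                new_row.append(0j)  # too dim: excluded from the shim calculation
            else:
                new_row.append(cmath.rect(1.0, cmath.phase(z)))  # keep phase only
        masked.append(new_row)
    return masked


# Toy 2 x 3 image: bright pixels near the coil, dim pixels deep in the chest.
image = [
    [cmath.rect(50.0, 0.3), cmath.rect(8.0, -1.2), cmath.rect(0.5, 2.0)],
    [cmath.rect(40.0, 0.1), cmath.rect(6.0, 0.9), cmath.rect(0.2, -2.5)],
]
inferior_myocardium_intensity = 5.0  # hypothetical threshold value
masked = mask_field_map(image, inferior_myocardium_intensity)
```

After masking, the bright pixel near the surface (magnitude 50) and the deeper pixel (magnitude 6) carry equal weight, which is the property the text relies on to stop surface pixels dominating the shim solution.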
The 1 H coil was replaced with a 16-element 31 P receive array coil (Rapid Biomedical, Rimpar, Germany), consisting of a single rectangular 28 × 27 cm 2 transmit loop and a 4 × 4 matrix of 16 circular flexible receive loops 5.5 cm in diameter. The coil was placed in the same position above the mid-ventricular septum. 12 The transmit efficiency was calibrated per-subject by using a series of inversion-recovery free induction decays to acquire signal from a central spherical phenylphosphonic acid (PPA) fiducial mounted on the coil housing and processed with custom MATLAB code. The coil position was determined from three orthogonal single-channel 31 P FLASH images, which localize five PPA fiducials (including the centre fiducial used to compute the B 1 calibration) using custom MATLAB code.
A 25 mm thick, B 1 -insensitive train to obliterate signal (BISTRO) saturation band was placed in the anterior chest wall to suppress signal from skeletal muscle as previously described. 13 The voltage of the saturation pulse was set to the maximum value allowed to comply with the legal specific absorption rate limits in each subject. Excitation was placed at +266 Hz relative to PCr so as to cover metabolites from 2,3-diphosphoglycerate (2,3-DPG) to γ-ATP. The CSI grid was positioned in the short-axis view of the heart, such that the longest voxel dimension was aligned with the interventricular septum and the in-plane voxel matrix was parallel to the chest wall. The CSI matrix was fixed at the point of acquisition, and not shifted in post-processing. Respiratory gating and ECG triggering were not used.
In the first visit, a single 31 P dataset was acquired in 28 min with a 3D ultra-short echo time (UTE)-CSI sequence with the following parameters: matrix size 16 × 16 × 8; nominal voxel size 15 × 15 × 25 mm 3 ; acquisition weighting with 10 averages at k = 0; repetition time 1 s and whitened singular value decomposition (WSVD) coil combination. 14 The excitation was set to 400 V peak amplitude (ie 3.2 kW), giving a flip angle of approximately 30° in the interventricular septum. RF excitation was performed using a shaped pulse that has been previously described. 15 It comprises a 0.5 ms hard pulse, preceded by a numerically optimized 1.9 ms part that improves homogeneity of excitation. It excites an approximately 2 kHz bandwidth. Subjects were then removed from the magnet; this constituted Session 1. After a short (~5 min) break, localization was repeated and an identical 28 min 3D UTE-CSI sequence was run in Session 2. These datasets are labelled X and Y (see Figure 1 for detailed description).
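As a quick check on two of the quoted figures, the peak RF power and the nominal voxel volume follow from simple arithmetic. The 50 Ω system impedance $Z_0$ is an assumption (it is the usual convention for RF systems but is not stated in the text).

```latex
% Peak RF power for a 400 V peak amplitude, assuming Z_0 = 50 ohm
P_\mathrm{peak} = \frac{V_\mathrm{peak}^{2}}{Z_0}
                = \frac{(400\ \mathrm{V})^{2}}{50\ \Omega}
                = 3200\ \mathrm{W} = 3.2\ \mathrm{kW}

% Nominal voxel volume of the 28 min protocol
V_\mathrm{voxel} = 15 \times 15 \times 25\ \mathrm{mm^{3}}
                 = 5625\ \mathrm{mm^{3}} \approx 5.6\ \mathrm{mL}
```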
During the second visit, three sets of 31 P spectra were acquired in 6½ min each. The same 3D UTE-CSI sequence was used as described above, but with an 8 × 16 × 8 matrix; nominal voxel size 25 × 15 × 25 mm 3 and four averages (k = 0) to enable the shorter acquisition time. Using larger voxels allowed us to reduce scan duration while keeping enough samples at k = 0 to preserve a compact voxel point-spread-function with minimal side-lobes. The first and third datasets used per-subject B 0 shimming; the second used the vendor's standard tune-up shim settings. As in the first visit, subjects were then removed from the magnet; this constituted Session 3. After a short (~5 min) break, these steps were repeated as Session 4. These datasets are labelled A-F (see Figure 1 for detailed description).

| Data analysis
Four voxels in the mid-interventricular septum were identified for further analysis: the midseptal, anteroseptal, anterior and posterior voxels.
These were assigned anatomically; the midseptal voxel was two voxels posterior to the chest wall. Data from these voxels were fitted using the Oxford Spectroscopy Analysis (OXSA) toolbox's implementation of AMARES. 16,17 Prior knowledge specified 11 Lorentzian peaks, fixed amplitude ratios, and literature values for the scalar couplings for the multiplets. Blood contamination and partial saturation were corrected using T 1 values from the literature. 11,15 All scans were included in the analysis. PCr/ATP is reported as the blood- and saturation-corrected values of PCr/γ-ATP, excluding the α-ATP peak because it has contributions from NADH, and the β-ATP peak because it was outside the uniform flip-angle bandwidth of the excitation pulse at 7 T. Cramér-Rao lower bounds (CRLBs) were used to express the uncertainty in metabolite concentrations. 18 Reproducibility statistics of the four identified voxels and their spectral sum were calculated.
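To illustrate what a partial-saturation correction does to a PCr/γ-ATP ratio, the sketch below uses the textbook steady-state expression for an ideally spoiled sequence with a uniform flip angle. It is not the OXSA implementation and it omits the blood correction. The repetition time and flip angle are taken from the text; the T 1 values and the raw ratio are placeholders chosen for illustration, not the literature values used in the study.

```python
import math


def saturation_factor(t1_s, tr_s, flip_deg):
    """Fully relaxed signal divided by steady-state signal (ideal spoiling)."""
    e1 = math.exp(-tr_s / t1_s)
    cos_a = math.cos(math.radians(flip_deg))
    return (1.0 - e1 * cos_a) / (1.0 - e1)


def correct_ratio(raw_ratio, t1_num_s, t1_den_s, tr_s, flip_deg):
    """Scale a raw amplitude ratio by the saturation factors of both peaks."""
    return (raw_ratio
            * saturation_factor(t1_num_s, tr_s, flip_deg)
            / saturation_factor(t1_den_s, tr_s, flip_deg))


TR_S = 1.0        # repetition time from the text
FLIP_DEG = 30.0   # approximate septal flip angle from the text
T1_PCR_S = 3.0    # placeholder T1 for PCr (illustrative only)
T1_ATP_S = 2.0    # placeholder T1 for gamma-ATP (illustrative only)

raw_pcr_atp = 1.6  # made-up fitted amplitude ratio
corrected = correct_ratio(raw_pcr_atp, T1_PCR_S, T1_ATP_S, TR_S, FLIP_DEG)
```

Because the peak with the longer T 1 is more heavily saturated at a 1 s repetition time, the correction raises the ratio whenever the numerator T 1 exceeds the denominator T 1, as in this placeholder example.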

| Assessment of reproducibility
Intersession variability was assessed through the mean and difference between PCr/ATP ratios from equivalent datasets in both protocols for each subject, ie by comparing Dataset B with E, C with F, and X with Y (see Figure 1 for detailed description).
Intersubject variability was assessed by the mean and standard deviation (SD) of PCr/ATP ratios within the same datasets across all subjects.
Intrasession variability, only applicable to the second visit, was assessed through the mean and difference between PCr/ATP ratios from equivalent datasets within the same session, ie by comparison of Dataset A with C and Dataset D with F.
The coefficient of reproducibility (CR) was calculated from the SD of the signed differences in PCr/ATP between two scans for each subject according to
$\mathrm{CR} = 1.96 \times \mathrm{SD}$.
A lower CR reflects a more reproducible method. A two-tailed Mann-Whitney U (non-parametric) test was used to compare repeated measurements. Variances were compared using a Brown-Forsythe test. 19 The coefficient of variation (CV), defined as the sample SD divided by the mean, is also reported:
$\mathrm{CV} = \mathrm{SD}/\mathrm{mean}$.
Datasets X and Y were compared with those collected from 25 patients with dilated cardiomyopathy (DCM) in a previous study using the same 28 min protocol, coils and scanner. 20
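The two summary statistics defined in this section are straightforward to compute. The sketch below uses only the Python standard library; the function names and the PCr/ATP values are made up for illustration and are not data from this study.

```python
import statistics


def coefficient_of_reproducibility(scan1, scan2):
    """CR = 1.96 x sample SD of the signed per-subject differences."""
    diffs = [a - b for a, b in zip(scan1, scan2)]
    return 1.96 * statistics.stdev(diffs)


def coefficient_of_variation(values):
    """CV = sample SD divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Made-up PCr/ATP values for five subjects scanned twice.
scan_x = [1.9, 2.1, 1.6, 2.3, 1.8]
scan_y = [2.0, 1.9, 1.7, 2.2, 2.0]

cr = coefficient_of_reproducibility(scan_x, scan_y)
cv = coefficient_of_variation(scan_x)
```

Note that the CR is built from signed differences, so swapping the two scans changes the sign of every difference but leaves the SD, and hence the CR, unchanged.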

| RESULTS
Typical spectra from across the heart for the 28 min and the 6½ min CSI protocols (using the tune-up shim) are shown in Figure 2. Lower signal is observed in the posterior voxels owing to the use of a surface coil. Across all subjects, the SNR of PCr was 1.2 times greater in the 28 min CSI scan than in the 6½ min scan (which had fewer averages but larger voxels). Furthermore, in the 6½ min scan we observed a trend towards a larger 2,3-DPG amplitude (3.5 ± 0.62 versus 2.9 ± 1.06, P = 0.07), consistent with greater blood contamination from the larger voxels.
FIGURE 3 Variation in PCr/ATP value measured at four voxel locations and combinations of spectral sums of these voxels for 28 min CSI (A) and 6½ min CSI (B). The tables below provide the PCr SNR, the CRLB on PCr, the CV (intersubject SD/mean) and the CR (1.96 × interscan SD) of the measured PCr/ATP. Please note that the values here refer to interexamination repeatability
… in these voxels than in the midseptal voxel. The posterior voxel has both a lower quantification precision (higher CRLBs) and worse reproducibility than the midseptal voxel. Summing spectra from the midseptal voxel with anteroseptal and anterior voxels before fitting leads to a reduction in PCr/ATP CRLB but comes at a cost to the reproducibility of the measurement. All subsequent analyses therefore calculated PCr/ATP from the midseptal voxel, not the summed voxels.

| 6½ min protocol
There were no significant differences (Figure 4) … 0.29 ± 0.18 (per-subject shim). The CRLBs improved from 13.8% with the tune-up shim to 10.3% with the customized shimming algorithm (P = 0.02), showing that the higher-quality spectra obtained using the customized shimming enabled more precise metabolite quantification.
The reproducibilities of PCr/ATP measurements using the two shimming techniques were equivalent (CR 0.66 tune-up shim versus 0.67 custom shim).
The interexamination CR was 0.75 and the intraexamination CR was 0.67 (both reported for the custom shim comparison). There was no significant difference (P = 0.85) between these inter- and intraexamination changes.

| Power calculations and sample size
Sample sizes providing sufficient power to reveal differences in PCr/ATP for paired (eg response to stressors or patient longitudinal cohort) studies are given in Table 1. These were calculated using the reproducibility values obtained in this work and literature results at 3 T.
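A rough version of such a paired-study sample size calculation can be sketched from a CR alone. This is an illustration under stated assumptions, not the method used to produce Table 1: it uses the normal approximation, takes the SD of the paired differences as CR/1.96, and assumes a two-sided significance level of 0.05 (the text specifies only the 80% power). The function name `paired_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def paired_sample_size(cr, delta, alpha=0.05, power=0.80):
    """Subjects needed to detect a mean paired change `delta`.

    Normal approximation: n = ((z_{1-alpha/2} + z_{power}) * sd / delta)^2,
    with sd = CR / 1.96 the SD of the within-subject differences.
    """
    sd_diff = cr / 1.96
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sd_diff / delta) ** 2)


# CRs quoted in the text: 0.49 (7 T, 28 min), 0.67 (7 T, 6.5 min), 1.1 (3 T).
n_7t_28min = paired_sample_size(0.49, 0.2)
n_7t_6min = paired_sample_size(0.67, 0.2)
n_3t_31min = paired_sample_size(1.1, 0.2)
```

For a change of 0.2 in PCr/ATP this approximation gives 13 subjects for the 28 min 7 T protocol and 62 for the 3 T protocol. The cost example in the Discussion uses 15 and 64; slightly larger values of that kind are what a t-distribution-based calculation would be expected to give, since the normal approximation underestimates the requirement at small sample sizes.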

| DISCUSSION
We have tested the reproducibility of a 'rapid' 6½ min 3D-CSI protocol, which is short enough to allow the assessment of changes in PCr/ATP during rest, stress, and recovery in a single scanning session. We also tested reproducibility for an established 28 min 3D-CSI protocol that was designed to give the best spectra in an examination tolerable to cardiac patients. We believe that this study reports the first reproducibility data for human cardiac 31 P-MRS at 7 T. These are crucial for planning future clinical studies.
Reproducibility of human cardiac 31 P-MRS has previously been reported at lower field strengths (summarized in Table 2). These studies encompass a range of field strengths, localization methods and voxel volumes, which makes it hard to draw quantitative conclusions when comparing these studies. However, it is clear that our 28 min protocol has the tightest reproducibility of all the 3D-resolved protocols, and that it achieves this with a small 5.6 mL nominal voxel volume.

| Analysis methods
The midseptal voxel is the most reproducible of all the single voxels and gives PCr/ATP consistent with literature values. The anterior and anteroseptal voxels have higher PCr/ATP values and we hypothesize that this is due to contamination of the 'cardiac' spectra by small amounts of skeletal muscle signal that may not be fully suppressed in some scans by the saturation bands (PCr/ATP is approximately 4-5 in skeletal muscle versus 2 in myocardium). 21 Quantification of PCr/ATP values in the posterior voxels is the least reproducible, and for the 6½ min protocol has an extremely large CRLB (111%). This is probably due to a combination of the effects of distance from the surface coil, and of motion in the posterior voxel.
In 2009, Tyler et al performed a cardiac 31 P-MRS reproducibility study at 3 T. 15 They used the spectral sum from three voxels in their analysis because they observed a lower PCr/ATP CRLB than for a single midseptal voxel. They attributed this increase in precision to the gain in SNR from combining signals. In this study, at 7 T, we also saw increased SNR on summing voxels, but no corresponding improvement in reproducibility.
We made this analysis using the original raw data from that study (Reference 15), provided by Professor Tyler. If comparing this table with Reference 15, please note that the values reported in Table 1 of Reference 15 were computed using the SD of the absolute difference in PCr/ATP for Scan 1 and Scan 2, whereas here we have used the SD of the signed difference.
Our analysis method introduces minimal user bias: no spectra were excluded from the study based on appearance, and as the rest of the fitting in OXSA is automated the measured PCr/ATP is only dependent on which voxel was selected for analysis. However, as measured PCr/ATP varied at differing anatomical locations across the heart, the selection of the voxel is an important step. It was therefore important to define anatomically which voxel would be selected for analysis at the start of the study, and ensure that it was used in all subjects.
Despite SNR limitations, 3 T 31 P-MRS has allowed observation of cardiac energetic changes in many disease groups. In recent years, 3 T has been the most widely used field strength for cardiac phosphorus scans. The 28 min CSI protocol tested here has equal voxel sizes (5.6 mL nominal) and a similar scan duration (28 versus 31 min) to the 3 T protocol tested by Tyler et al. 15 In that study, they found intrasubject percentage differences of 20% (absolute difference 0.43, mean PCr/ATP 2.10) and a CR of 1.1. In this study, we report lower intrasubject percentage differences of 11% (absolute difference 0.21, mean PCr/ATP 1.91) and a CR of 0.49 (ie more reproducible).

| 6½ min protocol
The 'rapid' 6½ min CSI acquired datasets in less than one-quarter of the time of both those in the 3 T study by Tyler et al and the 28 min scan tested here. As expected, a decrease in scan time led to increased variability in PCr/ATP measurements and therefore a larger CR (0.49 for 28 min CSI versus 0.67 for 6½ min CSI). Despite this, the reproducibility of PCr/ATP measurements acquired in 6½ min at 7 T is still better than that of measurements acquired in 31 min at 3 T (CR 0.67 versus 1.1), so a move to ultra-high field means that data can be acquired more quickly without cost to reproducibility. This is important as short 31 P acquisition times allow the detailed study of the response of cardiac energetics to stressors, such as exercise or dobutamine infusion. For example, the British Society of Echocardiography's recommendations for a dobutamine protocol for assessment of myocardial ischemia are to use four steps each with 3 min of dobutamine infusion. 22 A 7 T 31 P stress examination using our 6½ min scan would therefore be possible during a dobutamine protocol complying with these guidelines, whereas a 31 min scan at 3 T would not be. We note in passing that Dass  23 However, the reproducibility of that 8 min 3 T protocol has not been reported, so we cannot compare its reproducibility against our 7 T results.
Bakermans et al tested the reproducibility of a 7 min sequence in the human heart at 3 T, performing spatial localization with 3D ISIS or a combination of 1D ISIS and 1D CSI either perpendicular or parallel to the surface coil. 24 3D ISIS was found to be the most reproducible of these methods, giving a PCr/ATP of 1.57 ± 0.17 (mean ± SD) and a CR of 0.64. The reproducibility of this 7 min 3 T scan is comparable to that achieved in a 6½ min 7 T scan in this work (CR 0.64 at 3 T versus CR 0.67 in this work), but it used substantially larger voxel sizes: 3D ISIS voxel size 512 mL.
We observed no significant differences between the variances of the intra- and interexamination differences (P = 0.85), suggesting that the variability in measured PCr/ATP is dominated by error from within the 31 P measurement itself, rather than experimental set-up (eg coil positioning, localization). This is in contrast to the finding of Lamb et al in their reproducibility study at 1.5 T. 25 There, they used a 10 cm 31 P loop for both transmission and reception. In this study we used a larger coil with a rectangular 26 × 28 cm 2 transmit loop and a flexible 4 × 4 array of 4 cm diameter receive elements. By using a larger coil, we have mitigated some of the challenges associated with placing small loop coils and so our results were less affected.

| Per-subject B 0 shimming
Per-subject B 0 shimming improved the precision of the fit of PCr/ATP, as indicated by the lower CRLBs (mean 13.8% for tune-up versus 10.3% for per subject, P = 0.02). However, on our system, the reductions in the linewidths of PCr and γ-ATP were not significant and there were no improvements in reproducibility (CR 0.67 for tune-up, 0.66 for per subject). Presumably, uncertainties in optimizing the shim solution counter-balanced the improved spectral SNR in terms of reproducibility. Additionally, as no cardiac triggering or respiratory gating was used, the calculated shims only apply exactly at one phase in the cardiac and respiratory cycle, which might explain the lack of improvement. It is therefore not immediately obvious whether per-subject B 0 shimming for cardiac 31 P-MRS at 7 T is worth the additional examination time that is required: two iterations of the shimming algorithm add two approximately 20 s breath holds to the protocol.
If a single 6½ min 31 P measurement is being included in an examination, then it is likely that per-subject B 0 shimming is not worth the extra time. In this case, if time permits, better data would probably be obtained by using the time that would have been spent shimming to acquire more averages in the CSI protocol. However, if stress-response energetics are being monitored (and therefore a CSI protocol is being repeated multiple times while the subject remains in the magnet) then all of the datasets would benefit from performing the per-subject shimming algorithm at the start of the scan.
Our difficulties with image-based shimming at 7 T may stem from the inhomogeneous fields produced by the 10 cm surface coil we used. More sophisticated coil designs with better coverage, and dual-tuned designs, may tip the balance in favour of per-subject B 0 shimming. 26

| Power and sample size
The reproducibility values presented here enable power and sample size calculations to be performed, an important step in the design of clinical studies. The lower intrasubject variability at 7 T compared with 3 T translates to smaller sample sizes required for sufficient statistical power to detect a given effect. For example, in order to detect a change of 0.2 in PCr/ATP with 80% power in a paired study, power calculations from 3 T data from the Tyler et al 15
When designing studies, cost is an important factor. Scans at 7 T are more expensive than equivalent scans at 3 T: for example, in our centre, the cost of a 7 T scan is about 70% more than that of a 3 T scan. Despite this, the smaller cohort size required at 7 T for sufficient statistical power would overall lead to a saving in scan fees, eg a 100 × (1 − 1.70 × 15/64) = 60% saving, using values for the 28 min protocol. Additionally, there would be further savings from the use of fewer consumables and less time spent on patient recruitment. This may also enable a shorter study, delivering more timely information to clinical decision makers. However, at present, our site has more restrictive exclusion criteria for scans at 7 T than at 3 T. Nevertheless, in a recent 7 T 31 P-MRS study in patients with DCM 20 approximately 60% of patients who completed the laboratory screening form were found to have no safety contraindication to MRI at 7 T and were able to participate fully. In our experience, the challenges of patient safety are not insurmountable at 7 T. In particular, we would have excluded far fewer subjects if we had had access to more complete medical device testing data that included 7 T. We expect that this will become available in the coming years.
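The scan-fee saving quoted above can be written out step by step. The symbols are introduced here for illustration only: $r$ is the ratio of the per-scan cost at 7 T to that at 3 T, and $n_{7\mathrm{T}}$ and $n_{3\mathrm{T}}$ are the cohort sizes appearing in the quoted expression.

```latex
\text{saving} = 100 \times \left(1 - r\,\frac{n_{7\mathrm{T}}}{n_{3\mathrm{T}}}\right)\%
              = 100 \times \left(1 - 1.70 \times \frac{15}{64}\right)\%
              = 100 \times (1 - 0.398)\%
              \approx 60\%
```

The saving is positive whenever $n_{7\mathrm{T}}/n_{3\mathrm{T}} < 1/r$, ie whenever the 7 T cohort is smaller than about 59% of the 3 T cohort at the quoted cost ratio.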

| Limitations
We analysed our results in terms of the PCr/ATP ratio, which is the most commonly measured parameter in cardiac 31 P-MRS. However, the PCr/ATP ratio is typically used under the assumption that [PCr] changes while [ATP] remains constant. This assumption is reasonable in the healthy heart, but at high workloads [ATP] decreases and so using the PCr/ATP ratio may obscure changes in ATP concentration. 1 In future work, methods such as absolute quantitation (ie the concentrations of metabolites are calibrated to recognized units, eg mol/L) would overcome this limitation. 28
The aim of this work was to assess the reproducibility of cardiac PCr/ATP at 7 T in the healthy population. At 7 T the transmit field strength B 1 + (and therefore the flip angle achieved in the myocardium) depends strongly on coil loading. Reproducibility may therefore be different in patient groups whose body shape is different from that of the healthy volunteers in this study (eg obese Type 2 diabetes mellitus patients).

| CONCLUSION
We report reproducibility values for human cardiac 31 P-MRS at 7 T. These provide the necessary information to design future clinical studies using 7 T 31 P-MRS as an endpoint. We evaluated two protocols, one 28 min CSI protocol designed to acquire the best quality spectra in a clinically feasible scan time, and one 6½ min CSI protocol that gives spectra of a quality previously reported at 3 T, 15 but in a time short enough for use in stress-response studies. Per-subject B 0 shimming improved spectral quality, but had a negligible impact on measurement reproducibility.