Vector casting for noise reduction
Abstract
We report a new method for reducing the noise in spectra. The method is based on casting vectors from one data point to the subsequent data points of the noisy spectrum. The noise-reduced spectrum is computed from the cast vectors that lie within a margin identified by an envelope-finder algorithm. We compare the method with the Savitzky–Golay and wavelet transform approaches to noise reduction, using simulated Raman spectra with signal-to-noise ratios between 1 and 25 dB as well as experimentally acquired Raman spectra. The presented method performs well compared with the Savitzky–Golay and wavelet-based denoising methods, especially at small signal-to-noise ratios, and it requires minimal human input.
1 INTRODUCTION
Spectral analysis involves processing spectroscopic data or patterns for the quantification and/or identification of samples or processes.1-7 Spectroscopic raw data usually contain contributions from the desired signal itself, from noise, and from the background or interferences of undesired signals.8 One of the first processing steps (often the very first) applied to spectroscopic raw data is the elimination or reduction of the noise. This is especially challenging when the signal-to-noise ratio (SNR) is small, that is, when noise and signal cannot easily be distinguished based solely on intensity or peak height. Existing noise reduction algorithms can reduce the noise level, but, especially at small SNRs, they can also distort and thereby falsify the desired signal contribution.
Small SNRs can have many origins. Low available or achievable excitation powers9 combined with small interaction probabilities (cross sections) between the excitation and the matter under investigation often result in small SNRs.10-12 Inefficient signal detection or short acceptable signal integration times can also result in small SNRs.13, 14 Conversely, even long integration of low signal levels can lead to small SNRs, because the thermal background and its thermal noise accumulate together with the signal. The unavoidable contamination of spectroscopic data with noise therefore limits the performance of spectroscopic techniques.15-17
Many postprocessing techniques have been used to denoise spectroscopic data, such as the Savitzky–Golay (SG) filter,18, 19 smoothing based on the wavelet transform,20-22 the “perfect smoother,”23 the finite impulse response (FIR) smoother,24 and smoothing based on “Wiener estimation.”25 The SG smoother is the most popular and most frequently used method for denoising spectroscopic data.8, 25, 26 It is based on least-squares fitting of a polynomial of specified order to the consecutive data points contained in a moving window of specified size. The larger the window, that is, the more data points are considered for the polynomial fit, the more strongly the raw spectral data are smoothed.24 As with all smoothing algorithms, however, not only the noise but also sharp signal features can be smoothed out. A compromise therefore has to be found between removing the noise and losing spectral information, by carefully adjusting the window size and the polynomial order of the filter.
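The SG principle described above can be illustrated with a short NumPy sketch. It is not the implementation used in this work: the function name `savgol_smooth` is hypothetical, and the edge handling (repeating the first and last values) is an assumption. The smoothing coefficients are the first row of the pseudo-inverse of the Vandermonde matrix of the window positions, that is, the value at the window centre of the least-squares polynomial.

```python
import numpy as np

def savgol_smooth(y, window, order):
    """Savitzky-Golay smoothing: least-squares polynomial fit in a moving window.

    window must be odd and larger than order. Edges are handled by repeating
    the first/last value (an assumption; other edge modes are possible).
    """
    if window % 2 == 0 or order >= window:
        raise ValueError("window must be odd and larger than order")
    h = window // 2
    z = np.arange(-h, h + 1) / max(h, 1)  # scaled positions improve conditioning
    # Row 0 of the pseudo-inverse gives the fitted polynomial's value at z = 0
    coeffs = np.linalg.pinv(np.vander(z, order + 1, increasing=True))[0]
    padded = np.pad(np.asarray(y, dtype=float), h, mode="edge")
    return np.correlate(padded, coeffs, mode="valid")

# Usage: smooth a noisy sine; larger windows smooth more strongly
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 201)
noisy = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)
smooth9 = savgol_smooth(noisy, window=9, order=3)
```

Because the fit is exact for polynomials up to the chosen order, a wider window lowers the noise but also flattens any peak narrower than the window, which is the compromise noted above.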
Smoothing based on wavelets is simple to use and adapts well to the shape of the signal being smoothed.22, 27 Here, the noisy raw spectral data are transformed into the wavelet domain by decomposing them into a set of orthonormal wavelet basis functions. The major signal trends of the spectrum are associated with large wavelet coefficients, whereas the noise is associated with small coefficients only.20, 28 The noise is then suppressed by thresholding the wavelet coefficients, and the retained coefficients are inverse transformed to obtain the noise-reduced spectrum. However, the choice of the wavelet basis functions and of the threshold value strongly affects the performance of the method and is highly problem dependent.25 Moreover, when applied to spectra with small SNR, this approach can also reduce, remove, or distort signal contributions.29
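The decompose, threshold, and reconstruct sequence can be sketched with an orthonormal Haar wavelet in plain NumPy. This is a stand-in chosen to keep the example self-contained, not the MATLAB `wdenoise` routine used later in this work. The Haar basis, the default decomposition level, the noise estimate from the median absolute deviation of the finest detail coefficients, and the universal threshold σ√(2 ln N) with soft thresholding are all assumptions of this sketch, and the function names are hypothetical.

```python
import numpy as np

def haar_step(a):
    """One level of the orthonormal Haar transform (len(a) must be even)."""
    approx = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    detail = (a[0::2] - a[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert haar_step exactly."""
    out = np.empty(2 * approx.size)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out

def wavelet_denoise(y, levels=4):
    """Haar wavelet denoising with soft universal thresholding (a sketch)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    size = int(np.ceil(n / 2 ** levels)) * 2 ** levels
    a = np.pad(y, (0, size - n), mode="edge")     # length divisible by 2**levels
    details = []
    for _ in range(levels):                       # decompose: finest level first
        a, d = haar_step(a)
        details.append(d)
    sigma = np.median(np.abs(details[0])) / 0.6745   # noise level from finest details
    thr = sigma * np.sqrt(2.0 * np.log(size))        # universal threshold
    # Soft thresholding: small, noise-dominated coefficients become zero
    details = [np.sign(d) * np.maximum(np.abs(d) - thr, 0.0) for d in details]
    for d in reversed(details):                   # reconstruct: coarsest level first
        a = haar_inverse_step(a, d)
    return a[:n]
```

Swapping the basis, the level, or the threshold rule changes the result, which is the problem dependence mentioned above.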
Člupek et al.24 tested the FIR smoother for suppressing noise in spectra and reported that it preserves the real signal contribution better than the SG smoother. However, it is computationally demanding.24, 25 Using “Wiener estimation,” Chen et al.25 developed a method based on spectral reconstruction to recover spectra with small SNR. Compared with other denoising methods such as the SG method, the FIR smoother, and the wavelet transform method, their method showed excellent performance. However, it requires a calibration data set of input spectra with large SNRs to successfully denoise spectra with small SNR.
Here, we introduce a vector casting method for noise reduction and compare its performance with that of the frequently used SG and wavelet denoising methods in terms of how well the real signal contribution can be extracted. To the best of our knowledge, vector casting has not previously been applied to denoise spectra.
2 MATERIAL AND METHODS
2.1 Samples
We used two sets of samples to validate the vector casting method: simulated Raman spectra and experimentally acquired Raman spectra. It should be emphasized, however, that the vector casting method is not limited to Raman spectra. The descriptions in the following sections are therefore given in a general context and can be transferred to any kind of spectral data. We consider only contributions to the spectroscopic data from the real signal and from the noise. We neglect potential background contributions, as the background is usually subtracted from spectroscopic data using baseline correction methods.11, 30, 31 These baseline correction methods can still be applied after noise reduction.


Figure 1 shows the simulated spectrum Ssig(xi). The number of Lorentzian peaks (seven) and their parameters were chosen to imitate overlapping peaks (xn = 840), narrow peaks (xn = 848, xn = 900), a small peak (xn = 830), and broad peaks (xn = 820, xn = 860).
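A test spectrum of this kind can be simulated as a sum of Lorentzian peaks plus white noise scaled to a target SNR. This is a sketch under stated assumptions: the peak centres follow the positions named above, but the full parameter set of the seven peaks is not reproduced here, so the amplitudes and half widths below are hypothetical, and the SNR is assumed to be defined as 10·log10(P_signal/P_noise) with Gaussian noise. The function names are also hypothetical.

```python
import numpy as np

def lorentzian(x, x0, amplitude, hwhm):
    """Lorentzian peak with maximum `amplitude` at x0 and half width at half maximum `hwhm`."""
    return amplitude / (1.0 + ((x - x0) / hwhm) ** 2)

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) equals snr_db (assumed definition)."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    return signal + rng.normal(0.0, np.sqrt(p_noise), signal.shape)

x = np.linspace(800.0, 920.0, 1201)
# (centre, amplitude, hwhm): centres from the text, amplitudes and widths hypothetical
peaks = [(820, 0.6, 6.0), (830, 0.2, 1.5), (840, 0.8, 2.0), (848, 1.0, 0.8),
         (860, 0.5, 5.0), (900, 0.7, 0.8)]
s_sig = sum(lorentzian(x, *p) for p in peaks)
raw = add_noise(s_sig, snr_db=10.0, rng=np.random.default_rng(0))
```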






For the acquisition of the experimental Raman spectra,11 we used a diode laser (Toptica DLpro) emitting at 785 nm as the excitation source and a spectrometer (Ventana from Ocean Optics) for signal detection between 800 and 940 nm, which corresponds to Raman shifts between approximately 240 and 2,100 cm−1. With an excitation laser power of 10 mW, we collected Raman spectra of liquid ethanol at various integration times between 20 and 1,000 ms, which resulted in experimental spectra R(xi) with various SNRs. Like the simulated spectra, the experimental spectra are composed of a signal and a noise contribution. Additionally, a quasi-noise-free (low-noise) Raman spectrum of ethanol was acquired with an excitation power of 300 mW and a signal integration time of 1,000 ms. This quasi-noise-free spectrum can be regarded as a reference spectrum, that is, as a quasi-pure signal spectrum Ssig(xi). We chose ethanol for the experimental spectra because its Raman spectrum also shows narrow, broad, and overlapping peaks.
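The conversion between detected wavelength and Raman shift follows from the difference between the wavenumbers of the excitation light and the scattered light, with 1 cm = 10^7 nm. A one-line helper (hypothetical name) makes the arithmetic explicit: with 785-nm excitation, the detection limits of 800 and 940 nm map to about 239 and 2,100 cm−1.

```python
def raman_shift_cm1(scattered_nm, excitation_nm=785.0):
    """Raman (Stokes) shift in cm^-1 between excitation and scattered wavelengths in nm."""
    # A wavenumber in cm^-1 is 1e7 / wavelength in nm
    return 1e7 / excitation_nm - 1e7 / scattered_nm

low = raman_shift_cm1(800.0)    # about 238.9 cm^-1
high = raman_shift_cm1(940.0)   # about 2100.6 cm^-1
```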




3 RESULTS AND DISCUSSION
The vector casting method requires preprocessing of the raw spectra. In the first step, the top and bottom envelopes of the noisy spectrum are identified using an envelope-finder algorithm, which is described in detail below. Afterwards, vectors are cast within the margin defined by these envelopes to derive the noise-reduced spectrum r(xi). We emphasize that the envelope-finder algorithm alone already provides a significant noise reduction.
3.1 Envelope-finder algorithm







In the second level (Level 2), the data points of p(xi) and v(xi) are searched for peaks and valleys by forward and backward differentiation. The middle panel of Figure 2 shows the peak and valley data points computed from the Level 1 peaks and valleys. The notations pp(xi) and vp(xi) denote the peaks of p(xi) and the valleys of p(xi), respectively. Similarly, pv(xi) and vv(xi) denote the peaks of v(xi) and the valleys of v(xi), respectively. Proceeding recursively, in a third level, the peaks and valleys of pp(xi), vp(xi), pv(xi), and vv(xi) can again be computed by forward and backward differentiation. The bottom panel of Figure 2 shows the Level 3 valleys of the Level 2 valleys of the Level 1 peaks, referred to as vvp(xi). It can also be seen that the first two red diamonds (vvp(xi)) in the bottom panel of Figure 2 can be considered as the left and right borders of a signal peak.
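One level of this peak/valley search, and its recursive application to obtain the Level 2 and Level 3 point sets, can be sketched as follows. The function name `peaks_valleys`, the exclusion of the end points, and the tie handling (a point counts as a peak if it rises from the left and does not rise to the right) are assumptions; the exact criteria used in this work are not reproduced here.

```python
import numpy as np

def peaks_valleys(idx, y):
    """Return the indices (a subset of idx) that are local peaks and local
    valleys of the sequence y[idx], using backward and forward differences."""
    idx = np.asarray(idx)
    v = y[idx]
    back = v[1:-1] - v[:-2]   # backward difference at each interior point
    fwd = v[2:] - v[1:-1]     # forward difference at each interior point
    inner = idx[1:-1]         # end points are never classified
    peaks = inner[(back > 0) & (fwd <= 0)]
    valleys = inner[(back < 0) & (fwd >= 0)]
    return peaks, valleys

y = np.array([0, 5, 0, 3, 0, 6, 0, 2, 0, 7, 0, 4, 0, 8, 0], dtype=float)
p, v = peaks_valleys(np.arange(y.size), y)   # Level 1: p(xi), v(xi)
pp, vp = peaks_valleys(p, y)                 # Level 2: peaks/valleys of p(xi)
pv, vv = peaks_valleys(v, y)                 # Level 2: peaks/valleys of v(xi)
_, vvp = peaks_valleys(vp, y)                # Level 3: valleys of vp(xi)
```

Each level works on the indices returned by the previous one, so the recursion only changes which points are compared, not the difference rule itself.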
- The maximum value of the difference

- The height Ph of a potential signal peak, shown as a blue line in Figure 3b

- The window is classified as a peak region wp if













In Equation 17, the central peak Citop or valley Cibottom data points are updated by averaging all peak Pi or valley Vi data points within wm, respectively. In contrast, Equation 18 updates the central peak Citop or valley Cibottom data points by averaging only those peak Pi or valley Vi data points that fulfill the condition |Pi − Citop| ≤ ΔImax or |Vi − Cibottom| ≤ ΔImax, respectively. This condition ensures that only peak Pi or valley Vi data points close to Citop or Cibottom are used to update Citop or Cibottom, respectively.
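The two update rules can be sketched as follows. This is a hedged reading of Equations 17 and 18, which are not reproduced in this text: the function name is hypothetical, and it is assumed that the moving window wm of size n is centred on the spectral index of each peak (or valley) point.

```python
import numpy as np

def update_extrema(pos, vals, n=9, delta_i_max=None):
    """Update each peak (or valley) value by averaging the peak (valley) values
    whose spectral index lies within the moving window of size n around it.

    delta_i_max=None -> average all values in the window (Eq. 17-style).
    delta_i_max=d    -> average only values within d of the central value
                        (Eq. 18-style); the central value always qualifies.
    """
    pos = np.asarray(pos)
    vals = np.asarray(vals, dtype=float)
    h = n // 2
    out = np.empty_like(vals)
    for i in range(vals.size):
        window = vals[np.abs(pos - pos[i]) <= h]
        if delta_i_max is not None:
            window = window[np.abs(window - vals[i]) <= delta_i_max]
        out[i] = window.mean()
    return out

peak_pos = np.array([0, 2, 4, 20])
peak_val = np.array([1.0, 3.0, 2.0, 10.0])
plain = update_extrema(peak_pos, peak_val, n=9)                    # [2, 2, 2, 10]
gated = update_extrema(peak_pos, peak_val, n=9, delta_i_max=1.0)   # [1.5, 2.5, 2, 10]
```

In the gated variant, the neighbouring value 3.0 no longer pulls the first point upward, which is the purpose of the ΔImax condition.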
The noise-reduced top envelope Etop(xi) and bottom envelope Ebottom(xi) are generated by linear interpolation between the updated peak and valley data points, respectively, for all variables xi that, according to Equations 11 and 12, have been assigned to neither a peak nor a valley point. Figure 4 shows both envelopes computed for a moving window of size n = 9. Scheme 1 presents the flow chart of the envelope-finder algorithm, where m is the total number of peak/valley data points of the noisy spectrum.
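The interpolation step can be sketched with `numpy.interp`, which reproduces the node values at the peak and valley positions and interpolates linearly in between. The function name is hypothetical. Holding the first and last node values beyond the outermost nodes (the `numpy.interp` default) is an assumed edge treatment, as is taking the "mean of envelopes" estimate as the pointwise average of the two envelopes.

```python
import numpy as np

def build_envelopes(x, peak_idx, peak_val, valley_idx, valley_val):
    """Linearly interpolate updated peak/valley points onto every abscissa x.

    np.interp holds the first/last node value beyond the outermost nodes,
    which is an assumed edge treatment.
    """
    x = np.asarray(x, dtype=float)
    e_top = np.interp(x, x[peak_idx], peak_val)
    e_bottom = np.interp(x, x[valley_idx], valley_val)
    return e_top, e_bottom

x = np.arange(10.0)
e_top, e_bottom = build_envelopes(x, [1, 5, 9], [2.0, 4.0, 2.0], [3, 7], [0.0, 0.0])
mean_env = 0.5 * (e_top + e_bottom)   # assumed "mean of envelopes" estimate
```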



3.2 Vector casting based smoothing



Second, all vectors that cross either the top or the bottom envelope are deleted from the set of cast vectors. Deleted vectors are highlighted in red in Figure 5, whereas the remaining vectors are highlighted in green.



Figure 5a,b shows the noise reduction by the vector casting method in a spectral region without a signal peak and in a spectral region with a signal peak, respectively. The zoomed plot in Figure 5a details how the next noise-reduced data point is computed starting from the previous one, and Figure 5c shows the computed noise-reduced spectrum r(xi) as a solid black line.
In Figure 5, vectors are not cast from the previously noise-reduced data point to all subsequent data points but only to the subsequent data points contained in a window wvector. Restricting the casting to this window reduces the computational demand significantly. In Figure 5, the size of the window wvector is M = 150, meaning that vectors are cast to the subsequent 150 data points. Scheme 2 shows the flow chart of the vector casting method.
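The casting step can be illustrated with the following NumPy sketch. Because Equation 24 and the first casting step are not reproduced in this text, it rests on explicit assumptions: each vector runs from the previous noise-reduced point (xk, r(xk)) to a raw data point (xj, R(xj)) within the next M points; a vector is deleted if it leaves the band between Ebottom and Etop at any sampled abscissa; and r(xk+1) is taken as the mean of the surviving vectors' values at xk+1. The starting value, the fallback when no vector survives, and the name `vector_cast` are likewise hypothetical.

```python
import numpy as np

def vector_cast(x, raw, e_top, e_bottom, M=150):
    """Sketch of vector-casting smoothing between two envelopes (assumed form)."""
    x = np.asarray(x, dtype=float)
    raw = np.asarray(raw, dtype=float)
    e_top = np.asarray(e_top, dtype=float)
    e_bottom = np.asarray(e_bottom, dtype=float)
    n = raw.size
    r = np.empty(n)
    r[0] = 0.5 * (e_top[0] + e_bottom[0])            # assumed starting point
    for k in range(n - 1):
        estimates = []
        for j in range(k + 1, min(k + M, n - 1) + 1):  # cast to the next M points
            seg = slice(k, j + 1)
            t = (x[seg] - x[k]) / (x[j] - x[k])        # 0 at x_k, 1 at x_j
            line = r[k] + t * (raw[j] - r[k])          # vector sampled at x_k..x_j
            if np.any(line > e_top[seg]) or np.any(line < e_bottom[seg]):
                continue                               # crosses an envelope: delete
            estimates.append(line[1])                  # vector's value at x_{k+1}
        # Mean of surviving vectors; fall back to the envelope mean if none survive
        if estimates:
            r[k + 1] = np.mean(estimates)
        else:
            r[k + 1] = 0.5 * (e_top[k + 1] + e_bottom[k + 1])
    return r

# Usage: an alternating noisy trace inside a constant band
x = np.arange(50.0)
raw = 1.0 + 0.1 * (-1.0) ** np.arange(50)
top = np.full(50, 1.2)
bottom = np.full(50, 0.8)
r = vector_cast(x, raw, top, bottom, M=10)
```

In this form the cost grows roughly as N·M², so limiting the casting to the window wvector keeps the computation tractable. Long vectors are also the most likely to cross an envelope and be deleted.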

3.3 Parameter tuning effect
The algorithms outlined in the previous sections require two input parameters: the size n of the moving window wm and the size M of the window wvector in which vectors are cast. To investigate the effect of these parameters, we applied the vector casting method with different values of n and M. Figure 6 shows the results for n = 1, 5, 9, and 11 with M = 150.

Increasing n from 1 to 5 improves the smoothness of the noise-reduced signal, especially at small SNR (see the peak-free region in Figure 6). However, the vector casting method is rather insensitive to a further increase of the window size from n = 9 to n = 11. The peak regions are also less sensitive to changes in n than the peak-free regions, because the data points used for averaging are selected automatically (Equation 18), so that only a small number of nearby data points is involved.
The effect of varying the number of cast vectors was tested by setting M = 50, 100, and 150 with n = 9 (Figure 7). Compared with the mean of envelopes (black line in Figure 7), vector casting yields a significant improvement. However, increasing M beyond 50 did not yield a significant further improvement, as the resulting noise-reduced spectra look rather similar. This can be explained by the fact that the larger the distance between xi and xk, the lower the probability that the corresponding vector is included in the computation of the new noise-reduced data point r(xk+1) in Equation 24, presumably because a longer vector is more likely to cross one of the envelopes and be deleted.

3.4 Comparison with Savitzky–Golay and wavelet transform smoothing techniques
Figure 8 shows the simulated signal spectrum as a solid black line and the simulated raw spectra with SNRs between 1 and 25 dB as grey lines. At each SNR, 10 samples were simulated. The raw spectra were noise reduced using the presented vector casting method, the presented envelope-finder algorithm, the SG method, and the wavelet transform method. For the SG and wavelet transform methods, the input parameters were optimized to maximize the overall SNR between the obtained noise-reduced spectrum and the pure signal spectrum according to Equation 8d. Figure 9a,b shows the parameters selected to give optimally denoised spectra for the wavelet and SG methods, respectively.
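Because Equations 8a–8d are not reproduced in this text, the sketch below assumes the common definition SNR = 10·log10(Σ Ssig² / Σ (Ssig − r)²), optionally restricted to a sub-range such as a sharp peak through a boolean mask. The exact definitions used in this work may differ, and the function name is hypothetical.

```python
import numpy as np

def snr_db(pure, estimate, mask=None):
    """SNR in dB of `estimate` relative to the pure signal (assumed definition).

    mask: optional boolean array selecting a sub-range (e.g. a sharp peak).
    """
    pure = np.asarray(pure, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    if mask is not None:
        pure, estimate = pure[mask], estimate[mask]
    return 10.0 * np.log10(np.sum(pure ** 2) / np.sum((pure - estimate) ** 2))

pure = np.ones(4)
value = snr_db(pure, 1.1 * pure)   # residual is 10 % of the signal: about 20 dB
```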


For the SG method, the window size was varied from three to the largest odd number smaller than or equal to the number of data points of the spectrum, and the polynomial order was varied between one and nine. As can be seen in Figure 9b, a polynomial order of three and a window size of nine were selected most frequently when denoising the simulated noisy signals.
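The grid search over window size and polynomial order can be sketched as below. It relies on an assumed SNR definition and a self-contained SG smoother (edge values repeated), so it illustrates the procedure rather than reproducing the exact optimization of this work, and all names are hypothetical. Orders are restricted to be smaller than the window size, because a polynomial of order window − 1 or higher simply interpolates the data.

```python
import numpy as np

def sg_smooth(y, window, order):
    """Savitzky-Golay smoothing with edge values repeated (window odd, >= 3)."""
    h = window // 2
    z = np.arange(-h, h + 1) / h                     # scaled window positions
    coeffs = np.linalg.pinv(np.vander(z, order + 1, increasing=True))[0]
    return np.correlate(np.pad(y, h, mode="edge"), coeffs, mode="valid")

def snr_db(pure, estimate):
    """Assumed SNR definition in dB."""
    return 10.0 * np.log10(np.sum(pure ** 2) / np.sum((pure - estimate) ** 2))

def best_sg_parameters(raw, pure, max_order=9):
    """Return (snr, window, order) maximizing the SNR against the pure signal."""
    largest_odd = raw.size if raw.size % 2 else raw.size - 1
    best = (-np.inf, None, None)
    for window in range(3, largest_odd + 1, 2):
        for order in range(1, min(max_order, window - 1) + 1):
            score = snr_db(pure, sg_smooth(raw, window, order))
            if score > best[0]:
                best = (score, window, order)
    return best

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 41)
pure = np.sin(2 * np.pi * x)
raw = pure + rng.normal(0, 0.3, x.size)
score, window, order = best_sg_parameters(raw, pure)
```

Such an exhaustive search is only possible when the pure signal is known, as for the simulated spectra; for measured spectra the parameters would have to be chosen by other means.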
For the wavelet transform method, the wavelet denoising function wdenoise of the “Wavelet Toolbox” in MATLAB (MathWorks Inc.) was used. Improved implementations of the wavelet denoising technique may exist;40, 41 however, the relevant code is not available and thus could not be applied. Using the wavelet denoising built into MATLAB, we therefore varied the level of decomposition between 1 and 10. Four different threshold selection rules42 were tested. For selecting the coefficients to be suppressed, mean, median, soft, and hard thresholding43 approaches were evaluated. Moreover, two different wavelet families (symlets and Daubechies) were tested. Figure 9a shows how frequently each of these parameters was selected when optimally denoising the simulated noisy signals with the wavelet method. For the envelope-finder approach, we used a moving window of size n = 9, and for the vector casting method, we cast the vectors in a window containing 150 data points.
Figure 10 shows the SNR achieved by the four denoising methods, computed using Equations 8a, 8c, and 8d. As can be seen in Figure 10a,b, all denoising methods improved the original SNR across the entire spectral region as well as at sharp peaks. The vector casting and wavelet methods perform better than the other two methods. The vector casting method outperforms the wavelet method at lower SNRs, whereas the wavelet method outperforms the vector casting method at higher SNRs. Figure 10c depicts the overall performance of the denoising methods in smoothing the noisy signal while keeping the spectral peaks undistorted. For noisy signals with SNRs up to 15 dB, the vector casting method performs best, followed by the mean of envelopes. For higher SNRs, the wavelet method exceeds the performance of the methods proposed here.

Figure 11 shows the simulated raw spectrum R(xi) with an SNR of 10 dB as a grey line, the pure signal spectrum Ssig(xi) as a blue line, and the spectra r(xi) denoised by vector casting, mean of envelopes, Savitzky–Golay, and wavelets as green, black, magenta, and red lines, respectively. Comparing the pure signal spectrum with the denoised spectra reveals how the different noise reduction methods perform with respect to smoothing in peak-free regions and preserving spectral shapes in peak regions (zoomed insets in Figure 11). With the vector casting method (green line), the noise-reduced spectrum matches the peak locations of the signal spectrum very well and preserves the spectral shape rather well. Moreover, the standard deviation of the denoised spectrum is very small compared with that of the other methods, especially in peak-free regions. Denoising with the envelope-finder algorithm alone (black line) yields an overall noisier spectrum than the vector casting method, but the peak heights and spectral shape are still preserved rather well. The spectrum obtained with the optimized Savitzky–Golay method (magenta) is noisier, and its peak shapes are distorted relative to the pure signal spectrum. The wavelet transform method (red line) performs better than the Savitzky–Golay method; nonetheless, it slightly distorts the peak positions and peak shapes.

Finally, the performance of the four denoising approaches is compared on experimentally acquired Raman spectra. Figure 12 shows 14 experimental Raman spectra of ethanol (grey lines) with different noise levels, together with the quasi-noise-free ethanol spectrum with large SNR (black line), which serves as the quasi-pure signal spectrum Ssig(xi).

Figure 13a–d shows the 14 Raman spectra of ethanol (grey lines), normalized to the highest peak at around 845 nm, as recovered using the vector casting, mean of envelopes, wavelet, and Savitzky–Golay methods, respectively. The ethanol spectrum with high SNR is also shown as a blue line for reference. To assess the reproducibility of the recovered spectra, the standard deviation of the 14 noise-reduced spectra was computed at each variable (here, the Raman shift) and is plotted alongside the recovered spectra as a red line, quantified on the right ordinate. As can be seen in Figure 13, the standard deviation is higher in the peak regions than in the peak-free regions; thus, all techniques affect the peaks to some extent. However, the vector casting method yielded better reproducibility of the spectra: it achieved the smallest standard deviation at every peak. The mean of envelopes performs comparably to the wavelet method. The standard deviation at the Raman peak around 812 nm is 0.013, 0.021, and 0.052 for the vector casting, wavelet, and Savitzky–Golay methods, respectively. The peak broadening caused by the Savitzky–Golay technique is clearly reflected in the standard deviation at the peak around 842 nm. Moreover, for the double Raman peak at 855 nm, the shoulder peak at 870 nm, and the Raman peak at 884 nm, vector casting reduced the standard deviation to 0.021, 0.016, and 0.02, respectively, compared with 0.07, 0.04, and 0.07 for the Savitzky–Golay method and 0.025, 0.019, and 0.024 for the wavelet method.
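The reproducibility measure can be sketched as follows. Each recovered spectrum is divided by its maximum, which is assumed to be the highest peak (here, the one at around 845 nm). The standard deviation across spectra is then taken at every variable. Whether the population or the sample standard deviation was used is not stated, so NumPy's default (population, ddof=0) is an assumption here, as is the function name.

```python
import numpy as np

def pointwise_std(spectra):
    """Normalize each spectrum (row) to its maximum and return the normalized
    spectra and their standard deviation at every variable (column)."""
    spectra = np.asarray(spectra, dtype=float)
    normalized = spectra / spectra.max(axis=1, keepdims=True)
    return normalized, normalized.std(axis=0)   # ddof=0 (population std)

base = np.array([1.0, 2.0, 4.0, 2.0])
norm, sd = pointwise_std([base, 2 * base, 0.5 * base])   # same shape, different scale
```

Spectra that differ only in overall intensity give a standard deviation of zero after normalization. Any remaining spread therefore reflects differences in shape, such as distorted or shifted peaks.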

Besides outperforming the two most frequently used methods, the proposed denoising method requires minimal human interaction. It does, however, require envelope detection, which in turn involves peak detection. We also compared the computational efficiency of the proposed algorithms, which were implemented in Python.44 On a Dell Latitude E7450 with an Intel Core i7 processor, the average execution time of the envelope-finder algorithm was comparable to that of the SG method. The vector casting method, however, took longer, and its average execution time depends on the number of vectors to be cast.
4 CONCLUSION
In this study, we developed a new method for recovering the spectral signal from spectra with small SNR. This technique cannot, of course, extract signal peaks that are smaller than the noise level, but it removes noise while distorting the characteristics of the pure signal less than the wavelet transform and SG methods, particularly at small SNRs. Furthermore, the proposed method relies only minimally on input parameters that have to be chosen by the user. In summary, the proposed method can be considered reliable, robust, and accurate.
ACKNOWLEDGEMENTS
The project leading to this result has received funding from the Wilhelm Sander-Stiftung, Munich, Germany (Grant 2017.111.1). It also has received funding from the European Union's Horizon 2020 research and innovation programme under ERC Starting Grant agreement 637654 (Inhomogeneities).