Volume 51, Issue 4 p. 731-743
RESEARCH ARTICLE
Open Access

Vector casting for noise reduction

Medhanie Tesfay Gebrekidan

Institute of Thermal, Environmental, and Resources' Process Engineering (ITUN), Technische Universität Bergakademie Freiberg (TUBAF), Freiberg, Germany

Erlangen Graduate School in Advanced Optical Technologies (SAOT), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany

Christian Knipfer

Erlangen Graduate School in Advanced Optical Technologies (SAOT), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany

Department of Oral and Maxillofacial Surgery, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

Andreas Siegfried Braeuer

Corresponding Author

Institute of Thermal, Environmental, and Resources' Process Engineering (ITUN), Technische Universität Bergakademie Freiberg (TUBAF), Freiberg, Germany

Zentrum für effiziente Hochtemperatur-Stoffwandlung (ZeHS), Technische Universität Bergakademie Freiberg (TUBAF), Freiberg, Germany

Correspondence

Andreas Siegfried Braeuer, Institute of Thermal, Environmental, and Resources' Process Engineering (ITUN), Technische Universität Bergakademie Freiberg (TUBAF), Leipziger Straße 28, 09599 Freiberg, Germany.

Email: [email protected]

First published: 23 January 2020
Citations: 9

Abstract

We report a new method for reducing noise in spectra. The method is based on casting vectors from one data point to the following data points of the noisy spectrum. The noise-reduced spectrum is computed from the cast vectors within a margin that is identified by an envelope-finder algorithm. We compared the presented method with the Savitzky–Golay and the wavelet transform approaches for noise reduction using simulated Raman spectra with signal-to-noise ratios between 1 and 25 dB and experimentally acquired Raman spectra. The presented method performs well compared with the Savitzky–Golay and the wavelet-based denoising methods, especially at small signal-to-noise ratios, and requires a minimum of human input.

1 INTRODUCTION

Spectral analysis involves processing of spectroscopic data or patterns for quantification and/or identification of samples or processes.1-7 The spectroscopic raw data usually contain contributions from the desired signal itself, from noise, and from the background or interferences from undesired signals.8 One of the first processing steps (often the first one) applied to spectroscopic raw data is the elimination of the noise or the reduction of the noise level. This is especially challenging when the signal-to-noise ratio (SNR) is small, that is, when noise and signal cannot be readily distinguished based solely on intensity or peak height. Existing noise reduction algorithms can reduce the noise level but, especially at small SNRs, can also distort and thereby falsify the desired signal contribution.

The origin of small SNRs can be manifold. Small available or realizable excitation powers9 in combination with small interaction probabilities (cross sections) between the excitation and the matter under investigation often result in small SNRs.10-12 Also, inefficient signal detection or a short acceptable signal integration time can result in small SNRs.13, 14 Conversely, even long integration of low signal levels can lead to small SNRs, because the thermal background and its thermal noise are accumulated together with the signal. The unavoidable contamination of spectroscopic data with noise therefore limits the performance of spectroscopic techniques.15-17

Many postprocessing techniques have been used to denoise spectroscopic data, such as the Savitzky–Golay (SG) filter,18, 19 smoothing based on the wavelet transform method,20-22 the “perfect smoother” method,23 the finite impulse response (FIR) smoother,24 and smoothing based on the “Wiener estimation.”25 The SG smoother is the most popular and frequently used method to denoise spectroscopic data.8, 25, 26 It is based on the least-squares fitting of polynomials of a specified order to the adjacent data points contained in a moving window of a specified size. The larger the window, that is, the more data points are considered for the polynomial fit, the more strongly the raw spectral data are smoothed.24 Not only the noise but also sharp signal features can be smoothed out, as is the case for all smoothing algorithms. Thus, a compromise between smoothing out the noise and losing spectral information has to be found by carefully adjusting the window size and polynomial order of the filter.

Smoothing based on wavelets is simple to use and adapts well to the form of the signal being smoothed.22, 27 Here, the noisy raw spectral data are transformed into the wavelet domain by decomposing them into a set of orthonormal wavelet basis functions. The major signal trends of the spectrum are assignable to large wavelet coefficients, whereas the noise is assignable to only small coefficients.20, 28 The noise is then suppressed by thresholding the wavelet coefficients, and the retained coefficients are inverse transformed to obtain the noise-reduced spectrum. However, the selection of the wavelet basis functions and of the threshold value has a great impact on the performance of the method and is strongly problem dependent.25 Moreover, applying this approach to spectra with small SNR can also reduce, remove, or distort signal contributions.29

Člupek et al.24 tested the FIR smoother to suppress noise in spectra. They reported that this technique offers better preservation of the real signal contribution compared with the SG smoother. However, it is demanding in computation.24, 25 Using the “Wiener estimation,” Chen et al.25 developed a method on the basis of spectral reconstruction to recover spectra with small SNR. In comparison with other denoising methods such as the SG method, the FIR smoother, and the wavelet transform method, their method showed excellent performance. However, a calibration data set that relies on input spectra with large SNRs is required for the successful denoising of spectra with small SNR.

Here, we introduce a vector casting method for noise reduction and compare its performance with the frequently used SG and wavelet denoising methods. The performance comparison considers how well the real signal contribution can be extracted. To the best of our knowledge, vector casting has never been applied to denoise spectra.

2 MATERIAL AND METHODS

2.1 Samples

We used two sets of samples to validate the vector casting method. The two sets comprise simulated Raman spectra and experimentally acquired Raman spectra. At this point, it should be underlined that the vector casting method is not limited to the treatment of Raman spectra. Therefore, the descriptions provided in the sections that follow are provided in a general context and can be transferred to any kind of spectral data. We only consider contributions to the spectroscopic data coming from the real signal and from the noise. We neglect the potentially occurring contributions of a background, as the background is usually subtracted from the spectroscopic data using baseline correction methods.11, 30, 31 These baseline correction methods can still be applied after the noise reduction method.

The simulated spectra
$$R(x_i) = S_{\mathrm{sig}}(x_i) + N_{\mathrm{sim}}(x_i) \qquad (1)$$
are the summation of a pure signal spectrum Ssig(xi) and a noise spectrum Nsim(xi), where xi is the variable (Raman shift in the case of Raman spectra, wavelength in the case of fluorescence spectra, wavenumber in the case of absorption spectra, temperature in the case of differential scanning calorimetry spectra, theta in the case of X-ray diffraction spectra, etc.). Due to Doppler broadening, collisional broadening, and optical effects, spectral lines, peaks, or bands are never strictly monochromatic but feature a distribution around their centre.32 Spectral signal profiles can be fitted by Lorentzian, Gaussian, or Voigt profiles.33
The simulated signal spectrum
$$S_{\mathrm{sig}}(x_i) = \sum_{n=1}^{7} L(x_i; A_n, \sigma_n, x_n) \qquad (2)$$
is the summation of seven Lorentzian peaks L having different amplitudes An and widths σn and being centred at different variables xn. We chose Lorentzian peaks as they best reflect theoretical Raman signal lines.34 The usage of other peak shapes or a different number of peaks would not influence the vector casting method.

Figure 1 shows the simulated spectrum Ssig(xi). The seven Lorentzian peaks and their parameters were chosen to imitate overlapping peaks (xn = 840), narrow peaks (xn = 848, xn = 900), small peaks (xn = 830), and broad peaks (xn = 820, xn = 860).
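The construction of such a signal spectrum can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Lorentzian parametrization (peak height `amplitude`, half width `width`), all amplitudes and widths, the split of the overlapping pair around 840, and the axis sampling are hypothetical placeholders. Only the peak centres follow the text.

```python
def lorentzian(x, amplitude, width, center):
    """Lorentzian line; `amplitude` is the peak height, `width` the half width.

    ASSUMPTION: this parametrization is one common convention and is not
    taken from the article.
    """
    return amplitude * width ** 2 / ((x - center) ** 2 + width ** 2)


# (A_n, sigma_n, x_n): centres follow the text, amplitudes and widths are
# hypothetical placeholders chosen only for illustration.
PEAKS = [
    (40.0, 6.0, 820.0),  # broad
    (10.0, 2.0, 830.0),  # small
    (60.0, 2.5, 838.0),  # overlapping pair around 840 (assumed split)
    (50.0, 2.5, 842.0),
    (80.0, 1.0, 848.0),  # narrow
    (45.0, 6.0, 860.0),  # broad
    (70.0, 1.0, 900.0),  # narrow
]


def simulated_signal(xs, peaks=PEAKS):
    """S_sig(x_i) as the sum of all Lorentzian peaks."""
    return [sum(lorentzian(x, a, w, c) for a, w, c in peaks) for x in xs]


xs = [800.0 + 0.25 * i for i in range(481)]  # hypothetical axis, 800 to 920
s_sig = simulated_signal(xs)
```

Other peak shapes could be swapped in by replacing `lorentzian`; nothing downstream depends on the profile.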

Figure 1. Simulated spectrum (intensity as a function of a variable xi) consisting of seven Lorentzian peaks [Colour figure can be viewed at wileyonlinelibrary.com]
Noise in spectroscopic data acquired with a nonintensified charge coupled device consists of Poisson noise (shot and thermal noise) and Gaussian noise (readout noise). However, above certain noise levels, Gaussian noise is a good approximation for Poisson noise.21, 35 The sum of mutually independent zero-mean Gaussian noise terms is again zero-mean Gaussian noise, with a variance equal to the sum of the variances of the individual terms.36 Thus, spectroscopic noise can be modelled as the summation
urn:x-wiley:03770486:media:jrs5835:jrs5835-math-0003(3)
of shot noise nph (also referred to as photon noise), thermal noise nth, and readout noise nrd.35, 37 e(xi) is Gaussian noise having a standard deviation of one and mean of zero.36, 38 The shot noise
$$n_{\mathrm{ph}}(x_i) = \sqrt{S_{\mathrm{sig}}(x_i)} \qquad (4)$$
is the square root of the signal Ssig(xi) and is therefore a function of the variable xi. Variables xi with large signal feature a large shot noise, whereas variables without signal feature no shot noise.
The thermal noise
$$n_{\mathrm{th}} = \sqrt{B} \qquad (5)$$
is approximated by the square root of the thermal background B. The thermal background is assumed to be constant over xi. Also, the readout noise is considered as a constant c over xi.
$$n_{\mathrm{rd}} = c \qquad (6)$$
The SNR in decibel is computed according to
$$\mathrm{SNR} = 10 \log_{10}\left(\frac{\int S_{\mathrm{sig}}^{2}(x)\,\mathrm{d}x}{\int N_{\mathrm{sim}}^{2}(x)\,\mathrm{d}x}\right) \qquad (7)$$
from the integral of the squared signal and the integral of the squared noise.39 The SNR can be changed by changing the constant B in Equation 5, the constant c in Equation 6, or by scaling the signal Ssig(xi) in Equation 4.
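A minimal Python sketch of this noise model and of the SNR computation is given below. It rests on assumptions that should be read as such: the three noise terms are drawn as independent zero-mean Gaussian contributions and added (which is consistent with the variance argument above, but the exact way the terms are combined is not recoverable from the text here), the integrals are replaced by sums over the sampled points, and the single-peak signal and the values of `background` and `readout` are hypothetical.

```python
import math
import random


def simulate_noise(s_sig, background, readout, rng):
    """N_sim(x_i) as a sum of three independent zero-mean Gaussian terms.

    ASSUMPTION: each term is scaled by its own unit-variance Gaussian draw.
    """
    noise = []
    for s in s_sig:
        n_ph = math.sqrt(max(s, 0.0))  # shot noise, square root of the signal
        n_th = math.sqrt(background)   # thermal noise, square root of B
        n_rd = readout                 # readout noise, constant c
        noise.append(sum(n * rng.gauss(0.0, 1.0) for n in (n_ph, n_th, n_rd)))
    return noise


def snr_db(s_sig, noise):
    """SNR in dB from summed squared signal and summed squared noise."""
    signal_power = sum(s * s for s in s_sig)
    noise_power = sum(n * n for n in noise)
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(0)
# One hypothetical Lorentzian-like peak as the pure signal.
s_sig = [100.0 / (1.0 + ((i - 50) / 5.0) ** 2) for i in range(101)]
n_sim = simulate_noise(s_sig, background=25.0, readout=2.0, rng=rng)
raw = [s + n for s, n in zip(s_sig, n_sim)]  # R(x_i) = S_sig(x_i) + N_sim(x_i)
print(round(snr_db(s_sig, n_sim), 1))
```

Raising `background` or `readout`, or scaling `s_sig` down, lowers the resulting SNR, which mirrors the three tuning options named in the text.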

For the acquisition of the experimental Raman spectra,11 we used as excitation source a diode laser (Toptica DLpro) emitting 785-nm radiation and a spectrometer (Ventana from Ocean Optics) for signal detection between 800 and 940 nm, which corresponds to Raman shifts between 200 and 2,000 cm−1. With an excitation laser power of 10 mW, we collected Raman spectra of liquid ethanol at various integration times between 20 and 1,000 ms. From the different integration times, experimental spectra R(xi) with various SNRs resulted. Also, the experimentally acquired spectra are composed of a signal and a noise contribution. Additionally, a quasi-noise-free (low-noise) Raman spectrum of ethanol was acquired with an excitation power of 300 mW and 1,000 ms of signal integration time. This quasi-noise-free spectrum can be considered as a reference spectrum or as a quasi-pure signal spectrum Ssig(xi). We chose ethanol for the acquisition of the experimental spectra, as the Raman spectrum of ethanol also shows narrow, broad, and overlapping peaks.

The noise-reduced spectrum r(xi) results from applying either the vector casting method, the envelope-finder method, the SG method, or the wavelet transform method to the simulated or the experimental spectra R(xi). In order to compare the performance of the different noise reduction methods, we quantified the deviation between the real signal spectrum Ssig(xi) and the noise-reduced spectra r(xi) derived with the different methods according to Equations 8a to 8d as proposed by Barton et al.35 These equations (a) quantify the mean improvement of the signal quality across the entire spectral range (Equation 8a), (b) monitor whether or not the algorithm interacted negatively with the spectral peaks (Equation 8c), and (c) quantify the signal-to-noise improvement (Equation 8d) relative to the SNR of the original noisy spectrum R(xi).
urn:x-wiley:03770486:media:jrs5835:jrs5835-math-0008(8a)
urn:x-wiley:03770486:media:jrs5835:jrs5835-math-0009(8b)
urn:x-wiley:03770486:media:jrs5835:jrs5835-math-0010(8c)
urn:x-wiley:03770486:media:jrs5835:jrs5835-math-0011(8d)

3 RESULTS AND DISCUSSION

The vector casting method requires preprocessing of the raw spectra. In the first step, the top and bottom envelopes of the noisy spectra have to be identified using an envelope-finder algorithm that is described in detail below. Afterwards, the vectors are cast within the margin of the previously identified envelopes to derive the noise-reduced spectrum r(xi). We want to emphasize here that the envelope-finder algorithm alone already provides a significant noise reduction.

3.1 Envelope-finder algorithm

The envelope-finder algorithm aims at identifying a smooth top envelope Etop(xi) and a smooth bottom envelope Ebottom(xi) of the noisy spectrum R(xi). In the first level (Level 1), all data points of R(xi) are classified as either peak or valley, irrespective of whether the peak is due to noise or due to a real signal. For this purpose, a forward difference $\Delta_{f}R(x_i)$ and a backward difference $\Delta_{b}R(x_i)$ are computed.
$$\Delta_{f}R(x_i) = R(x_i) - R(x_{i+1}) \qquad (9)$$
$$\Delta_{b}R(x_i) = R(x_i) - R(x_{i-1}) \qquad (10)$$
A data point is a peak p(xi) if its forward and backward differences are both positive, and a valley v(xi) if they are both negative.
$$p(x_i) = R(x_i) \quad \text{if } \Delta_{f}R(x_i) > 0 \text{ and } \Delta_{b}R(x_i) > 0 \qquad (11)$$
$$v(x_i) = R(x_i) \quad \text{if } \Delta_{f}R(x_i) < 0 \text{ and } \Delta_{b}R(x_i) < 0 \qquad (12)$$
Figure 2 top shows the Level 1 peaks and valleys.
Figure 2. Illustration of the cascaded local extrema search at Level 1, Level 2, and Level 3
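The Level 1 classification can be sketched as follows. The sign convention (central value minus neighbour, so that both differences are positive at a peak) is an assumption chosen to match the verbal description, and the function name `local_extrema` and the sample data are hypothetical.

```python
def local_extrema(values):
    """Return (peak_indices, valley_indices) of the interior points.

    ASSUMED convention: forward = values[i] - values[i + 1] and
    backward = values[i] - values[i - 1]; both positive marks a peak,
    both negative marks a valley. End points are never classified.
    """
    peaks, valleys = [], []
    for i in range(1, len(values) - 1):
        forward = values[i] - values[i + 1]
        backward = values[i] - values[i - 1]
        if forward > 0 and backward > 0:
            peaks.append(i)
        elif forward < 0 and backward < 0:
            valleys.append(i)
    return peaks, valleys


raw = [1.0, 3.0, 2.0, 4.0, 1.5, 1.0, 2.5, 2.0]
peaks, valleys = local_extrema(raw)
print(peaks, valleys)  # [1, 3, 6] [2, 5]
```

Points on a monotonic run (index 4 in the example) are neither peak nor valley, which is why the envelopes later have to be interpolated at those positions.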

In the second level (Level 2), the data points of p(xi) and v(xi) are themselves searched for peaks and valleys by forward and backward differentiation. Figure 2 middle shows these peak and valley data points computed from the peaks and valleys obtained in Level 1. The notations pp(xi) and vp(xi) indicate the peaks of p(xi) and the valleys of p(xi), respectively. Similarly, pv(xi) and vv(xi) denote the peaks of v(xi) and the valleys of v(xi), respectively. Continuing recursively, in a third level the peaks and valleys of pp(xi), vp(xi), pv(xi), and vv(xi) can be computed by forward and backward differentiation as well. Figure 2 bottom shows the Level 3 valleys of the Level 2 valleys of the Level 1 peaks, which are referred to as vvp(xi). It can also be seen that the first two red diamonds (vvp(xi)) in Figure 2 bottom can be considered as the left and right borders of a signal peak.
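The cascaded search amounts to applying the same extrema finder repeatedly to the values selected at the previous level. The sketch below is a hedged illustration: `cascade` and its `path` argument are hypothetical helpers, the sign convention of `local_extrema` is assumed as described in the docstring, and the test data are random numbers rather than a spectrum.

```python
import random


def local_extrema(values):
    """Indices of interior peaks and valleys (central value minus neighbour)."""
    peaks, valleys = [], []
    for i in range(1, len(values) - 1):
        forward = values[i] - values[i + 1]
        backward = values[i] - values[i - 1]
        if forward > 0 and backward > 0:
            peaks.append(i)
        elif forward < 0 and backward < 0:
            valleys.append(i)
    return peaks, valleys


def cascade(raw, path):
    """Follow `path` level by level and return indices into `raw`.

    `path` is read from Level 1 onwards, so "pvv" means Level 1 peaks,
    then their valleys, then the valleys of those: the vvp(x_i) points.
    """
    idx = list(range(len(raw)))
    for step in path:
        peaks, valleys = local_extrema([raw[i] for i in idx])
        chosen = peaks if step == "p" else valleys
        idx = [idx[j] for j in chosen]
    return idx


rng = random.Random(1)
raw = [rng.gauss(0.0, 1.0) for _ in range(300)]
p = cascade(raw, "p")      # Level 1 peaks
vp = cascade(raw, "pv")    # Level 2 valleys of the Level 1 peaks
vvp = cascade(raw, "pvv")  # Level 3 valleys of those valleys
print(len(p), len(vp), len(vvp))
```

Each level thins out the candidate set, so the vvp points are sparse enough to act as window borders.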

Figure 3a again shows that these vvp(xi) data points (red diamonds) are suitable as indicators for the left and the right border of potential peaks. Figure 3b,c shows upon which criteria a decision is made whether or not a signal peak is situated between two vvp(xi) data points, here the left border (lb) and the right border (rb) diamond. The procedure for the classification of peak and peak-free regions is as follows:
  1. The maximum value of the difference
urn:x-wiley:03770486:media:jrs5835:jrs5835-math-0018(13)
between two consecutive vvp(xi) data points (lb and rb) is computed. The window size w is defined automatically by the vvp(xi) and does not have to be chosen by the user. In Figure 3b, the maximum difference ΔImax is shown as green line and can be defined as the maximum deviation between consecutive data points within the window.
  2. The height Ph of a potential signal peak, which is shown as a blue line in Figure 3b,
urn:x-wiley:03770486:media:jrs5835:jrs5835-math-0019(14)
is computed as the maximum difference between a data point and the baseline within the window. The baseline is the direct connection between the two vvp(xi) data points.
  3. The window is classified as a peak region wp if
urn:x-wiley:03770486:media:jrs5835:jrs5835-math-0020(15)
and if according to Figure 3c, the slopes of the linear fits of p(xi) and v(xi) left of the maximum of the potential peak are both positive and right of the potential peak are both negative.
urn:x-wiley:03770486:media:jrs5835:jrs5835-math-0021(16)
Otherwise, the window is classified as a peak-free region wpf.
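The three steps above can be sketched as a single classification function. Two points are assumptions and not taken from the article: the threshold is taken to be Ph > ΔImax (the exact form of Equation 15 is not recoverable here), and the linear fits use the data point index as abscissa. The names `slope`, `is_peak_region`, and `extrema`, as well as the test data, are hypothetical.

```python
def slope(xs, ys):
    """Least-squares slope of ys versus xs (0.0 for fewer than two points)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    den = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / den


def is_peak_region(raw, lb, rb, peaks, valleys):
    """Classify the window [lb, rb] between two vvp borders (requires rb > lb)."""
    # Step 1: largest jump between consecutive data points in the window.
    delta_i_max = max(abs(raw[i + 1] - raw[i]) for i in range(lb, rb))

    # Step 2: height above the straight baseline connecting the two borders.
    def above_baseline(i):
        baseline = raw[lb] + (raw[rb] - raw[lb]) * (i - lb) / (rb - lb)
        return raw[i] - baseline

    centre = max(range(lb, rb + 1), key=above_baseline)
    # Step 3a: ASSUMED threshold, the peak must exceed the largest jump.
    if above_baseline(centre) <= delta_i_max:
        return False
    # Step 3b: Level 1 peaks and valleys must rise to the left of the
    # candidate and fall to the right of it.
    for extrema_idx in (peaks, valleys):
        left = [i for i in extrema_idx if lb <= i <= centre]
        right = [i for i in extrema_idx if centre <= i <= rb]
        if slope(left, [raw[i] for i in left]) <= 0.0:
            return False
        if slope(right, [raw[i] for i in right]) >= 0.0:
            return False
    return True


def extrema(raw):
    """Level 1 peak and valley indices, used only to feed the example."""
    inner = range(1, len(raw) - 1)
    p = [i for i in inner if raw[i] > raw[i - 1] and raw[i] > raw[i + 1]]
    v = [i for i in inner if raw[i] < raw[i - 1] and raw[i] < raw[i + 1]]
    return p, v


ripple = [0.5 if i % 2 == 0 else -0.5 for i in range(41)]      # noise only
hill = [10.0 - 0.5 * abs(i - 20) + ripple[i] for i in range(41)]  # peak + noise
print(is_peak_region(hill, 0, 40, *extrema(hill)))      # True
print(is_peak_region(ripple, 0, 40, *extrema(ripple)))  # False
```

Because the window borders come from the vvp points, no window size has to be supplied by the user at this stage.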
Figure 3. (a) Simulated noisy spectrum (grey line) and detected Level 3 minima of maxima of R(xi) (red filled diamonds). (b) A window bounded by the left border (lb) and the right border (rb), showing the candidate peak (Pc) as a brown point, the maximum difference between consecutive data points as a green line, and the peak height (Ph) as a blue line. (c) Illustration of how linear regression lines are fitted to the Level 1 peak and valley data points within the window bounded by lb and rb
The Level 1 peak p(xi) and valley v(xi) data points (Figure 2 top) are a good first estimate of the top and bottom envelopes of the noisy signal R(xi). To smooth these envelopes, we considered a moving window wm consisting of 2n+1 peak (Pi) or valley (Vi) data points. Here, the window size has to be set manually; the influence of the chosen moving window size wm on the noise reduction performance is discussed later. Pi denotes the data points of p(xi) confined within wm, and $C_i^{\mathrm{top}}$ is the data point of Pi located at the centre of wm. Similarly, Vi denotes the data points of v(xi) included in wm, and $C_i^{\mathrm{bottom}}$ is the central data point of Vi. The central peak $C_i^{\mathrm{top}}$ or valley $C_i^{\mathrm{bottom}}$ data point within the moving window wm is updated by a new value. The updating procedure depends on whether $C_i^{\mathrm{top}}$ or $C_i^{\mathrm{bottom}}$ is within a peak region wp or within a peak-free region wpf. If they are part of a peak-free region, they are substituted by the moving window wm average.
$$C_i^{\mathrm{top}} = \frac{1}{2n+1}\sum_{P_i \in w_m} P_i, \qquad C_i^{\mathrm{bottom}} = \frac{1}{2n+1}\sum_{V_i \in w_m} V_i \qquad (17)$$
If they are identified as part of a peak region wp, they are substituted by
$$C_i^{\mathrm{top}} = \frac{1}{j}\sum_{|P_i - C_i^{\mathrm{top}}| \le \Delta I_{\max}} P_i, \qquad C_i^{\mathrm{bottom}} = \frac{1}{j}\sum_{|V_i - C_i^{\mathrm{bottom}}| \le \Delta I_{\max}} V_i \qquad (18)$$
the average of all peak/valley data points in the window whose absolute difference from the central window point $C_i^{\mathrm{top}}$ or $C_i^{\mathrm{bottom}}$ is smaller than or equal to ΔImax, where ΔImax is computed according to Equation 13. j is the number of peak (Pi) or valley (Vi) data points contained within the moving window wm for which $|P_i - C_i^{\mathrm{top}}| \le \Delta I_{\max}$ or $|V_i - C_i^{\mathrm{bottom}}| \le \Delta I_{\max}$.

In Equation 17, the central peak $C_i^{\mathrm{top}}$ or valley $C_i^{\mathrm{bottom}}$ data points are updated by averaging all the peak Pi or valley Vi data points within wm, respectively. In contrast, Equation 18 updates the central peak $C_i^{\mathrm{top}}$ or valley $C_i^{\mathrm{bottom}}$ data points by averaging only those peak Pi or valley Vi data points that fulfill the condition $|P_i - C_i^{\mathrm{top}}| \le \Delta I_{\max}$ or $|V_i - C_i^{\mathrm{bottom}}| \le \Delta I_{\max}$. This condition ensures that only peak Pi or valley Vi data points that are not far from $C_i^{\mathrm{top}}$ or $C_i^{\mathrm{bottom}}$ are considered when updating $C_i^{\mathrm{top}}$ or $C_i^{\mathrm{bottom}}$, respectively.
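The two update rules can be sketched in one function that works on either the peak or the valley sequence. The following choices are assumptions made for the sketch: the window is truncated at the ends of the sequence, the averages are taken over the original (not yet updated) values, and the region flag and ΔImax are passed in per data point. The function name and the sample numbers are hypothetical.

```python
def smooth_extrema(values, n, in_peak_region, delta_i_max):
    """Update every Level 1 peak (or valley) value with a moving window.

    values         -- peak or valley intensities in spectral order
    n              -- half size of the moving window (2n + 1 points)
    in_peak_region -- one bool per point, True inside a peak region
    delta_i_max    -- one threshold per point, used only in peak regions
    """
    smoothed = []
    for i, centre in enumerate(values):
        window = values[max(0, i - n): i + n + 1]  # ASSUMPTION: truncate at ends
        if in_peak_region[i]:
            # Peak region: average only points close to the central value.
            window = [v for v in window if abs(v - centre) <= delta_i_max[i]]
        # Peak-free region: plain moving average over the whole window.
        smoothed.append(sum(window) / len(window))
    return smoothed


values = [1.0, 2.0, 3.0, 10.0, 3.0, 2.0, 1.0]
flat = smooth_extrema(values, 1, [False] * 7, [0.0] * 7)
peak = smooth_extrema(values, 1, [i == 3 for i in range(7)], [2.0] * 7)
print(flat[3], peak[3])
```

In the example the plain average pulls the central value of 10 down towards its neighbours, whereas the restricted average leaves it untouched, which is the peak-preserving behaviour the text describes.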

The noise-reduced top envelope Etop(xi) and bottom envelope Ebottom(xi) are generated by linear interpolation between all updated peak $C_i^{\mathrm{top}}$ and valley $C_i^{\mathrm{bottom}}$ data points for all variables xi that, according to Equations 11 and 12, have been assigned neither to a valley nor to a peak point. Figure 4 shows both envelopes computed for a moving window of size n = 9. Scheme 1 presents the flow chart of the envelope-finder algorithm, where m is the total number of peak/valley data points of the noisy spectrum.

Figure 4. Simulated noisy spectrum (grey line), linearly interpolated top envelope (blue line) and bottom envelope (red line), the mean of the top and bottom envelope (black line), and the real signal spectrum as dashed line
Scheme 1. Flow chart of the envelope-finder algorithm
Figure 4 shows the top and the bottom envelope as blue and red lines. The raw spectral data R(xi) are shown as a grey line. The real signal contribution Ssig(xi) behind the spectral data is shown as a dashed line. The solid black line shows the
$$E_{\mathrm{mean}}(x_i) = \frac{E_{\mathrm{top}}(x_i) + E_{\mathrm{bottom}}(x_i)}{2} \qquad (19)$$
mean of the top and the bottom envelopes Emean(xi). Evidently, Emean(xi) is already close to Ssig(xi). This indicates that the envelope-finder algorithm alone has great potential for noise reduction. Nevertheless, the noise level can be reduced even further if, in a next step, vectors are cast within the margins of the top and the bottom envelopes.
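The interpolation of the envelopes and their mean can be sketched as below. The handling of the spectrum edges (holding the first and last updated value constant outside the outermost extrema) is an assumption, since the text does not state it, and the function name and the small data set are hypothetical.

```python
def interpolate_envelope(xs, knot_idx, knot_vals):
    """Piecewise-linear envelope through the updated extrema.

    knot_idx  -- ascending indices of the peak (or valley) data points
    knot_vals -- updated intensities at those indices
    ASSUMPTION: outside the first and last knot the envelope is held constant.
    """
    env = [knot_vals[0]] * len(xs)
    knots = list(zip(knot_idx, knot_vals))
    for (i0, y0), (i1, y1) in zip(knots, knots[1:]):
        for i in range(i0, i1 + 1):
            t = (xs[i] - xs[i0]) / (xs[i1] - xs[i0])
            env[i] = y0 + t * (y1 - y0)
    for i in range(knot_idx[-1], len(xs)):
        env[i] = knot_vals[-1]
    return env


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
e_top = interpolate_envelope(xs, [1, 3], [2.0, 6.0])
e_bottom = interpolate_envelope(xs, [0, 2, 4], [0.0, 1.0, 3.0])
e_mean = [(t + b) / 2.0 for t, b in zip(e_top, e_bottom)]
print(e_top)     # [2.0, 2.0, 4.0, 6.0, 6.0]
print(e_bottom)  # [0.0, 0.5, 1.0, 2.0, 3.0]
print(e_mean)    # [1.0, 1.25, 2.5, 4.0, 4.5]
```

The mean envelope `e_mean` is the quantity that the text reports as being already close to the real signal.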

3.2 Vector casting based smoothing

First, vectors are cast from an already noise-reduced starting point r(xk) to subsequent, not yet noise-reduced data points R(xi > k), as illustrated in Figure 5. The initial starting point
$$r(x_k) = \frac{E_{\mathrm{top}}(x_k) + E_{\mathrm{bottom}}(x_k)}{2}, \quad k = 0 \qquad (20)$$
is the mean of the bottom and top envelope at the first data point (i = 0). From this already noise-reduced point r(xk), vectors
$$\vec{v}_{k,i} = \bigl(x_i - x_k,\; R(x_i) - r(x_k)\bigr), \quad i > k \qquad (21)$$
can be cast to all subsequent data points available with i > k.
Figure 5. The noisy raw spectrum is represented by the grey line. The smooth top and bottom envelopes are highlighted in blue and red. The noise-reduced data points are represented by the black solid line. (a) Illustration of the vector casting method in a spectral range that does not contain a signal peak. Details of the computation of the denoised spectrum data points: intersections of selected vector lines (magenta +) and mean of the intersections of selected vector lines (cyan circle). Green vectors remain as they do not intersect the top or the bottom envelope. Red vectors are deleted as they intersect the top or the bottom envelope. (b) Illustration of the vector casting method in a spectral region that does contain a signal peak. (c) Illustration of the complete noise-reduced spectrum in the respective section

Second, all vectors $\vec{v}_{k,i}$ that cross either the top or the bottom envelope are deleted from the set of vectors $\{\vec{v}_{k,i}\}$. Deleted vectors are highlighted in red in Figure 5, whereas remaining vectors are highlighted in green.

Third, for each of the remaining vectors the slope
$$m_{k,i} = \frac{R(x_i) - r(x_k)}{x_i - x_k} \qquad (22)$$
is computed, from which the average slope
$$\bar{m}_k = \frac{1}{l}\sum_{i} m_{k,i} \qquad (23)$$
of all the remaining vectors is computed, where l is the number of remaining vectors.
Fourth, for the not yet noise-reduced data point xk+1, which is situated one increment to the right of the already noise-reduced data point xk, the new noise-reduced value
$$r(x_{k+1}) = r(x_k) + \bar{m}_k\,(x_{k+1} - x_k) \qquad (24)$$
is computed from the average slope of the remaining vectors and the distance between the two neighbouring variables xk and xk+1. This procedure is repeated until all raw data points R(xi) have been replaced by noise-reduced data points r(xi).

Figure 5a,b shows the noise reduction due to the vector casting method in a spectral region that does not contain a signal peak and in a spectral region that does contain a signal peak, respectively. Figure 5a (zoomed plot) shows the details of the computation of the next noise-reduced data point starting from the previous one, and Figure 5c shows the computed noise-reduced spectrum r(xi) as a solid black line.

In Figure 5, vectors are not cast from the previously noise-reduced data point to all of the subsequent data points but only to the subsequent data points contained in a certain window wvector. Restricting the vectors to this window reduces the computational demand significantly. In Figure 5, the size of the window wvector in which the vectors are cast is M = 150, meaning that vectors are cast to the subsequent 150 data points. Scheme 2 shows the flow chart of the vector casting method.
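The four steps, including the restriction to a window of M subsequent points, can be sketched as a single loop. Several details are assumptions of this sketch and not statements of the article: a vector is tested against the envelopes only at the sampled positions (which is sufficient when the envelopes are piecewise linear between samples), and if no vector survives, the next point falls back to the mean of the two envelopes. The function name and the test data are hypothetical.

```python
def vector_casting(xs, raw, e_top, e_bottom, m_window):
    """Noise-reduced spectrum r(x_i) from the raw data and its two envelopes."""
    n = len(xs)
    r = [0.0] * n
    r[0] = 0.5 * (e_top[0] + e_bottom[0])  # start at the envelope mean
    for k in range(n - 1):
        slopes = []
        last = min(k + m_window, n - 1)  # cast to at most m_window points
        for i in range(k + 1, last + 1):
            m = (raw[i] - r[k]) / (xs[i] - xs[k])
            # Keep the vector only if it stays between the envelopes at
            # every sampled position from x_{k+1} up to x_i.
            inside = all(
                e_bottom[j] <= r[k] + m * (xs[j] - xs[k]) <= e_top[j]
                for j in range(k + 1, i + 1)
            )
            if inside:
                slopes.append(m)
        step = xs[k + 1] - xs[k]
        if slopes:
            r[k + 1] = r[k] + step * sum(slopes) / len(slopes)
        else:
            # ASSUMPTION: with no surviving vector, use the envelope mean.
            r[k + 1] = 0.5 * (e_top[k + 1] + e_bottom[k + 1])
    return r


xs = [float(i) for i in range(30)]
noisy = [0.5 if i % 2 == 0 else -0.5 for i in range(30)]  # pure ripple
smooth = vector_casting(xs, noisy, [1.0] * 30, [-1.0] * 30, m_window=4)
print(max(abs(v) for v in smooth[1:26]))
```

The nested check makes this sketch scale roughly with N times M squared, so a small M keeps the run time modest, in line with the motivation for the window given in the text. Near the end of the spectrum fewer vectors are available, so the last few points are smoothed less.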

Scheme 2. Flow chart of the vector casting method for denoising of a noisy spectrum

3.3 Parameter tuning effect

The algorithms outlined in the previous sections require two input parameters: the size n of the moving window wm and the size M of the window wvector in which vectors are cast. In order to investigate the effect of these parameters, we applied the vector casting method at different values of n and M. Figure 6 shows the results for n = 1, 5, 9, and 11 with M = 150.

Figure 6. Vector casting method applied at different smoothing window sizes of n = 1 (black solid line), n = 5 (red solid line), n = 9 (green solid line), and n = 11 (magenta solid line). The grey solid line shows the simulated noisy spectrum, and the blue solid line shows the pure reference signal

Increasing n initially from n = 1 to n = 5 improves the smoothness of the noisy signal, especially for small SNR (peak-free regions in Figure 6). However, the vector casting method is rather insensitive to a further increase of the size of the smoothing window from n = 9 to n = 11. The peak regions are also less sensitive to the change in n than the peak-free regions because the data points for averaging are determined automatically (Equation 18), so that only a small number of nearby data points are involved.

The effect of varying the number of vectors to be cast is shown in Figure 7 and was tested by setting M = 50, 100, and 150 with n = 9. Compared with the mean of the envelopes (black line in Figure 7), casting vectors shows a significant improvement. However, increasing M beyond M = 50 did not yield a significant further improvement, as the noise-reduced spectra look rather similar. This can be explained by the fact that the larger the distance between xi and xk, the lower the probability that the corresponding vector is included in the computation of the new noise-reduced data point r(xk+1) in Equation 24.

Figure 7. Vector casting applied with different numbers of cast vectors, M = 50 (red solid line), M = 100 (green solid line), and M = 150 (magenta solid line). The mean of the envelopes is shown as a black solid line. The grey solid line shows the simulated noisy spectrum, and the blue solid line shows the pure reference signal

3.4 Comparison with Savitzky–Golay and wavelet transform smoothing techniques

Figure 8 shows the simulated signal spectrum as a solid black line and, in grey, the simulated raw spectra with SNRs between 1 and 25 dB. At each SNR, 10 samples were simulated. The raw spectra were noise reduced using the presented vector casting method, the presented envelope-finder algorithm, the SG method, and the wavelet transform method. For the SG and the wavelet transform methods, the input parameters were optimized with respect to a maximum overall SNR performance between the obtained noise-reduced spectrum and the pure signal spectrum according to Equation 8d. Figure 9a,b shows the parameters selected to give optimal denoised spectra for the wavelet and SG methods, respectively.

Figure 8. Synthetic noisy spectra (grey lines) simulated at different SNR levels (1, 2, ..., 20, 25 dB) and reference spectrum (black line)
Figure 9. (a) Parameters of wavelet-based denoising selected in an iterative loop for an optimal denoised signal: (i) the wavelet decomposition levels; (ii) the denoising method (threshold selection rules), where empirical Bayes (Bayes) and false discovery rate (FDR) were selected more frequently; (iii) wavelet families with different orders (symlets and Daubechies), where symlet family order 4 (sym4) was selected more frequently; (iv) threshold rules. (b) Parameters (polynomial order and window size) of SG smoothing selected for its optimal performance

With respect to the SG method, the window size was varied from three to the maximum odd number that was smaller than or equal to the number of data points of the spectrum, and the polynomial order was varied between one and nine. As can be seen in Figure 9b, a polynomial order of three and a window size of nine were selected more frequently when denoising the simulated noisy signals.
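Such a parameter optimization is in essence an exhaustive grid search against the known pure signal. The sketch below only illustrates that loop and makes two substitutions that should be read as such: a plain moving average stands in for the Savitzky–Golay filter (it is not the SG method), and the score is an assumed residual SNR, since the exact form of Equation 8d is not reproduced here. All function names and the test signal are hypothetical.

```python
import math


def moving_average(values, window):
    """Stand-in smoother (NOT Savitzky-Golay): centred mean over `window` points."""
    half = window // 2
    out = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        out.append(sum(segment) / len(segment))
    return out


def residual_snr_db(reference, denoised):
    """ASSUMED score: reference power over the power of the remaining error, in dB."""
    error = sum((d - s) ** 2 for s, d in zip(reference, denoised))
    if error == 0.0:
        return float("inf")
    return 10.0 * math.log10(sum(s * s for s in reference) / error)


def best_parameters(reference, raw, grid, smoother):
    """Try every parameter tuple in `grid` and return (params, score) of the best."""
    scored = [(p, residual_snr_db(reference, smoother(raw, *p))) for p in grid]
    return max(scored, key=lambda pair: pair[1])


reference = [50.0 * math.exp(-((i - 100) / 25.0) ** 2) for i in range(201)]
raw = [s + (2.0 if i % 2 == 0 else -2.0) for i, s in enumerate(reference)]
grid = [(w,) for w in range(1, 22, 2)]  # odd window sizes 1, 3, ..., 21
params, score = best_parameters(reference, raw, grid, moving_average)
print(params, round(score, 1))
```

For a two-parameter method such as SG, the grid would simply hold (window size, polynomial order) tuples and `smoother` would accept both.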

With respect to the wavelet transform method, a wavelet denoising function (wdenoise) from the software package “Wavelet Toolbox” in MATLAB (by MathWorks Inc.) was used. Improved implementations of the wavelet denoising technique40, 41 may exist; however, the relevant codes are not available and thus could not be applied. Therefore, using the wavelet denoising built into MATLAB, we varied the level of decomposition between 1 and 10. Four different threshold selection rules42 were tested. For the selection of the suppression coefficients, mean, median, soft, and hard thresholding43 approaches were evaluated. Moreover, two different wavelet families (symlets and Daubechies) were tested. Figure 9a shows how frequently these parameters were selected when optimally denoising the simulated noisy signals with the wavelet method. With respect to the envelope-finder approach, we used a moving window size of n = 9, and for the vector casting method, we cast the vectors in a window containing 150 data points.

Figure 10 shows the SNR achieved by the four denoising methods, computed using Equations 8a, 8c, and 8d. As can be seen in Figure 10a,b, all the denoising methods improved the original SNR across the entire spectral region as well as at sharp peaks. The vector casting and wavelet methods perform better than the other two methods. The vector casting method performs better than the wavelet method at lower SNRs, whereas the wavelet method exceeds the performance of the vector casting method at higher SNRs. Figure 10c depicts the overall performance of the denoising methods in smoothing the noisy signal while at the same time keeping the spectral peaks undistorted. For noisy signals with SNRs up to 15 dB, the vector casting method performs best, followed by the mean of the envelopes. For higher SNRs, the wavelet method exceeds the performance of the methods proposed here.

Figure 10. Comparison of vector casting (green line) and mean envelope (black line) to Savitzky–Golay (blue line) and wavelets (red line). (a) Global signal-to-noise ratio of denoised signal r(xi) as a function of signal-to-noise ratio of original noisy signal R(xi). (b) Signal-to-noise ratio of peak regions of denoised signal r(xi) as a function of signal-to-noise ratio of peak regions of the noisy signal R(xi). (c) Overall performance of denoising algorithms with respect to overall denoised signal quality and interaction with sharp peaks
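To make the SNR comparison concrete, the following sketch computes an SNR in decibels for a noisy and a denoised spectrum against a known pure signal. Equations 8a to 8d are not reproduced in this section, so the definition used here (signal power divided by the power of the residual, on a 10 log10 scale) is a commonly used assumption and may differ from the exact expressions used for Figure 10. The function name `snr_db` and the toy data are hypothetical.

```python
import math


def snr_db(pure, estimate):
    """SNR in dB of `estimate` relative to the known pure signal `pure`.

    Assumed definition: 10 * log10(sum(pure^2) / sum((estimate - pure)^2)).
    """
    signal_power = sum(s * s for s in pure)
    noise_power = sum((e - s) ** 2 for s, e in zip(pure, estimate))
    if noise_power == 0:
        return math.inf  # the estimate reproduces the pure signal exactly
    return 10.0 * math.log10(signal_power / noise_power)


# Toy data: a single peak, a noisy version, and a hypothetical denoised version.
pure = [0.0, 1.0, 4.0, 1.0, 0.0]
noisy = [0.5, 0.5, 4.5, 1.5, -0.5]
denoised = [0.1, 0.9, 4.1, 1.1, -0.1]

gain = snr_db(pure, denoised) - snr_db(pure, noisy)
print(f"SNR gain: {gain:.2f} dB")
```

Restricting the sums to the indices of a peak region would give a peak-region SNR analogous to the quantity plotted in Figure 10b.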

Figure 11 shows the simulated raw spectrum R(xi) with an SNR of 10 dB as a grey line, the pure signal spectrum Ssig(xi) as a blue line, and the denoised spectra r(xi) of vector casting, mean envelope, Savitzky–Golay, and wavelets as green, black, magenta, and red lines, respectively. Comparing the pure signal spectrum with the denoised spectra reveals how the noise reduction methods differ in the level of smoothing in peak-free regions and in the preservation of spectral shapes in peak regions (zoomed insets in Figure 11). With the vector casting method (green line), the noise-reduced spectrum matches the peak locations of the signal spectrum very well and preserves the spectral shape information rather well. In addition, the standard deviation of the denoised spectrum is very small compared with the other methods, especially in peak-free regions. Denoising with the envelope-finder algorithm alone (black line) yields an overall noisier spectrum than the vector casting method. Still, the peak heights and spectral shapes are preserved rather well. The noise-reduced spectrum obtained with the optimized Savitzky–Golay method (magenta line) is noisier, and its spectral peak shapes are distorted compared with the pure signal spectrum. The wavelet transform method (red line) performs better than the Savitzky–Golay method. Nonetheless, the peak positions and peak shapes are slightly distorted.

Figure 11. Comparison of noise-reduced spectra r(xi) (black line) using (a) vector casting method, (b) envelope-finder algorithm, (c) wavelet-based smoothing, and (d) Savitzky–Golay filter with respect to the pure signal spectrum Ssig(xi) (blue line). The original noisy spectrum is shown as a grey line

Finally, the performance of the four denoising approaches is compared using experimentally acquired Raman spectra. Figure 12 shows 14 experimental Raman spectra of ethanol (grey lines) with different noise levels, together with a quasi-pure ethanol signal spectrum Ssig(xi) with large SNR (black line).

Figure 12. Experimentally measured Raman spectra of ethanol (grey lines) featuring different SNR levels and one quasi-noise-free pure signal Raman spectrum of ethanol (black)

Figure 13a–d each show the 14 Raman spectra of ethanol (grey lines), normalized to the highest peak at around 845 nm, recovered using the vector casting, mean-of-envelopes, wavelet, and Savitzky–Golay smoothing methods, respectively. The ethanol spectrum with high SNR is shown as a blue line for reference. Moreover, to assess the reproducibility of the recovered spectra, the standard deviation of the 14 recovered noise-reduced spectra at each variable (here Raman shift) is computed and depicted alongside the recovered spectra as a red line. The standard deviation is quantified on the right ordinate. As can be seen from Figure 13, the standard deviation is higher around the peak regions than in peak-free regions. Thus, all techniques affect the peaks to some extent. However, the vector casting method yielded better reproducibility of the spectra: at every peak, it achieved the smallest standard deviation. The mean of the envelopes yields results comparable with those of the wavelet method. The standard deviation of the Raman peak at around 812 nm is 0.013, 0.021, and 0.052 for the vector casting, wavelet, and Savitzky–Golay methods, respectively. The peak broadening effect of the Savitzky–Golay technique is clearly reflected in the standard deviation of the peak at around 842 nm. Moreover, for the double Raman peaks at 855 nm, the shoulder peak at 870 nm, and the Raman peak at 884 nm, the standard deviation obtained with vector casting is 0.021, 0.016, and 0.02, respectively, compared with 0.07, 0.04, and 0.07 for the Savitzky–Golay method and 0.025, 0.019, and 0.024 for the wavelet method.

Figure 13. Raman spectra of ethanol (grey lines) denoised by (a) vector casting method, (b) envelope-finder algorithm, (c) wavelet transform method, and (d) Savitzky–Golay smoother. The pure signal spectrum (blue line) and standard deviation of the denoised spectra (red line) are shown
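The reproducibility measure used above (the standard deviation of the recovered spectra at each spectral position, after normalizing each spectrum to its highest peak) can be sketched as follows. The function names are hypothetical, the three short spectra are toy data rather than measured values, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not state which estimator was used.

```python
import statistics


def normalize_to_max(spectrum):
    """Scale a spectrum so that its highest peak equals 1."""
    peak = max(spectrum)
    return [value / peak for value in spectrum]


def pointwise_std(spectra):
    """Standard deviation across spectra at each spectral position.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    normalized = [normalize_to_max(s) for s in spectra]
    # zip(*...) groups the values of all spectra at the same position.
    return [statistics.stdev(column) for column in zip(*normalized)]


# Toy example: three "recovered" spectra on a common four-point axis.
recovered = [
    [0.0, 2.0, 4.0, 2.0],
    [0.0, 1.0, 2.0, 1.0],
    [1.0, 3.0, 6.0, 3.0],
]
std_curve = pointwise_std(recovered)
print([round(s, 4) for s in std_curve])
```

A small value of this curve at a peak indicates that the denoising method recovers that peak consistently across spectra with different noise levels.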

Besides outperforming the two most frequently used methods, the proposed method for denoising raw spectra also requires a minimum of human interaction. It does, however, require envelope detection, which involves peak detection. We also evaluated the computational efficiency of the proposed algorithm. The implementation language was Python [44]. On a Dell Latitude E7450 with an Intel Core i7 processor, the average execution time of the envelope-finder algorithm was comparable with that of SG. However, the vector casting method took longer to execute, and its average execution time depends on the number of vectors to be cast.

4 CONCLUSION

In this study, we developed a new method for processing spectra that is relevant for recovering the spectral signal from spectra with small SNR. Of course, this technique cannot extract signal peaks that are smaller than the noise level, but it can remove noise while manipulating the characteristics of the pure signal less than the wavelet transform method or the SG method. Furthermore, the proposed method relies only to a minimal extent on input parameters that have to be chosen by humans. In summary, the proposed method should be considered reliable, robust, and accurate.

ACKNOWLEDGEMENTS

The project leading to this result has received funding from the Wilhelm Sander-Stiftung, Munich, Germany (Grant 2017.111.1). It has also received funding from the European Union's Horizon 2020 research and innovation programme under ERC Starting Grant agreement 637654 (Inhomogeneities).