Rapid and robust on‐scene detection of cocaine in street samples using a handheld near‐infrared spectrometer and machine learning algorithms

Abstract On‐scene drug detection is an increasingly significant challenge due to the fast‐changing drug market as well as the risk of exposure to potent drug substances. Conventional colorimetric cocaine tests involve handling of the unknown material and are prone to false‐positive reactions on common pharmaceuticals used as cutting agents. This study demonstrates the novel application of 740–1070 nm small‐wavelength‐range near‐infrared (NIR) spectroscopy to confidently detect cocaine in case samples. Multistage machine learning algorithms are used to exploit the limited spectral features and predict not only the presence of cocaine but also the concentration and sample composition. A model based on more than 10,000 spectra from case samples yielded 97% true‐positive and 98% true‐negative results. The practical applicability is shown in more than 100 case samples not included in the model design. One of the most exciting aspects of this on‐scene approach is that the model can almost instantly adapt to changes in the illicit‐drug market by updating metadata with results from subsequent confirmatory laboratory analyses. These results demonstrate that advanced machine learning strategies applied on limited‐range NIR spectra from economic handheld sensors can be a valuable procedure for rapid on‐site detection of illicit substances by investigating officers. In addition to forensics, this interesting approach could be beneficial for screening and classification applications in the pharmaceutical, food‐safety, and environmental domains.


| INTRODUCTION
Cocaine is one of the most abundant drugs of abuse worldwide, with an estimated global annual production of 2000 metric tons of pure cocaine. 1 Even though the recreational use of this drug has been banned for years, cocaine abuse is still increasing. In 2020, the annual report of the Drugs Information and Monitoring System in the Netherlands revealed that around 6.5% of the Dutch population has used cocaine at least once. 2 In 2018, more than 40 metric tons of cocaine was confiscated in the Netherlands alone. Cocaine street samples are becoming increasingly potent, with cocaine contents in the Netherlands increasing from an average of 48.7 wt% in 2011 to 65.5 wt% in 2018. 2 These percentages are consistent with cocaine contents reported throughout Europe, with country averages of 51-73 wt% and an interquartile range of 40-84 wt%. 3 In addition to cocaine, the global illicit-drug market consists of many other substances such as the conventional synthetic drugs amphetamine, methamphetamine, and 3,4-methylenedioxymethamphetamine (MDMA) as well as many new psychoactive substances (NPSs) and precursor chemicals that can be controlled depending on national legislation. The great variety of substances is not the only challenge when developing suitable indicative tests for drug-suspected seizures; street samples are often not pure, because they are regularly adulterated with cutting agents. These cutting agents occur in great variety, and common adulterants for cocaine include sugars (eg, mannitol, inositol), caffeine, phenacetin, lidocaine, procaine, paracetamol, and levamisole. 2,4-6 The ever-changing and complex illegal market makes it difficult for law enforcement to control these substances. Therefore, there is a need for reliable, affordable, and fast detection techniques to identify the suspected compounds.
Conventionally, simple colorimetric tests are used to obtain a first indication of whether a substance contains a frequently occurring illicit compound. 4,7 The most commonly used indicative test for cocaine in seized material is the Scott or Ruybal test that produces a blue color from a cobalt(II)thiocyanate complex formed in the presence of cocaine. 8,9 In addition to test solutions prepared in the forensic laboratory, a large variety of commercial test kits based on this complex are manufactured in the form of pouches, ampules, swabs, and wipes. 10 A well-known limitation of these cobalt(II)thiocyanate-based tests is their false-positive (FP) response to several common adulterants such as levamisole and lidocaine. 11 This is a major drawback as levamisole is one of the most frequently used cutting agents, present in more than 40% of seized cocaine samples. 12 It is therefore not unlikely to encounter pure levamisole, or another legal yet FP substance, in a drug-related setting, either as a pure cutting agent or packed in wrappers to be sold as cocaine in a scam. Another limitation of colorimetric tests is their limited specificity. These tests are available only for a limited number of traditional illicit substances, and each specific test formulation has its own profile of FP reactions and thus needs a dedicated validation study. In addition, colorimetric tests require touching and manipulating the sample and thus pose a potential safety risk for the investigating officer when highly potent substances such as fentanyl and its derivatives are encountered. 13,14 In the forensic field, colorimetric tests are accepted to provide a presumptive result suitable to detain a suspect and to select samples for confirmatory laboratory analysis. Only the laboratory results are subsequently used as evidence in court by the prosecution. 15 Therefore, confiscated samples always require additional testing in a forensic laboratory to employ more advanced analyses, including gas chromatography-mass spectrometry (GC-MS), liquid chromatography-MS, or Fourier-transform infrared (FT-IR) spectroscopy.
Alternative rapid and portable techniques that can overcome the limitations of colorimetric tests are being explored. Technical innovations in attenuated total reflectance (ATR)-FT-IR spectroscopy led to the development of portable ATR-FT-IR devices to provide on-scene analysis of samples. 16 Also, electrochemical tests that can overcome the specificity issues of the Scott test have been developed. 11,17 However, both electrochemical tests and ATR-FT-IR spectroscopy still require touching and handling of sample material. Raman spectroscopy and near-infrared (NIR) spectroscopy are two techniques that can analyze through the packaging material without handling the sample and can be operated by minimally trained staff. These techniques therefore provide an intrinsically safer procedure for the operator. 16,18,19 Although commercial handheld Raman spectrometers are already being used by law enforcement officers, this technique still faces limitations. One of the major problems is that fluorescent compounds interfere with and obscure Raman signals, leading to limits of detection that are dependent on the specific adulterants present in the sample. 20 In addition, because commercial Raman devices rely on built-in library-based techniques, they cannot always detect low concentrations of controlled substances in mixtures, nor can they detect compounds that are not included in the library.
In contrast, NIR analyzers are not affected by fluorescence and are much cheaper and smaller than Raman devices. NIR analyzers therefore have the potential to be implemented as cost-effective standard equipment for the general police or customs officers, whereas the more expensive Raman instruments are more likely to remain a tool for specialized forensic investigators, both for economic reasons and because of the expertise required to interpret the measurements correctly. An example of a commercial NIR spectrometer is the SCiO from Consumer Physics (Herzliya, Israel). The SCiO operates in a narrow wavelength range (740-1070 nm or 13,500-9350 cm⁻¹), unlike many other NIR spectrometers, which operate in higher wavelength ranges up to 2500 nm (4000 cm⁻¹). The SCiO scanner is, however, one of the cheapest devices currently commercially available. NIR spectra are based on vibrational overtones and combination bands, yielding raw spectra that are initially noninformative. 21 Therefore, extensive data preprocessing followed by chemometric data modeling is needed to extract useful information from the data. Several studies have already shown that chemometric analysis of the data is of great use to apply NIR devices operating at longer-wavelength ranges in forensic casework. 22-24 Liu et al 22 successfully demonstrated the benefits of a multimodel approach on forensic drug samples. They used soft independent modeling of class analogy (SIMCA) for the classification of spectra with a methamphetamine, ketamine, heroin, or cocaine class followed by individual partial least squares (PLS) regression models for quantification. Hespanhol et al 23 were the first to apply different models on NIR spectra to answer a set of forensic-relevant questions, including SIMCA for cocaine HCl and cocaine base classification, PLS for quantitative information, and multivariate curve resolution for establishing the degree of adulteration.
Both earlier studies used NIR devices with relatively extensive wavelength ranges, that is, 1000-2500 nm 22 and 900-1700 nm depending on the choice of the NIR equipment. 23 NIR spectrometers operating at wavelength ranges above 1000 nm require more advanced light sources, active cooling in some cases, and more expert application knowledge and are thus less economically attractive when considering large-scale use in the field by law enforcement officers. To our knowledge, no forensic studies have been published on the applicability of low-cost NIR spectrometers operating in the relatively limited spectral region of 740-1070 nm.
Chemometric data analysis approaches are widely used in analytical chemistry and forensic science in disciplines dealing with large amounts of data of high complexity or with limited spectral features. 25,26 The frequently used classification schemes include SIMCA, principal component analysis (PCA), linear discriminant analysis (LDA), and partial least squares-discriminant analysis (PLS-DA), whereas partial least squares-regression (PLS-R) is commonly used to fit a linear correlation between the multidimensional spectral data and the concentration of a compound of interest. 27,28 In addition, approaches such as support vector machines (SVM), k-nearest neighbors (kNN), random forest, and artificial neural networks (ANNs) already proved their value for NIR spectral data modeling outside of the forensic field. 29 Classification algorithms based on spectroscopic data are operational in the fields of food and medicine authentication. Teye et al 30 demonstrated the correct classification of the origin and quality of rice by both kNN and SVM machine learning models applied on short-range (740-1070 nm) NIR spectra after multiplicative scatter correction as preprocessing and PCA for data dimensionality reduction.
Non-cocaine samples in tablet, rock, or crystal form were ground in a mortar to obtain a powder. Coarsely powdered case samples were not ground further. All individual samples were transferred to clear borosilicate glass vials (4 mL, 15 mm diameter × 48 mm height) from VWR (Amsterdam, the Netherlands). All vials used for model development were filled with at least 5 mm of powder to ensure a sufficient sample layer for diffuse reflectance spectroscopy.

| Instruments and settings
NIR spectra were recorded using a pocket-size (54 × 36 × 15 mm, 35 g) SCiO handheld NIR spectrometer from Consumer Physics, hardware version 1.2. All SCiO sensors were operated via the SCiO "The Lab" mobile application on the operator's iOS or Android smartphone or tablet using a Bluetooth connection. Before use, each sensor was calibrated using the built-in calibration device in the sensor cover. The sample scanning procedure is shown in Figure 1.

| Data analysis
Raw spectral data were imported in Unscrambler 11 (Camo Analytics, Oslo, Norway) for data preprocessing optimization and exploratory analysis using PCA, SIMCA, and PLS-R. Preliminary models based on SIMCA-PLS-R were found to produce reasonable results although some disruptive FP results were obtained for a few common cutting agents. Subsequent model building was performed in R version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria) 35 using RStudio version 1.2.5033. The R packages prospectr_0.2.0, 36 signal_0.7-6, 37 and caret_6.0-86 38 were used.
The following two data preprocessing methods were found suitable based on visual inspection of the data: standard normal variate (SNV) 36 preprocessing followed by either a first- or second-order derivative with Savitzky-Golay 37 smoothing using a 19-datapoint window. This window size was optimal for noise removal, as shown in Figure S1 (supporting information). Where data are indicated as processed with a focus on a specific region of interest (ROI), the following ROIs were used: for first-order derivative data, the 839-939 nm part of the spectrum; for second-order derivative data, the 839-914 nm part of the spectrum.
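The preprocessing chain described above (SNV followed by a Savitzky-Golay first derivative over a 19-datapoint window) can be illustrated with a minimal Python sketch. It is not the R code used in the study: the function names are hypothetical, the polynomial order of the Savitzky-Golay fit is not stated in the text and is assumed to be quadratic here, and the derivative is expressed per data point, not per nanometer.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center one spectrum and scale it by its own
    sample standard deviation, which removes additive offsets between scans."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def savgol_first_derivative(spectrum, window=19):
    """First derivative with Savitzky-Golay smoothing (assumed quadratic fit).

    For a symmetric window and a polynomial of order 1 or 2, the derivative
    weights reduce to i / sum(i^2) for offsets i = -half..half.
    Edge points that lack a full window are dropped.
    """
    half = window // 2
    offsets = range(-half, half + 1)
    norm = sum(i * i for i in offsets)
    weights = [i / norm for i in offsets]
    return [
        sum(w * spectrum[center + i] for w, i in zip(weights, offsets))
        for center in range(half, len(spectrum) - half)
    ]


def preprocess(spectrum, window=19):
    """SNV followed by the smoothed first derivative."""
    return savgol_first_derivative(snv(spectrum), window)


# Two synthetic "scans" of the same sample that differ only by an additive offset
base = [0.5 + 0.001 * i + 0.00002 * i * i for i in range(60)]
shifted = [x + 0.3 for x in base]

processed_base = preprocess(base)
processed_shifted = preprocess(shifted)
```

In this synthetic example the two processed spectra coincide, which mirrors how SNV is used in the text to correct additive baseline shifts between scanners. Restricting the result to an ROI would be a simple slice on the wavelength axis.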

| Spectral reproducibility and selectivity
For every individual sample within Set A and Set B, 65 replicate scans collected using 13 different scanners were available. In general, the five scans collected as replicates on a single scanner were visually similar; however, major intensity differences were observed among different scanners. The spectra marked as "raw" in Figure 2A show the unprocessed raw data from multiple scanners for an 86.6% cocaine HCl case sample. Each colored line indicates five replicate spectra from a single scanner, and each color indicates a single scanner. Additive baseline shifts could be clearly observed, with most scanners providing a near-similar intensity, whereas three individual scanners returned a notably less-intense signal. From observations of different samples, it was evident that this additive effect could not be attributed to individual poorly performing sensors, as sensors producing a low-intensity signal for one sample did produce a high-intensity signal for other samples. Two possible explanations for these additive effects are variation in sample vial positioning and signal scatter. As the glass vial containing the sample needs to be placed on top of the sensor before scanning (Figure 1), the variation in signal intensity might be due to the alignment of the sample. Operators were instructed to simply put the sample vial on top of the sensor such that both the NIR light source and the detector cell were covered by the vial. No special attention was given to the perfect alignment of the samples as this will also not be the case in the actual on-scene analysis by police officers. However, with a vial diameter of 15 mm and a NIR light source and detector diameter 13 mm wide, there is limited tolerance. It is therefore possible that a vial placed more toward the detector surface will lose more light emitted from the sensor through the glass wall of the vial, thereby reflecting less signal. 
FIGURE 2 Effect of preprocessing on near-infrared spectral data. A, Top-row spectra are replicate scans of the same 86.6% cocaine HCl sample on 13 different scanners (5 spectra each). B, Bottom-row spectra are scans from 4 cocaine HCl (green), 4 cocaine base (red), and 10 other common adulterants and other drugs (5 spectra each) measured using the same scanner. Spectra are shown in columns as raw spectral data (raw), after standard normal variate preprocessing (SNV) and SNV followed by Savitzky-Golay smoothing (19 datapoints) with a first-order derivative (1st DER)

The other explanation given for the signal variation is the scattering effect of the material. Sample vials were necessarily touched and moved between analyses, and the powder in the vials was consequently shaken and redistributed. As the particle size of the powders might not be constant in actual case samples, more or less scattering resulting in varying signal intensities can occur. Sample scattering and the consequent signal variation is a regular phenomenon in diffuse reflectance NIR spectroscopy. 24 Common strategies to correct for signal intensity and scattering effects are by means of data preprocessing. In our study, SNV processing proved to be a suitable technique for baseline correction (Figure 2A(ii)), in line with other studies. 22,24,30 As NIR spectra, in general, and these small-range spectra, in particular, are relatively information poor, a subsequent derivative preprocessing step is often suggested to put emphasis on the spectral differences. Consistent with an earlier NIR study on narcotics from Liu et al, 22 a first-order derivative following SNV was found sufficient for our data. A second-order derivative, as suggested for cocaine analysis by Hespanhol et al, 23 logically revealed even more spectral features on our data.
For the differentiation of relatively pure substances, this preprocessing method could be the first choice due to the more-prominent differences between compound spectra. However, the aim of this study was to correctly detect cocaine, even in complex mixtures of multiple compounds, and a second derivative could result in complex data sets in which the cocaine-related signals are obscured, leading to poor model performance. Also, in cases of low NIR signal (ie, flat line reflection spectra), second derivative spectra can exhibit excessive noise.  Figure S3 (supporting information).
Because of these spectral differences, cocaine base and cocaine HCl are treated as different compounds in the detection model.

The k-nearest neighbors models were cross-validated on a "leave-one-sample-out" basis: up to 65 scans of one sample were removed from the training set. For the ANN and bagged tree models, the data set was divided into 10 cross-validation segments such that all individual replicate spectra from a single sample were left out in the same segment. This prevented the model from producing overly optimistic results due to obvious similarities among replicates. 39 After data preprocessing and cross-validation segment creation, two separate models were used on the data.
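The two cross-validation schemes can be sketched as follows. This is an illustrative Python sketch and does not reproduce the caret setup used in the study: the function names and sample labels are hypothetical, and the deterministic round-robin assignment of samples to segments is an assumption made for reproducibility (a real analysis would typically randomize it).

```python
from collections import defaultdict


def grouped_folds(sample_ids, n_folds=10):
    """Assign spectrum indices to folds so that all replicate spectra of one
    sample end up in the same fold (round-robin over samples; an assumption)."""
    unique_samples = sorted(set(sample_ids))
    fold_of_sample = {s: i % n_folds for i, s in enumerate(unique_samples)}
    folds = defaultdict(list)
    for index, sample in enumerate(sample_ids):
        folds[fold_of_sample[sample]].append(index)
    return [folds[k] for k in range(n_folds)]


def leave_one_sample_out(sample_ids):
    """Yield (held_out_sample, train_indices, test_indices), removing every
    scan of one sample from the training set at a time."""
    for held_out in sorted(set(sample_ids)):
        test = [i for i, s in enumerate(sample_ids) if s == held_out]
        train = [i for i, s in enumerate(sample_ids) if s != held_out]
        yield held_out, train, test


# Hypothetical labels: 4 samples with 3 replicate scans each
sample_ids = ["S1"] * 3 + ["S2"] * 3 + ["S3"] * 3 + ["S4"] * 3

folds = grouped_folds(sample_ids, n_folds=2)
splits = list(leave_one_sample_out(sample_ids))
```

Because replicates never straddle a training and a test segment, a model cannot score well merely by recognizing a near-identical replicate of the spectrum it is asked to predict.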

| kNN submodels
The first submodel (depicted as A in Figure 4) is the kNN submodel. Figure 5A shows an example of the Euclidean distances of all spectra toward a single cocaine base spectrum. It can be observed that only cocaine base class spectra exhibit clear similarities with the reference spectrum, as indicated by a small distance or high correlation score. Also, a diagonal trend is visible within the cocaine base spectral group. This trend is related to the cocaine content in the samples, which decreases from left to right because the samples are ordered by concentration.
In a similar fashion, a "match" with the relatively high-concentration cocaine HCl spectra can be observed from the kNN Pearson correlations in the example in Figure 5B.
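The similarity measures mentioned above can be written out in a short Python sketch. It is a simplified illustration, not the study's implementation: the threshold values, the rule that a neighbor must pass both thresholds, the function names, and the four-point "spectra" are all assumptions chosen for the example.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two preprocessed spectra of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation coefficient (assumes neither spectrum is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def nearest_match(query, library, max_distance=0.5, min_correlation=0.95):
    """Return the label of the closest library spectrum that passes both
    thresholds, or None (inconclusive) when no near neighbor exists.

    library: list of (label, spectrum) tuples. Thresholds are placeholders.
    """
    best_label, best_distance = None, None
    for label, spectrum in library:
        d = euclidean(query, spectrum)
        r = pearson(query, spectrum)
        if d <= max_distance and r >= min_correlation:
            if best_distance is None or d < best_distance:
                best_label, best_distance = label, d
    return best_label


# Toy library of two reference "spectra"
library = [
    ("cocaine base", [0.0, 1.0, 0.0, -1.0]),
    ("caffeine", [1.0, 0.0, -1.0, 0.0]),
]

close_result = nearest_match([0.05, 0.95, 0.0, -1.0], library)
unknown_result = nearest_match([1.0, 1.0, -1.0, -1.0], library)
```

Here the first query lies close to the cocaine base reference and is matched, whereas the second query has no neighbor within the thresholds and is returned as inconclusive.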

| ANN-dual bagged tree regression submodels
The second model (depicted as B in Figure 4) is an ANN model with PCA pre-scaling for classification between cocaine HCl and cocaine base classes followed by two separate bagged tree ensembles for predicting a concentration for spectra within each class. The ANN model is applied on the PCs following PCA data reduction of the spectral data such that 99% of the variation is included in the PCs. In this data set 19 PCs were used to satisfy this criterion. As a result, the ANN model predicts the salt form (HCl or base) in the form of a probability value for all spectra in the database. Spectra with probabilities within the 0-0.05 and 0.95-1 ranges are, respectively, assigned to the "HCl" or "base" class, whereas all spectra with probabilities outside these ranges are not assigned to a class. As a next step, two separate treebag regression models are trained for both cocaine types. A

| Final model decision on spectral and sample levels
For each spectrum the final model result is calculated as shown in is negative. In both these cases, the result from the second submodel is ignored. When the kNN submodel was inconclusive, the result was determined by the second ANN-dual bagged tree submodel. In this case, the final result was positive when the predicted class for a spectrum was either "HCl" or "base" and the predicted concentration was above 20%. In all other cases, the result was negative. The reason for this submodel hierarchy is the better overall performance of the kNN submodel over the ANN-dual bagged tree submodel, as discussed in detail in Section 3.3. On the sample level, the final decision is made by majority voting of the results from all replicate spectra. Next to a final cocaine positive/negative decision, a sample ID is predicted based on the most similar spectra in the training data set.

TABLE 1 Performance characteristics of the model decision flowchart as indicated in Figure 4 for the various forms of preprocessing

in the majority voting process. A general trend observed was that the ROI-based approaches appeared to be more sensitive. This phenomenon can be explained by the increased focus on the major spectral features of cocaine, which increases the sensitivity with respect to cocaine detection but can also result in reduced selectivity and thus increased FP rates. Table 2 presents the performance of the submodels that when combined provide the final model results. Additional details are presented in Table S1 (supporting information). In general, the kNN submodel gives the best performance with greater than 99% average accuracy. The limitation of this model is the number of inconclusive results when no near neighbors are identified within the thresholds.
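The decision hierarchy described in this section can be summarized in a brief Python sketch. It is a hedged illustration only: the function names are hypothetical, the inclusive handling of the 0.05 and 0.95 probability limits is an assumption, and the tie-breaking rule in the majority vote (ties count as negative) is not specified in the text.

```python
from collections import Counter


def assign_salt_form(probability_base, margin=0.05):
    """Map the ANN output probability to a class; spectra with a probability
    between the two confident ranges are left unassigned (None)."""
    if probability_base <= margin:
        return "HCl"
    if probability_base >= 1 - margin:
        return "base"
    return None


def spectrum_decision(knn_result, probability_base, predicted_wt_percent,
                      cutoff=20.0):
    """knn_result is 'positive', 'negative', or None when kNN is inconclusive."""
    if knn_result in ("positive", "negative"):
        # The kNN submodel has priority; the second submodel is ignored.
        return knn_result
    salt_form = assign_salt_form(probability_base)
    if salt_form is not None and predicted_wt_percent > cutoff:
        return "positive"
    return "negative"


def sample_decision(spectrum_results):
    """Majority vote over the replicate spectra of one sample.
    Ties are treated as negative (an assumption)."""
    counts = Counter(spectrum_results)
    return "positive" if counts["positive"] > counts["negative"] else "negative"


# Three hypothetical replicate spectra of one sample
replicates = [
    spectrum_decision("positive", 0.50, 0.0),   # kNN match decides
    spectrum_decision(None, 0.99, 64.0),        # fallback: base, above cutoff
    spectrum_decision(None, 0.40, 70.0),        # fallback: no confident class
]
verdict = sample_decision(replicates)
```

Keeping the cutoff as a parameter makes it straightforward to explore other operating points than the 20% used in the study.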

| Model performance evaluation
For the case samples, this is true for only 9% of the entire sample set, whereas more than half of the negatives do not give a match. As described earlier, the latter is an expected result due to the "leave all from the same sample out" cross-validation. This problem will likely disappear when more negative samples are included in the model; however, novel substances will most probably yield an inconclusive result for this submodel. The second submodel, the ANN-dual tree bagging model, yields an average overall correct assessment in 95% of all samples considered, but contrary to the first submodel, a classification is obtained for all spectra. As the final model conclusion gives priority to the result from the more reliable kNN submodel, the result from the second submodel must be considered only for those spectra for which no result from the first submodel was obtained. These values are given in the bottom part of

It is important to emphasize that the FP versus FN rate of the model could be adjusted by varying the weight percentage-cocaine threshold. In this way, the optimal model performance can be "tuned" for specific forensic or security use.

Notes. Percentages are based on all spectra from case samples, negative samples, and reference samples with concentration above 20%, 9459 spectra in total. The performance of submodel 2 is shown for comparison as this submodel is applied only on the inconclusive spectra following submodel 1.

observed, but this comes at a cost of 9.9% FNs. Point B in Figure 8 shows the optimal accuracy around an 18 wt% cocaine cutoff, and point C in Figure 8 gives the optimum for minimal (<0.1%) FNs, which corresponds to 28.8% FPs. This optimum was observed at a 2 wt% cocaine-cutoff threshold. These characteristics clearly show that the threshold could be used to optimize the model for specific forensic settings in which certain FP or FN percentages could be acceptable. In this study, a generic 20% threshold was applied.
This was found suitable for the current Dutch cocaine market, where case samples with a cocaine content below 20% are rarely encountered and the reported average cocaine content is above 50 wt%. 2-5 Although the observed FP and TP rates were satisfactory for an indicative testing method, it must be noted that applying such a threshold will deliberately introduce FN results for samples with cocaine content below this threshold.
Although this is acceptable in specific situations such as indicative testing, narcotic legislation in many countries does not contain concentration limits for controlled substances, and even low-concentration samples are thus illicit.
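The trade-off between FP and FN results as a function of the cocaine cutoff can be made concrete with a small Python sketch. It is deliberately simplified: only the concentration cutoff is applied (not the full two-submodel decision), the predicted concentrations and ground-truth labels are invented for illustration, and the rates are normalized per class, which may differ from how the percentages in the study were calculated.

```python
def error_rates(predicted_wt_percent, contains_cocaine, cutoff):
    """Return (FP rate, FN rate) when a spectrum is called positive if its
    predicted cocaine content exceeds the cutoff.

    FP rate = false positives / all true negatives
    FN rate = false negatives / all true positives
    """
    pairs = list(zip(predicted_wt_percent, contains_cocaine))
    false_pos = sum(1 for p, truth in pairs if p > cutoff and not truth)
    false_neg = sum(1 for p, truth in pairs if p <= cutoff and truth)
    negatives = sum(1 for _, truth in pairs if not truth)
    positives = sum(1 for _, truth in pairs if truth)
    return false_pos / negatives, false_neg / positives


# Invented predictions (wt%) for four non-cocaine and four cocaine spectra
predicted = [5.0, 12.0, 25.0, 1.0, 15.0, 40.0, 70.0, 85.0]
truth = [False, False, False, False, True, True, True, True]

sweep = {cutoff: error_rates(predicted, truth, cutoff)
         for cutoff in (2.0, 20.0, 50.0)}

for cutoff, (fp_rate, fn_rate) in sweep.items():
    print(f"cutoff {cutoff:>4} wt%: FP rate {fp_rate:.2f}, FN rate {fn_rate:.2f}")
```

With these invented numbers, a low cutoff removes FNs at the cost of many FPs and a high cutoff does the opposite, which is the same qualitative behavior as described for the model.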

| Discussion
This is the first study to demonstrate cocaine detection using NIR spectral data covering a limited-wavelength range and using a low- For example, when cocaine mixed with a novel cutting agent is encountered for the first time, the model might produce a (false) negative result. However, after adding the spectra to the database, a new sample having the same composition will "match" with the first one as a near-neighbor. In this way, the model will rapidly adapt to changes in the cocaine market.
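The adaptation mechanism described above can be illustrated with a short Python sketch, in which a spectrum confirmed by laboratory analysis is added to the reference database so that a later sample of the same composition finds a near neighbor. The class name, the distance threshold, the labels, and the three-point "spectra" are hypothetical, and only the Euclidean distance is used for brevity.

```python
from math import dist


class ReferenceLibrary:
    """Minimal nearest-neighbor database of labeled, preprocessed spectra."""

    def __init__(self, max_distance=0.5):
        self.entries = []                 # list of (label, spectrum)
        self.max_distance = max_distance  # placeholder threshold

    def add(self, label, spectrum):
        """Store a spectrum whose composition was confirmed in the laboratory."""
        self.entries.append((label, list(spectrum)))

    def match(self, spectrum):
        """Return the label of the nearest entry within the threshold,
        or None when the result is inconclusive."""
        within = [
            (d, label)
            for label, reference in self.entries
            if (d := dist(spectrum, reference)) <= self.max_distance
        ]
        return min(within)[1] if within else None


library = ReferenceLibrary()
library.add("cocaine HCl + caffeine", [0.2, 0.8, -0.4])

# First encounter with a mixture containing a novel cutting agent
new_mixture = [0.9, -0.3, 0.6]
first_result = library.match(new_mixture)

# After laboratory confirmation, the spectrum is added to the database
library.add("cocaine HCl + novel cutting agent", new_mixture)

# A later sample of the same composition now finds a near neighbor
repeat_result = library.match([0.88, -0.31, 0.62])
```

No retraining step is needed in this sketch; updating the database is enough for the next similar sample to be recognized.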
Another possible improvement is analysis through various types of packaging material. In this model, all spectra were collected through glass vials, which in a practical setting still means that the officer conducting the NIR measurements needs to transfer the suspect material from its original packaging to a suitable glass container. Earlier studies report on the limited effect of plastic packaging on the NIR signal. 41,42 Future studies on the effect of packaging materials might therefore be beneficial as this will even further reduce the risk of the investigating officer being exposed to harmful substances. Also, in this study only white- or off-white-colored powders were included

can safeguard the quality of the on-scene analysis "from a distance" and in "real time" as the model returns an analysis outcome to the smartphone almost instantly.