The biggest challenge for Raman spectroscopists today is communicating data between biologists, chemists and engineers. When we want to identify an unknown chemical, we first have to take the Raman spectrum of a known chemical and compare the results. What if, instead of relying on trial-and-error investigations, we compared our unknown sample against a mega databank of spectra from known samples that have already been measured?
The importance of Raman spectroscopy (RS) in chemical analysis comes from its being non-destructive and label-free. Label-free means that no stable-isotope ‘tag’ has to be bound to a molecule or protein in order to quantify it. RS has been used for chemical identification in a variety of fields, such as archaeology, disease analysis, cancer screening, herbal medicines, cooking oils, and alcoholic substances. The diversity of fields in which Raman has been used gives me optimism that a system which integrates all of these disciplines into one may be created within the next decade.
After brainstorming designs and sketches to implement such a device, I first had to consider some of the fundamental problems in unifying Raman spectroscopy into a single databank.
Why Raman?
I was first introduced to Raman spectroscopy by Dr Mingzhou Chen and Dr Graham Bruce in my Final Year Project, where I constructed a compact Raman device designed to be used for food and liquor analysis.
The work I did during these final months of my Bachelor's degree has motivated me to design this device. I wanted this technology to be developed further and to find wider applications. Once the lenses, mirrors and lasers were configured correctly, I found the only limit was the data analysis.
Figure 1: Raman spectroscopy set up for my Bachelor in Physics Thesis titled: ‘A compact Raman system for food and liquor inspection’.
Figure 1 (Figure 3.1 in my thesis) shows the Raman system with all the necessary components. The MaxLine filter narrows the laser line to a monochromatic 785 nm, and the dichroic mirror reflects the excitation laser through a lens, which focuses the beam to a spot of around 20 µm within our alcoholic sample. Most of the scattered photons are scattered elastically, meaning they have the same wavelength as the incident photons (known as Rayleigh scattering). Even so, this scattering amounts to at most about 0.1% of the incident radiation. Raman scattering is orders of magnitude weaker still, hence it is important to focus the scattered light correctly onto a sensitive spectrometer. The StopLine notch filter blocks the Rayleigh scattering, which would otherwise saturate the spectrometer. What is left to collect is the Raman-scattered light, whose wavelength has been shifted from that of the incident laser light.
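To make the idea of a "shift" concrete, here is a minimal sketch of the standard conversion between wavelengths in nm and a Raman shift in cm^-1. The function names are my own for illustration, and the 880 cm^-1 value is only an example shift in the region of a strong ethanol band, not a number taken from my measurements.

```python
def raman_shift_cm1(excitation_nm: float, scattered_nm: float) -> float:
    """Raman shift (cm^-1) between the laser line and a scattered wavelength."""
    # 1 nm = 1e-7 cm, so the wavenumber in cm^-1 is 1e7 / wavelength_in_nm.
    return 1e7 / excitation_nm - 1e7 / scattered_nm


def scattered_wavelength_nm(excitation_nm: float, shift_cm1: float) -> float:
    """Wavelength (nm) at which a given Stokes shift appears on the detector."""
    return 1e7 / (1e7 / excitation_nm - shift_cm1)


# Illustrative values: a 785 nm laser and an example shift of 880 cm^-1.
example_nm = scattered_wavelength_nm(785.0, 880.0)
print(f"An 880 cm^-1 shift from 785 nm lands near {example_nm:.1f} nm")
```

This is also why the notch filter only has to block a narrow band around 785 nm: the Raman signal sits tens of nanometres away on the long-wavelength side.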
Figure 2: Raman spectra for a variety of whisky brands, clearly showing differing fluorescence levels. Compared with ethanol (black), we can see that these peaks are strongly ethanolic. (a, b) Zooming in on areas where the traces are very similar, the device is able to differentiate between alcohols produced by very similar methods, namely (see (b)) the Euro method denatured ethanol (dark red) and the German method denatured ethanol (red).
The Optical Manipulations Group are big on their whisky, and so I got my hands on lots of whiskies to analyse!
The Chosen Laser Depends on Your Sample.
The wide range of Raman spectroscopy applications over the past few decades has partly coincided with advances in laser technology. With those advances comes a wider range of wavelengths and higher laser power.
Raman scattering, which relies on interactions between photons and the vibrations of molecules in a sample, is intrinsically weak. The intensity of Raman scattering is inversely proportional to the fourth power of the laser wavelength. Hence, as the laser wavelength increases, the Raman scattering intensity diminishes rapidly. When comparing UV and near-infrared lasers, the spectrum acquired with a near-infrared laser can be significantly less intense, up to 15 times lower. Consequently, UV and visible lasers need shorter accumulation times and can be operated at lower laser power than their near-infrared counterparts. Figure 3 shows the contrast in Raman intensity between a 638 nm laser and a 785 nm laser, using a silicon sample under identical conditions.
Figure 3: Silicon spectra measured on the RM5 Raman Microscope with a 638 nm laser (orange) and a 785 nm laser (red) under the same conditions (Edinburgh Instruments).
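As a rough illustration of the fourth-power rule, the sketch below compares excitation wavelengths using the wavelength scaling alone. It deliberately ignores detector efficiency, optics and sample absorption, so the real ratio measured on an instrument will differ; the function name is hypothetical.

```python
def relative_raman_intensity(wavelength_nm: float, reference_nm: float) -> float:
    """Raman intensity at wavelength_nm relative to reference_nm.

    Assumes only the 1 / wavelength^4 scaling; every other factor
    (detector response, optics, sample) is treated as equal.
    """
    return (reference_nm / wavelength_nm) ** 4


# Compare a few common laser lines against 785 nm.
for laser_nm in (532.0, 638.0, 785.0, 1064.0):
    ratio = relative_raman_intensity(laser_nm, reference_nm=785.0)
    print(f"{laser_nm:6.0f} nm: {ratio:.2f} x the 785 nm intensity")
```

Under this assumption alone, a 638 nm laser gives a little over twice the Raman signal of a 785 nm laser, and a 1064 nm laser gives less than a third of it.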
Raman spectra are often obscured by strong fluorescence, which can arise from various sources. Near-infrared lasers are preferred for mitigating fluorescence because molecules absorb less at these wavelengths. The most commonly used laser wavelength is 785 nm, which offers a balance between low fluorescence and moderate Raman intensity. For samples with high fluorescence, a 1064 nm laser may be needed, despite the reduced Raman intensity and the risk of sample damage. Figure 4 shows fluorescence suppression with different excitation wavelengths.
Figure 4: Nicotine patch spectra measured on the RM5 Raman Microscope with a 532 nm laser (green) and a 785 nm laser (red). (Edinburgh Instruments)
In addition, there are many data processing techniques that can reduce the background fluorescence. The Standard Normal Variate (SNV) algorithm, the Savitzky-Golay smoothing filter, and subtraction of a fitted higher-order polynomial can each help bring the signal down towards the baseline.
However, sometimes the background fluorescence is important when distinguishing between similar substances. The fluorescence I acquired by analysing whisky samples was very important in identifying them. With ethanol being the prime ingredient, the only variation between whisky brands was the background fluorescence.
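Two of these steps are simple enough to sketch with NumPy. This is a minimal illustration on synthetic data, not the processing I used in my thesis: the polynomial is fitted to the whole spectrum in a single pass (practical implementations usually iterate so that peaks do not pull the fit upwards), and the function names and the synthetic peak are my own assumptions.

```python
import numpy as np


def subtract_polynomial_baseline(shift_cm1, intensity, degree=5):
    """Fit one polynomial to the full spectrum and subtract it."""
    x = np.asarray(shift_cm1, dtype=float)
    y = np.asarray(intensity, dtype=float)
    # Rescale the axis to [-1, 1] so the least-squares fit stays well conditioned.
    x_scaled = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    coefficients = np.polyfit(x_scaled, y, degree)
    baseline = np.polyval(coefficients, x_scaled)
    return y - baseline


def standard_normal_variate(intensity):
    """Centre a spectrum on zero mean and scale it to unit standard deviation."""
    y = np.asarray(intensity, dtype=float)
    return (y - y.mean()) / y.std()


# Synthetic spectrum: a smooth, fluorescence-like background plus one narrow peak.
shift = np.linspace(200.0, 2000.0, 901)
background = 50.0 + 1e-4 * (shift - 200.0) ** 2
peak = 100.0 * np.exp(-((shift - 880.0) / 5.0) ** 2)
raw = background + peak

corrected = subtract_polynomial_baseline(shift, raw, degree=2)
normalised = standard_normal_variate(corrected)
print("Peak found at", shift[np.argmax(corrected)], "cm^-1")
```

A Savitzky-Golay filter is available as `scipy.signal.savgol_filter` if SciPy is installed; it is left out here to keep the sketch dependent on NumPy only.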
The choice of wavelength is also crucial when analysing specific samples. For biological samples, near-infrared (near-IR) radiation is used, as it is less likely than ultraviolet (UV) radiation to damage biological molecules.
Biological samples, such as proteins, nucleic acids, and lipids, are composed of delicate molecular structures that can be sensitive to radiation. UV radiation has higher energy compared to IR radiation and can cause ionization and excitation of electrons, which may lead to structural changes or even damage to biological molecules. UV radiation is also known to cause DNA damage, including mutations and breaks in the DNA strands, which can have adverse effects on the integrity and functionality of biological samples.
On the other hand, the near-IR radiation used in Raman spectroscopy has lower energy per photon than UV radiation. The Raman shifts of interest fall roughly in the range of 700 to 4000 cm^-1, which corresponds to the vibrational frequencies of molecular bonds in biological samples. Near-IR photons do not have enough energy to ionize molecules and are far less likely to excite their electrons, so they are less likely to cause structural changes or damage to biological molecules.
Additionally, water is a major component of biological tissues and has relatively weak absorption in the near-IR range. This means that near-IR radiation can pass through water without significant attenuation, allowing deeper penetration into biological samples and giving spectra with less interference from water absorption bands.
For very delicate samples such as proteins, cell organelles and nucleic acids, lasers can damage the sample you are investigating. Extended exposure or high-energy lasers can harm samples, with UV lasers posing a greater risk due to their higher energy per photon. Near-infrared lasers (e.g. 1064 nm) also carry a risk of sample damage because they require higher power and longer exposure times. Reducing the laser power may degrade the signal-to-noise ratio. The visible region is the safest for sample protection. Figure 5 shows a burn mark on a thin-film sample exposed to a near-infrared laser (785 nm).
Figure 5: Thin layer metal-organic framework before laser exposure (left) and after exposure (right) with 785 nm laser (Edinburgh Instruments).
Noting these parameters of laser power, exposure time and wavelength is very important when comparing the effectiveness of different Raman systems. This remains the biggest challenge when creating a unified databank of spectra. How do you know the system you use to create the databank is the most effective? (Most effective meaning the highest signal-to-noise ratio in the spectra.) What if a better system comes along and outperforms yours? Is your databank then worthless? A spectrum of ethanol isn’t just "ethanol": it depends on a lot of important information that usually goes unspecified but is needed to create a transferable data set. "Ethanol" is really ethanol analysed at 30 mW laser power with 785 nm laser light and a 20-second integration time. Every chemist, physicist and biologist varies these parameters based on the Raman system they use and the samples they are measuring.
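One way to picture a transferable entry is a record that always carries its acquisition settings alongside the spectrum. The sketch below is purely hypothetical: the class, its field names and the placeholder numbers are my own assumptions for illustration, not an existing databank format or measured data.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SpectrumRecord:
    """A spectrum that cannot be stored without its acquisition settings."""
    sample: str
    laser_wavelength_nm: float
    laser_power_mw: float
    integration_time_s: float
    n_averages: int
    shift_cm1: tuple
    intensity: tuple

    def acquisition_key(self):
        """Settings that must match before two spectra are compared directly."""
        return (self.laser_wavelength_nm, self.laser_power_mw, self.integration_time_s)


def directly_comparable(a: SpectrumRecord, b: SpectrumRecord) -> bool:
    """True when both spectra were taken with the same laser settings."""
    return a.acquisition_key() == b.acquisition_key()


# The ethanol settings quoted in the text; the spectrum values are placeholders.
ethanol = SpectrumRecord(
    sample="ethanol",
    laser_wavelength_nm=785.0,
    laser_power_mw=30.0,
    integration_time_s=20.0,
    n_averages=1,
    shift_cm1=(880.0, 1050.0, 1450.0),  # placeholder axis, not measured data
    intensity=(1.0, 0.3, 0.4),          # placeholder counts, not measured data
)

# The record serialises to plain JSON, so it can be shared between labs.
print(json.dumps(asdict(ethanol), indent=2))
```

The design choice here is simply that the settings travel with the data, so a databank can refuse, or at least flag, comparisons between spectra taken under different conditions.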
PCA Cluster Bleeding
Spectra are incredibly complicated. A single Raman spectrum spanning 3000 cm^-1 with over 10,000 bins of intensity values has extremely high dimensionality. Any slight variation in the spectrum is hard to identify with so many bins to examine. PCA (Principal Component Analysis) is a very useful tool for reducing this complexity. Projecting each complex spectrum onto a single point in two or three dimensions reduces the dimensionality significantly, making it far easier to compare different spectra.
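The reduction itself can be sketched in a few lines with NumPy, using the singular value decomposition of the mean-centred spectra. The spectra below are synthetic (two made-up "chemicals", each with one peak plus noise), and the function name is my own; it is a sketch of the general technique rather than the exact pipeline behind my figures.

```python
import numpy as np


def pca_scores(spectra, n_components=2):
    """Project spectra (one per row) onto their first principal components."""
    X = np.asarray(spectra, dtype=float)
    mean_spectrum = X.mean(axis=0)
    centred = X - mean_spectrum
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, singular_values, Vt = np.linalg.svd(centred, full_matrices=False)
    components = Vt[:n_components]
    scores = centred @ components.T
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, components, explained[:n_components]


# Synthetic data: 10 spectra with a peak at 880 cm^-1, 10 with a peak at 1450 cm^-1.
rng = np.random.default_rng(0)
axis = np.linspace(0.0, 3000.0, 1000)


def synthetic_peak(centre_cm1):
    return np.exp(-((axis - centre_cm1) / 20.0) ** 2)


group_a = synthetic_peak(880.0) + 0.01 * rng.normal(size=(10, axis.size))
group_b = synthetic_peak(1450.0) + 0.01 * rng.normal(size=(10, axis.size))
spectra = np.vstack([group_a, group_b])

scores, components, explained = pca_scores(spectra, n_components=2)
print("Each 1000-bin spectrum is now a point with", scores.shape[1], "coordinates")
```

With data this clean, the first component alone separates the two groups; real spectra with fluorescence and overlapping peaks are where the clusters start to spread.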
Figure 6: The process of Raman spectra into PCA (taken from ‘A review of artificial intelligence methods combined with Raman spectroscopy to identify the composition of substances’, Wiley, 2021)
Figure 7: The unknown sample (blue) is compared with each of the known chemical clusters (red & green). The distance from the sample to each cluster in PC space is determined, and the sample is assigned the chemical of the closest cluster.
Figure 8: PCA cluster plot of whisky variety. Each cluster group has distinct PCA values and an unknown sample may be assigned to one of these groups and the sample may be identified based on the region in which its PCs are located.
Depending on the stability of the samples under investigation, the variation in one sample’s PC cluster may bleed into another cluster. When the PCs of an unknown sample land in such a bleeding area, it can be very difficult for any algorithm to determine which chemical's cluster average it most closely matches. To keep the comparison against our planned extensive databank manageable, we need to create a system that eliminates clusters from the comparison. I would like to welcome AI experts to solve this problem without requiring any extra labelling or tagging from the user, as Raman is a label-free technology.
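A small sketch shows where the difficulty comes from. It assigns a sample to the nearest cluster centroid in PC space and flags the result when the second-nearest centroid is almost as close. The cluster labels, the example points and the 20% margin are all hypothetical choices of mine, and this only detects bleeding; it does not solve it.

```python
import math


def centroid(points):
    """Mean position of a cluster of PC-space points."""
    count = len(points)
    return tuple(sum(p[i] for p in points) / count for i in range(len(points[0])))


def assign_cluster(sample, clusters, margin=0.2):
    """Return (closest_label, is_ambiguous) for a point in PC space.

    clusters maps a label to a list of points and needs at least two entries.
    The result is ambiguous when the runner-up centroid is within `margin`
    (as a fraction of its own distance) of the closest one.
    """
    ranked = sorted(
        (math.dist(sample, centroid(points)), label)
        for label, points in clusters.items()
    )
    (nearest, best_label), (runner_up, _) = ranked[0], ranked[1]
    is_ambiguous = (runner_up - nearest) <= margin * runner_up
    return best_label, is_ambiguous


# Hypothetical PC coordinates for two known clusters.
known = {
    "whisky A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "whisky B": [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)],
}

print(assign_cluster((1.0, 1.0), known))  # well inside one cluster
print(assign_cluster((5.4, 5.4), known))  # in the region between the two
```

With thousands of clusters in a databank, many samples would fall into the flagged category, which is why some way of eliminating candidate clusters before the comparison is needed.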
The System at a Glance
Figure 9: The basic design of my proposed device: The Multi-Sample Multi-Wavelength Raman spectrometer.
I’ve been thinking a lot about the basic design of such a system. In order to make the device as durable as possible, I thought of implementing a multi-coloured laser feature. I’ve worked with multi-coloured lasers before, in which a rotating non-linear crystal, when pumped by a high-energy UV laser, can tune the laser’s wavelength continuously from the UV through the visible to the near-IR. This will provide the user with more control than ever before when analysing a range of samples. The diffraction grating splits the high-powered laser into multiple beams, which a large lens focuses onto an array of glass vials containing the samples. Each sample undergoes Raman scattering. Filters and lenses collect the Raman signal from each sample, and each signal is guided separately to a spectrometer and on to further data analysis (PCA, spectrum viewing, etc.). The problem of integration time is also paramount. The spectrometer collects only a few thousand photons per second. The Figure ^^ spectra were acquired with an integration time of 2 s and 6 averages. If the spectrometer can be miniaturised to fit between each sample in the array, or multiplied economically, the problem of distributing the Raman signal is solved.
If built correctly, this system could robustly collect spectra for a massive set of chemicals far faster than analysing one sample at a time. No sample is damaged or irradiated multiple times, as the beam is distributed across many samples, while multiple spectra are produced very quickly.
Raman Monopoly
If this technology is successful, it could revolutionise the field of chemical forensics. However, having a private company or central government own the entire databank of a mega-set of spectra forces scientists to rely on this single source, creating an unscalable monopoly over laser and chemical analysis technologies. I am also slightly uncomfortable with one entity hosting an extensive databank of chemicals that scientists have no choice but to rely on. Having this databank be open-source, widely available, free to use and open to updates is extremely important for the scientific community.
Conclusion
I’m excited about the future of Raman spectroscopy, as are the scientists who use it in their own research. However, I’m yet to see Raman used in wider industries and daily life. The Raman device I designed could have been taken further to develop a system for identifying date-rape drugs in alcoholic drinks. Funding, after data analysis, is the next challenge every scientist faces.
Figure 10: Spectra of various urine samples prepared by an anonymous\* student at various hydration levels. The hydration level was measured qualitatively based on the student’s well-being.
\*I say anonymous because I couldn’t get any of my classmates to pee in a cup for me for comparison. So this is the spectrum of my own piss.