Forensic Voice Identification

Understanding Forensic Voice Identification & Speaker Comparison

Documents disclosed by Edward Snowden revealed that the U.S. National Security Agency has analyzed and extracted the content of millions of phone conversations. Call centers at banks are using voice biometrics to authenticate users and to identify potential fraud. But is the science behind voice identification sound?

Several articles in the scientific literature have warned about the quality of one of its main applications, forensic voice analysis presented in court. We have compiled two dozen judicial cases from around the world in which forensic phonetics was controversial. The French Acoustical Society has issued a public request to end the use of forensic voice science in the courtroom. There are plenty of troubling examples of dubious forensics and downright judicial errors, which have been documented by Hearing Voices, a science journalism project on forensic science carried out by the authors of this article.

The process usually involves at least one of several tasks. The recorded fragments subject to analysis can be phone conversations, voice mail, ransom demands, hoax calls and calls to emergency or police numbers.

One of the main hurdles voice analysts have to face is the poor quality of recorded fragments. To make things worse, recorded messages are often noisy, short and can be years or even decades old.


In some cases, simulating the context of a phone call can be particularly challenging. Imagine recreating a call placed in a crowded movie theater, using an old cell phone or one made by an obscure foreign brand.

Sounds Interpreted Differently

Irrespective of the analysis method, forensic phonetics suffers from an even deeper scientific problem. In simple terms, analysts measure and compare not only the spectral maxima of the resonance frequencies, but also the shape of those maxima and the distribution of energy across frequencies.

Nevertheless, many forensic experts are willing to work on sound excerpts that are of extremely low quality. Unfortunately, these errors are not isolated exceptions. A survey published in June in the journal Forensic Science International by INTERPOL, the international police organization, showed that about half of the respondents (21 out of 44), belonging to police forces from all over the world, employ techniques that have long been known to have shaky scientific grounds. One example is the simplest and oldest voice recognition method: the unaided human ear. Guy Paul Morin, a Canadian, was sentenced to life imprisonment for the rape and murder of a nine-year-old girl.

Three years later, a DNA test exonerated Morin. This kind of mistake is not surprising.

The rate of recognition was far from perfect, with one volunteer failing to recognize even his own voice. This does not imply, however, that automated methods are always more accurate than the human ear. In fact, the first instrumental technique used in forensic phonetics has been regarded for years as lacking any scientific basis, even though some of its variations are still in use, according to the INTERPOL report.

We are referring to voiceprinting, or spectrogram matching, in which a human observer compares the spectrograms of a word pronounced by the suspect with the same word pronounced by an intercepted speaker.


A spectrogram is a graphic representation of the frequencies of the voice spectrum, as they change in time while a word or sound is produced. Voiceprinting gained notoriety with the publication of a paper by Lawrence G. Kersta, a scientist at Bell Labs, in the journal Nature. But a later report by the National Science Foundation declared that voiceprints had no scientific basis. Nevertheless, the technique still maintains a lot of credibility, and at least one conviction has been partly based on voiceprint analysis.

The scientific community has explicitly discredited some voice analysis techniques, but is still far from reaching a consensus on the most effective method for identifying voices. There are two schools of thought, says Juana Gil Fernandez. Semi-automatic techniques are still the most widely used. Experts who rely on acoustic-phonetic methods usually start by listening to the recording and transcribing it phonetically. Then they identify a number of features of the voice signal.

The high-level features are linguistic. Other high-level qualities are the so-called suprasegmental features. Lower-level characteristics, or segmental features, mostly reflect voice physiology, and are better measured with specific software. One basic feature is the fundamental frequency. If the voice signal is divided into segments a few milliseconds long, each segment will contain a vibration with an almost perfectly periodic waveform.

The frequency of this vibration is the fundamental frequency, which corresponds to the vibration frequency of the vocal folds, and contributes to what we perceive as the pitch of a specific voice. The average fundamental frequency of an adult male is lower than that of an adult female.
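To make the idea of a fundamental frequency concrete, here is a minimal sketch of one common way to estimate it from a short, nearly periodic segment: pick the time lag at which the segment best matches a shifted copy of itself (autocorrelation). This is an illustration only, not the method used by any particular forensic tool; the function name `estimate_f0`, the search bounds of 60 to 400 hertz and the synthetic 125-hertz test tone are all assumptions made for the example.

```python
import math


def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one short frame.

    Searches lags corresponding to the assumed range [f_min, f_max]
    and returns sample_rate / lag for the lag with the highest
    autocorrelation. Returns None if no positive correlation is found.
    """
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


# Synthetic 40 ms frame containing a pure 125 Hz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 125 * n / SAMPLE_RATE) for n in range(320)]
f0 = estimate_f0(tone, SAMPLE_RATE)
```

Real speech is far less clean than this test tone. Noise, short fragments and telephone filtering all degrade such estimates, which is one reason the quality of the recording matters so much.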


It can be hard to use this feature to pin down a speaker. On the one hand, it varies very little between different speakers talking in the same context.


On the other hand, the fundamental frequency of the same speaker changes dramatically when he or she is angry, or shouting to be heard over a bad phone line. Other segmental features commonly measured are vowel formants.

Automated Systems Can Produce False Positives

When we produce a vowel, the vocal tract (throat and oral cavity) behaves like a system of moving pipes with specific resonances. In spite of its popularity, the acoustic-phonetic method raises some issues. Because it is semi-automatic, it leaves room for subjective judgement, and sometimes experts working on the same material using a similar technique can reach discordant conclusions. In addition, there are very few data on the range and distribution in the general population of phonetic features other than the fundamental frequency.
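The pipe analogy for vowel formants can be illustrated with the textbook idealization of a uniform tube closed at one end (the vocal folds) and open at the other (the lips), whose resonances fall at odd multiples of a quarter wavelength. The sketch below is not taken from the article: the function name `tube_resonances`, the 17-centimeter tube length and the speed of sound of 343 meters per second are assumptions chosen for illustration, and a real vocal tract changes shape with every vowel.

```python
def tube_resonances(length_m, count=3, speed_of_sound=343.0):
    """Resonance frequencies (Hz) of an idealized uniform tube.

    Assumes a tube closed at one end and open at the other, so the
    n-th resonance is (2n - 1) * c / (4 * L).
    """
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, count + 1)
    ]


# Assumed illustrative tube length of 0.17 m.
resonances = tube_resonances(0.17)
```

In this idealized model the resonances land near 500, 1,500 and 2,500 hertz. Measured formants shift away from these values as the tongue, jaw and lips reshape the tract, which is what makes them vowel-specific and, to some extent, speaker-specific.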

For these reasons, the most rigorous experts say that we can never be sure of the identity of a speaker based on voice alone. At most, we can say that two voices are compatible. It is recommended that the exemplar of the known voice be collected in a manner as close as possible to the way the unknown voice was recorded. For example, if the unknown voice was recorded over the phone, the exemplar of the known voice should also be collected over the phone.

When the exemplar is collected, the examiner asks the suspect to say the same words in the same way as the unknown person spoke them, in other words, in a normal, natural voice.

Analysis is conducted through critical listening and visual comparison of the words on a graphical waveform display. Each recording is transferred onto the computer using a digital sound card to ensure the best quality capture. A graphical display of the recorded material, called the waveform, can then be viewed as the recording is played and reviewed.

The configuration of the individual words can be seen as they are played. A computerized spectrographic analysis is conducted. This facilitates visual comparison of the features of each word spoken.
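As a rough sketch of what a computerized spectrographic analysis computes, the example below cuts a signal into short overlapping frames and measures how much energy each frame contains at each frequency. The result is a grid with time in one direction and frequency in the other. This is a generic short-time Fourier analysis written for illustration, not the software used by any examiner; the function name `spectrogram`, the frame size of 64 samples, the hop of 32 samples and the 1,000-hertz test tone are assumptions.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one row per time frame; each row holds the magnitudes
    of frequency bins 0 .. frame_size // 2 for that frame."""
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        # Taper the frame with a Hann window to limit spectral leakage.
        windowed = [
            samples[start + n]
            * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
            for n in range(frame_size)
        ]
        row = []
        for k in range(frame_size // 2 + 1):
            # Direct (slow but simple) discrete Fourier transform of bin k.
            acc = sum(
                windowed[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            row.append(abs(acc))
        rows.append(row)
    return rows


# A 1,000 Hz tone sampled at 8 kHz should peak in bin 1000 * 64 / 8000 = 8.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(256)]
spec = spectrogram(tone)
peak_bins = [row.index(max(row)) for row in spec]
```

Plotting each row as a column of shaded cells, darker for larger magnitudes, gives the familiar spectrogram image. Production tools use a fast Fourier transform instead of the direct sum shown here.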

Voice Analysis Should Be Used with Caution in Court

The spectrogram displays the speech in three dimensions: time along the horizontal axis, frequency along the vertical axis, and amplitude as varying degrees of gray or colored shading.

In recent years, major advances in biometric speaker identification analysis have occurred. Biometrics identify people by their unique characteristics. As little as 16 seconds of pure speech from each of the known and unknown voices is needed.

The longer the samples are, the higher the reported likelihood of identification or elimination. Multiple voices can be compared in a single analysis. The system uses the following three methods to compare the voices: The pitch of the voice can be measured scientifically. The Pitch Statistics Method (PSM) uses 16 different pitch parameters, including average pitch value, maximum, minimum, median, percent of areas with rising pitch, pitch logarithm variation, pitch logarithm asymmetry, pitch logarithm excess and 8 additional parameters.
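Several of the pitch parameters named above can be computed from a sequence of pitch measurements with a few lines of code. The sketch below is a hedged illustration, not the actual Pitch Statistics Method: the source does not define its parameters, so it is an assumption here that "variation", "asymmetry" and "excess" of the pitch logarithm mean the variance, skewness and excess kurtosis of the natural log of pitch. The function name `pitch_statistics` and the sample pitch track are hypothetical, and the 8 unnamed parameters are left out.

```python
import math
import statistics


def pitch_statistics(pitch_hz):
    """Summarize a pitch track (Hz values, one per analysis frame)."""
    logs = [math.log(p) for p in pitch_hz]
    n = len(logs)
    mean_log = statistics.fmean(logs)
    var_log = statistics.pvariance(logs, mu=mean_log)
    std_log = math.sqrt(var_log)
    if std_log > 0:
        # Assumed reading: asymmetry = skewness, excess = excess kurtosis.
        skew = sum((x - mean_log) ** 3 for x in logs) / (n * std_log ** 3)
        excess = sum((x - mean_log) ** 4 for x in logs) / (n * std_log ** 4) - 3.0
    else:
        skew, excess = 0.0, 0.0
    # Share of frame-to-frame steps in which the pitch goes up.
    steps = list(zip(pitch_hz, pitch_hz[1:]))
    rising = sum(1 for a, b in steps if b > a)
    return {
        "mean": statistics.fmean(pitch_hz),
        "max": max(pitch_hz),
        "min": min(pitch_hz),
        "median": statistics.median(pitch_hz),
        "percent_rising": 100.0 * rising / len(steps) if steps else 0.0,
        "log_variance": var_log,
        "log_skewness": skew,
        "log_excess_kurtosis": excess,
    }


# Hypothetical pitch track that rises and then falls.
stats = pitch_statistics([100.0, 110.0, 120.0, 110.0, 100.0])
```

Summary statistics like these compress a whole recording into a handful of numbers, which makes comparison easy but also discards context such as the speaker's emotional state or the quality of the phone line.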