The science of speech • City, University of London

Allen Hirson, Senior Lecturer in Phonetics at City, talks about speech and phonetics.

Published Friday, 17th August 2018 (Updated Wednesday, 4th November 2020)

“It’s a funny thing, speech; everyone does it, but few know about its details,” says Allen Hirson, as we sit down to discuss speech and phonetics at City, University of London.

Having worked on over 1,800 forensic cases involving speech recordings over nearly 30 years from the Crown Courts to the International Criminal Court, Mr Hirson is one of the foremost experts when it comes to forensic speech analysis and its inherent intricacies.

As a Senior Lecturer in Phonetics at City, besides educating the next generation of speech and language scientists, Allen is also an active researcher who specialises in the forensic analysis of audio recordings, as well as speaker identification, language identification, and identifying background sounds. He has also worked on some 27 languages other than English, from Arabic to Urdu, in collaboration with world-renowned language specialists on a huge range of different cases.

Individual idiosyncrasies

Our speech is incredibly complex, with many different components – from the position of our tongue, to the vibrations of our vocal folds – that influence the way that we communicate with the world.

“There are around 50 elements that I look for when analysing speech”, says Allen.

For speech identification, and production of a phonetic and acoustic profile, these individual elements are vital and can reveal a great deal about the person. Once a suitable recording has been obtained, Allen can then go about constructing such a profile by comparing a questioned recording and a reference recording of a known individual.

“Pitch is determined by the rate at which the vocal folds vibrate and can range for an adult male between 90 and 210 Hertz with a population average of around 100Hz”, he says. “Features of the voice are plastic, and pitch, for example, can be affected by psychological, physiological and situational factors; under stressful conditions it normally rises.”

How pitch is used – namely intonation – is also an important component, as some people may use a distinctive pattern of intonation. Younger speakers of English, for example, often use a rising intonation at the end of sentences, even when no question is intended.

“Voice quality plays a significant part in speaker determination,” says Allen. “I also look for pauses, and whether they are filled with elements such as ‘er’, ‘um’, ‘like’, ‘to be honest’. These and other patterns may be idiosyncratic, making them potentially speaker-specific and consequently of forensic significance.”

When it comes to speech itself, other aspects such as rhythm, accent and pronunciation are important.

“With rhythm, there are stressed and unstressed syllables, and this can be quite indicative. For example, a native speaker of English has a pattern where there are alternately stressed and unstressed syllables. This may be quite difficult for a native speaker of Italian, for example, where syllables are equally weighted. This can also result in a distinctively non-native rhythmical structure.”

Regional accents

Accents can also vary widely throughout the UK and in other parts of the world where English is spoken.

“As a preliminary stage in any investigation, accent is characterised as far as possible since this determines a baseline for the analysis of individual speech sounds. In the British Isles there is a large range of accents, and of course you can get mixed accents too,” says Allen. “Foreign accents of English are heavily influenced by the speaker’s mother tongue, but as a result of immigration over many years, second and third generation immigrants may retain only very subtle residues of parents’ or grandparents’ native language, now overlaid with a regional accent.”

Details of the pronunciation of vowels and consonants can also be revealing; there are around 18 vowel sounds in English, and distinctive or unusual vowels, along with the 24 consonant sounds, can be a useful clue when building a speaker profile.

“How a speaker uses the language, the characteristics of their voice, and what features might make them stand out from the crowd is the basis for creating a profile of the speaker on the questioned recording,” says Allen. “Once this is established, one seeks matches or mismatches for the most distinctive features on the reference recording.”

The quality of recordings is also of utmost importance, and Allen and colleagues will always try to obtain a copy from as close as possible to the source. The provenance of a recording is also important, as in some cases there may be a suspicion that the recording has been tampered with or manipulated; sudden unexplained changes in background noise can be an indication of tampering.
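
As a rough illustration of what a "sudden change in background noise" might look like in data, the sketch below measures the loudness of consecutive short frames and flags any abrupt jump between neighbours. It is only a toy under stated assumptions, not a description of how recordings are actually authenticated: the function names, the 400-sample frame length, the threshold ratio of 3 and the synthetic "spliced" noise are all invented for the example.

```python
import math
import random


def frame_rms(samples, frame_len):
    """Root-mean-square level of each complete, non-overlapping frame."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_level_jumps(samples, frame_len=400, ratio_threshold=3.0):
    """Return indices of frames whose level differs sharply from the previous frame.

    Hypothetical helper: both the frame length and the threshold are arbitrary
    illustrative values, not forensic standards.
    """
    levels = frame_rms(samples, frame_len)
    eps = 1e-12  # avoids division by zero on digitally silent frames
    jumps = []
    for i in range(1, len(levels)):
        ratio = (levels[i] + eps) / (levels[i - 1] + eps)
        if ratio > ratio_threshold or ratio < 1.0 / ratio_threshold:
            jumps.append(i)
    return jumps


# Assumed test data: quiet background noise spliced onto much louder noise.
rng = random.Random(0)
quiet = [rng.gauss(0.0, 0.01) for _ in range(4000)]
loud = [rng.gauss(0.0, 0.1) for _ in range(4000)]
spliced = quiet + loud
print(find_level_jumps(spliced))
```

A flagged frame is only a prompt for closer listening; genuine recordings also contain abrupt level changes, for instance when a door closes.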

“MP3 is also a problem, as this format involves a form of compression in which data is invariably lost,” says Allen. “Due to the patent on MP3, it’s not possible to know what precisely has been lost.”

Beginnings

Allen originally trained as a biologist, and it was Noam Chomsky’s seminal work on linguistics that first interested him in the area. Following a module in generative grammar – a set of rules that indicates the structure and interpretation of sentences which native speakers of a language accept as belonging to the language – Allen moved into teaching before deciding to study Phonetics at UCL.

Moving to City after UCL, Allen went on to establish the Speech Acoustics Laboratory at City, and as a Senior Lecturer in Phonetics he also lectures on the Speech and Language Science BSc, covering aspects such as English phonetics, speech acoustics, instrumental techniques and forensic phonetics as well as teaching phonetics at postgraduate levels.

Allen’s current research at City is also looking at questions that arise from forensic casework such as the forensic analysis of background birdsong, and spoken codes – cryptolects – such as Pig Latin.

“I’ve been working on a form of speech encryption. A murder case I worked on some years ago that was tried at the Old Bailey involved a gang who used English to communicate with one another, but the English was peppered with Pig Latin. Instead of talking about a piece of paper, they would refer to a piece of aperpay. Because they switched in and out of English, the police were unable to decode it.”
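
The "paper" to "aperpay" example follows the usual Pig Latin rule of moving a word's opening consonants to the end and adding "ay". A minimal sketch of that rule is given below. The treatment of vowel-initial words (adding "way") is an assumption, since conventions vary and the article gives no example, and the function names `to_pig_latin` and `pepper` are hypothetical.

```python
VOWELS = "aeiou"


def to_pig_latin(word):
    """Move the leading consonant cluster to the end of the word and add 'ay'."""
    word = word.lower()
    for i, ch in enumerate(word):
        if ch in VOWELS:
            if i == 0:
                # Assumed convention for vowel-initial words; variants exist.
                return word + "way"
            return word[i:] + word[:i] + "ay"
    return word + "ay"  # no vowel found, so nothing to move


def pepper(sentence, targets):
    """Encode only the chosen words, leaving the rest in plain English."""
    return " ".join(
        to_pig_latin(w) if w.lower() in targets else w
        for w in sentence.split()
    )


print(pepper("a piece of paper", {"paper"}))
```

Encoding only selected words, as `pepper` does, mirrors the switching in and out of English described in the quote.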

Other aspects such as background sounds, including birdsong, can be incredibly important too.

In particular, Professor Peter Marler, a pioneering animal behaviourist who worked at the University of Cambridge and later the University of California at Berkeley, famously said that bird dialects were so distinctive that “if you really know your white-crowned sparrows, you’ll know where you are in California”.

As a result, it’s not just the intricacies of speech itself that can reveal a huge amount of information; background noise may assist in narrowing down the location or even the time of a recording. Identification of bird species in the background, for example, can be of great value in terrorism or kidnapping cases, helping to reveal the whereabouts of people and places.
