Why are sight and sound out of sync?
Researchers found that comprehension can sometimes actually improve by as much as 10 per cent when sound is delayed relative to vision
Our processing of sight and sound is curiously out of sync, by amounts that differ between people and tasks, according to a new study from City, University of London.
When investigating the effect, the researchers found that speech comprehension can sometimes actually improve by as much as 10 per cent when sound is delayed relative to vision, and that each individual consistently has their own optimal delay, which differs from task to task.
As a result, the authors suggest that tailoring sound delays to the individual, via a hearing aid or cochlear implant or a setting on a computer media player, could have significant benefits for speech comprehension and enjoyment of multimedia. The study is published in the Journal of Experimental Psychology: Human Perception and Performance.
When the researchers at City looked deeper into this phenomenon, they kept finding a very curious pattern: different tasks benefitted from opposite delays, even in the same person. For example, the more an individual’s vision lags their audition in the performance of one task (e.g., identifying speech sounds), conversely the more their audition is likely to lag vision in other tasks (e.g., deciding whether lips followed or preceded the speaker’s voice). This finding provides new insight into how we determine when events actually occur in the world and the nature of perceptual timing in the brain.
When we see and hear a person speak, sensory signals travel via different pathways from our eyes and ears through the brain. The audiovisual asynchronies measured in this study may occur because these sensory signals arrive at their different destinations in the brain at different times.
Yet how then do we ever know when the physical speech events actually happened in the world? The brain must have a way to solve this problem, given that we can still judge whether or not the original events are in sync with reasonable accuracy. For example, we are often able to easily identify when films have poor lip-sync.
Lead author Dr Elliot Freeman, Senior Lecturer in the Department of Psychology at City, University of London, proposes a solution based on an analogous ‘multiple clocks’ problem:
“Imagine standing in an antique shop full of clocks, and you want to know what the time is. Your best guess comes from the average across clocks. However, if one clock is particularly slow, others will seem fast relative to it.
“In our new theory, which we call ‘temporal renormalisation’, the ‘clocks’ are analogous to different mechanisms in the brain which each receive sight and sound out of sync: but if one such mechanism is subject to an auditory delay, this will bias the average, relative to which other mechanisms may seem to have a visual delay. This theory explains the curious finding that different tasks show opposite delays; it may also explain how we know when events in the world are actually happening, despite our brains having many conflicting estimates of their timing.”
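The arithmetic behind the clock analogy can be illustrated with a minimal sketch. It is not the authors' model: the mechanism names and the millisecond values below are invented purely for illustration, and the sign convention (positive means audition lags vision) is an assumption.

```python
from statistics import mean

def renormalise(delays_ms):
    """Express each mechanism's delay relative to the average across all mechanisms."""
    avg = mean(delays_ms.values())
    return {name: delay - avg for name, delay in delays_ms.items()}

# Hypothetical auditory delays in ms (positive = audition lags vision).
# Only one mechanism has a real delay; the other two are physically in sync.
raw = {"speech_identification": 0.0, "temporal_order": 0.0, "mcgurk": 90.0}

relative = renormalise(raw)
for name, delay in relative.items():
    print(f"{name}: {delay:+.0f} ms relative to the average")
```

In this toy case the single delayed mechanism pulls the average to 30 ms, so the two mechanisms with no delay of their own come out at -30 ms, that is, they appear to have a visual delay. This is the opposite-sign pattern the quote describes.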
In their experiments, the researchers presented participants with audiovisual movies of a person speaking syllables, words or sentences, while varying the asynchrony of voice relative to lip movements. For each movie they measured participants’ accuracy at identifying the words spoken, or how strongly lip movements influenced what was heard.
In the latter case, the researchers exploited the McGurk illusion, where for example the phoneme ‘ba’ sounds like ‘da’ when mismatched with lip movements for ‘ga’. They could then estimate the asynchrony that resulted in the maximal accuracy or strongest McGurk illusion. In a separate task, they also asked participants to judge whether the voice came before or after the lip movements, from which they could estimate the subjective asynchrony.
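One simple way to estimate an optimal asynchrony from such measurements is sketched below. The article does not say which estimation procedure the researchers used, so this is an assumption: it fits a parabola through the best sampled delay and its two neighbours. The function name, the evenly spaced delays and the sign convention (positive means the voice is delayed) are all hypothetical.

```python
def optimal_asynchrony(delays_ms, accuracy):
    """Estimate the delay giving peak accuracy.

    Fits a parabola through the best sampled point and its two neighbours.
    Assumes delays_ms is sorted and evenly spaced.
    """
    best = max(range(len(accuracy)), key=accuracy.__getitem__)
    # At the edge of the tested range there is nothing to interpolate with.
    if best == 0 or best == len(accuracy) - 1:
        return float(delays_ms[best])
    y0, y1, y2 = accuracy[best - 1], accuracy[best], accuracy[best + 1]
    curvature = y0 - 2 * y1 + y2
    if curvature == 0:
        return float(delays_ms[best])
    step = delays_ms[best + 1] - delays_ms[best]
    # Vertex of the parabola, in units of one step from the best point.
    offset = 0.5 * (y0 - y2) / curvature
    return delays_ms[best] + offset * step

# Made-up accuracies for one imaginary participant, not data from the study.
delays = [-200, -100, 0, 100, 200]
scores = [0.62, 0.70, 0.78, 0.80, 0.71]
print(f"Estimated optimal voice delay: {optimal_asynchrony(delays, scores):.0f} ms")
```

Interpolating between tested delays lets the estimate fall between the asynchronies actually presented, which matters when the tested steps are coarse.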
Speaking about the study, Dr Freeman said:
“We often assume that the best way to comprehend speech is to match up what we hear with lip movements, and that this works best when sight and sound are simultaneous. However, our new study confirms that sight and sound really are out of sync by different amounts in different people. We also found that for some individuals, manually delaying voices relative to lip-movements could improve speech comprehension and the accuracy of word identification by 10% or more.
“This paper also introduces a new automated method for assessing individual audiovisual asynchronies, which could be administered over the internet or via an ‘app’. Once an individual’s perceptual asynchrony is measured, it may be corrected artificially with a tailored delay. This could be implemented via a hearing aid or cochlear implant, or a setting on a computer media player, with potential benefits for speech comprehension and enjoyment of multimedia.
“Asynchronous perception may impact on cognitive performance, and future studies could examine its associations with schizotypal personality traits, autism spectrum traits, and dyslexia.”