Synthetic speech

Computer-generated voices are encountered more and more frequently in everyday life, not only in automated call centres, but also in satellite navigation systems and home appliances.

Although synthetic speech is getting better, it’s still not as easy to understand as human speech, and many people don’t like it at all. Maria Klara Wolters of Edinburgh University decided to find out why. In particular, she wanted to discover what makes synthetic speech difficult for older people to understand, so that the next generation of talking computers will be able to speak more clearly.

She asked a range of people to try out a state-of-the-art speech synthesis system, tested their hearing and asked them what they thought of the voices. She found that older people have more difficulty understanding computer-generated voices, even when their hearing has been assessed as healthy. She also discovered that messages about times and people were well understood, but young and old alike struggled with complicated words, such as the names of medications, when pronounced by a computer.

More surprisingly, she found that the ability of her volunteers to remember speech correctly depended less on their memory than on their ability to hear particular frequencies (between 1 and 3 kHz). These frequencies are in the lower part of the middle range of frequencies that the ear can hear, and they contain a large amount of information about the identity of speech sounds. Another result of the experiments was that the processing of sounds by the brain, so-called ‘central auditory processing’, appeared to play a more important role in understanding natural speech, while peripheral auditory processing (the processing of sounds in the ear) appeared to be more important for synthetic speech.

As a result of the experiments, Maria drew up a list of design guidelines for the next generation of talking computers: put pauses around important words, slow down, and switch to simpler expressions (e.g. "the blue pill" is much easier to understand and remember than a complicated medical name). Such simple changes to robot voices could make an immense difference to the lives of many older people. They will also make services that use computer-generated voices easier for everyone to use. This kind of inclusive design benefits everybody, as it allows people from all walks of life to use the same technology. Maybe Maria’s rules would work for people you know too. Try them out next time grandpa asks you to repeat what you just said!
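
If you are curious how a program might follow the three guidelines, here is a minimal sketch in Python. It is not Maria's system, just one possible illustration. It assumes the talking computer accepts SSML, a standard markup language for speech synthesisers, in which `<break>` adds a pause and `<prosody rate="slow">` slows the voice down. The function name `to_clear_ssml`, the table `SIMPLER_FORMS`, the medicine name "zyloprexamine" and the 300 ms pause are all made up for this example.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical table: hard-to-follow terms and a simpler way to say them.
# "zyloprexamine" is an invented name used only for illustration.
SIMPLER_FORMS = {"zyloprexamine": "the blue pill"}


def to_clear_ssml(message, important_words, pause_ms=300, rate="slow"):
    """Turn a plain-text message into SSML that follows the three guidelines."""
    # Guideline: use simpler expressions instead of complicated names.
    for hard, simple in SIMPLER_FORMS.items():
        message = re.sub(re.escape(hard), simple, message, flags=re.IGNORECASE)

    # Guideline: put a pause before and after each important word or phrase.
    pause = f'<break time="{pause_ms}ms"/>'
    if important_words:
        pattern = r"\b(" + "|".join(re.escape(w) for w in important_words) + r")\b"
        # With one capturing group, re.split returns ordinary text at even
        # positions and the matched important words at odd positions.
        parts = re.split(pattern, message, flags=re.IGNORECASE)
    else:
        parts = [message]
    body = "".join(
        f"{pause}{escape(part)}{pause}" if i % 2 else escape(part)
        for i, part in enumerate(parts)
    )

    # Guideline: slow the whole message down.
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


print(to_clear_ssml("Take zyloprexamine at 8 pm", ["the blue pill", "8 pm"]))
```

Whether a 300 ms pause or the "slow" setting is actually the best choice would need testing with real listeners, which is just the kind of experiment described above.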