Star Trek universal translation device still far in the future
Nuance Communications's Turner tells Computing about the pros and cons of speech recognition for enterprises
Turner: "The challenge with Star Trek universal translators is achieving an accurate translation from one language to another."
Speech recognition has come a long way since the early days, when users barked single-word answers back at automated systems and often needed several attempts before they were recognised.
So Computing interviewed Ian Turner, general manager for speech and imaging business solutions provider Nuance, to see how the speech recognition market has developed.
Today's problems
Turner began by describing the major problems encountered by today's recognition systems. "The main problem is not technology, but poorly designed user interfaces. These often confuse people, don't handle errors well and keep asking users to repeat themselves."
Other major problems, according to Turner, are excessive background noise and unexpected user behaviour, such as thinking aloud or holding a second conversation.
Where is it being used?
As the recognition engine's ability to process speech commands has increased, so have the business areas infiltrated by the technology.
Speech recognition is being used in a number of disparate areas, such as call centres, the healthcare industry, and the automotive industry, where it is being used in devices like satellite navigation systems.
Perhaps the biggest users of speech recognition technology are call centres, although the level of use differs between countries.
For example, 25-30 per cent of US call centres deploy the technology, compared with only five per cent in the UK.
Turner said: "I wonder when call-centre managers in the UK will finally realise that they can't keep adding more human call-centre agents? The US twigged about six years ago."
He also said that the UK government is particularly reluctant to automate its centres.
"The government runs some of the UK's biggest call centres and it is obviously worried about the unemployment figures because it seems to be tackling this by hiring more [human] agents."
Conversely, the private sector is waking up to the technology, but there are still challenges there.
"The people that run call centres are used to dealing with people – and 90 per cent of their time now is spent not doing this [as opposed to using technology]," said Turner.
"I used to run a 250-person call centre at Oracle a long time ago, and it was almost like being an agony aunt – I was constantly dealing with hiring, HR and people's performance-related problems."
The automotive industry is another area that has seen increased penetration of speech recognition. Turner said: "The good thing about cars is that you're in a controlled environment, so the accuracy tends to be pretty good, but you have to be aware of the massive safety issues of using in-car devices."
Nuance has contracts with almost every car maker in the US to deliver speech and text-to-speech systems. "The two manufacturers we haven't got are Lotus and Porsche, although I'm not sure you'd be able to hear a damn thing in a Lotus!"
How has the technology developed?
Note to self systems: Nuance is probably best known for its Dragon NaturallySpeaking system. This allows users to dictate notes into a digital voice recorder, then automatically transcribe the resulting sound files straight into text: "This is done at high nineties fidelity."
Turner said that the 'note to self' systems market is huge, and that Nuance is currently running a trial with the UK police force using its Dragon NaturallySpeaking system.
"Imagine being a police officer - they're out on the beat for a couple of days, sorting out 10 incidents a day. Then they have to spend a day and a half back in the office writing those incidents up. An incident occurring on Monday, and being written up on Thursday afternoon might really test an officer's power of recall."
The healthcare industry also uses Nuance systems. Doctors and radiologists use them as 'note to self' systems, producing transcripts that secretaries can then process with minimal effort and turn into a final polished document. "This means radiologists, gynaecologists and all the other 'ologists' get highly accurate reports very quickly," said Turner.
Semantic processing: "One of the technologies we're trialling in hospitals currently, besides standard speech-to-text, is semantic processing to create medical reports."
"This means it doesn't matter how complicated the description in the sound file, how the patient presented or what their health status was, the technology can put out a decent report, with minimal intervention from the user," explained Turner.
This technology is being used in a lot of US hospitals as a Software-as-a-Service (SaaS) solution, but the UK NHS wasn't keen on having its data located in the US.
The future of speech recognition
So where does Turner see speech recognition in five years' time?
"People will be dictating directly into mobile devices, rather than a dedicated voice recording device," he said.
"I also think most speech recognition in call centres will move into the cloud. Why would you build an infrastructure on-site when you can buy a service that routes your calls through a cloud-based interactive voice recognition system?"
Asked whether systems like Star Trek's universal translation device, which allows humans and aliens to understand each other, would be appearing any time soon, Turner said that the ability to recognise natural speech for dictating emails and documents already exists today.
"The challenge for a universal translation device is really around achieving an accurate translation from one language to another, rather than recognising what was said in the original language," he concluded.