Software review: Dragon Professional Individual v15
Speech recognition has come a long way in the past 20 years, but is it now ready to replace the keyboard?
Back in the late 1990s/early 2000s, there was a lot of excitement about speech recognition. But while the technology largely worked, it wasn't quite right.
You. Had. To. Speak. In. A. Staccato. Delete that. Stacatto. DELETE THAT. STACK-ART-OH. Style. Of. Voice, which was an acquired art and quite annoying.
And, on top of that, the number of corrections you often needed to make - especially if what you wanted to transcribe wasn't straightforward - meant it wasn't necessarily the time-saver it should have been.
Oh, and the software needed a fair bit of training, too.
Now, though, that training period has been slashed - especially if you have a common accent - and dictation is also more straightforward.
But is it yet good enough?
I have been using v12 of Nuance's Dragon speech recognition software regularly for the past three years, mostly for transcribing interviews (listen, pause, recite) and occasionally for writing articles. I've found it mostly helpful, barring a number of frustrations around its recognition of certain words in the IT lexicon, such as Azure and Hadoop, which I could never get it to learn.
2013, when v12 came out, is a lifetime ago in terms of machine learning (ML) capabilities, which is what Nuance now brings to bear with newer versions of Dragon. The vendor claims that the Deep Learning capabilities installed locally with the software are capable of improving accuracy by up to 24 per cent. I was keen to see whether this claim would match the reality.
Works out of the box
The first thing to note is that it no longer requires training. Dragons of yore required an hour or two of reciting standard texts in order for them to recognise the user's voice accurately, but the latest version is usable straight out of the box, although it will get better over time as it learns.
It may be quick to get started, but time learning one's way around its menus and options is time well spent. There are various help pages available, among the most useful being the "Improve my Accuracy" section. This includes a facility for adding new words and phrases to your profile's vocabulary. You can also import documents that you've written directly, allowing it to learn more about your writing style and idiosyncrasies.
Version 15 seems quicker and nimbler than its predecessor. My PC is a Core-i7 Windows 10 machine with a decent amount of RAM (16 GB) but there were times when v12 was distinctly laggy, something I did not notice with the newer edition.
One new feature is the handy tips that pop up when you do something for the first time, which are genuinely helpful given the large number of possible commands available. If you're still unsure what to do you can ask "what can I say?" causing a contextual menu to be displayed. The learning curve for Dragon is long and shallow rather than steep, but I found myself referring to the menus frequently.
Audio transcriptions
There is a facility in this version for transcribing directly from audio files. This worked flawlessly with an MP3 of my own voice recorded on a Sony voice recorder. However, the transcription of an interview recorded using the same recorder in a noisy restaurant resulted in a stream of gibberish, so you need to make sure the audio is of good quality. Even a one-to-one interview in quiet conditions was strewn with errors. If Nuance could direct its ML algorithms to filter out background noise and to process third-party voices more accurately, that would be a huge step forward for transcribing interviews and meetings.
A useful feature for those who use a lot of templates is 'Add New Auto-Text' which allows voice-activated insertion of standard forms, images, signatures, boilerplate and the like into a document with a single command (e.g. "insert signature"). Forms can be navigated easily too, the user inputting a value and then saying "next field" to move on. For repetitive tasks this can really save a lot of time.
This version is much better at formatting numbers and dates, something that previously required frequent keyboard intervention to put right, so "eight pounds fifty" is output as "£8.50".
The interface is improved too and is easier to navigate than before, although at times it still feels a bit bolted together. Ironically, it is not at all easy to navigate through the help pages by voice alone.
Machine learning
The ML capabilities really do seem to make a difference. The 'Accuracy Tuning' feature took about an hour to crunch through my profile, adapting to my common tics and corrections. Afterwards, even badly pronounced or mumbled words were generally picked up correctly by Dragon.
The on-the-fly training capabilities were less impressive. When it misspells a non-standard word such as Hadoop (which Dragon always interprets as 'had OOP'), the command "spell that" brings up the training dialogue by which the user should be able to teach Dragon the correct spelling. But no matter how many times I've tried to teach it, Dragon never seems to learn and v15 seems no better than v12 in this respect. Using the 'Add a New Word or Phrase' dialogue is the best way around this.
While I use Dragon mostly for transcription, it can do a lot more than that. For the purpose of this review I tried to use it to navigate around my PC, generally successfully, although it takes some time to get used to. It is integrated with commonly used software such as Office and Google Docs.
The browser plug-in, which allows integration with Chrome and Edge (there is no plug-in for Firefox), was less successful though. I didn't test Edge, but the Chrome plug-in frequently caused the browser to freeze or crash. A glance at user reviews showed that I was not alone in having these problems.
There is a separate mobile app - Dragon Anywhere - if you're willing to fork out $15 a month. This seems very steep in view of the £279.99 asking price for this version. Nuance should consider bundling it with higher-end versions of Dragon, particularly in view of the number of very serviceable free speech recognition apps available for Android and iOS.
Professional Individual is aimed at self-employed professionals, small office workers and business users who need standard forms, type a lot of emails and letters, and use the many Windows- or Mac-based applications with which it is integrated. For these users it is well suited. Speed and accuracy have improved noticeably over the past few years and a number of refinements have boosted ease of use.
But while it takes much less training than it used to, the user still needs to put in a few hours to become familiar with the many commands and options available in order to get the most out of the application. For those who do a lot of writing and reporting, especially 'boxing glove typists' like me, this is time well spent. Professional speech recognition technology can really save a lot of time and effort.