We need to talk about voice recognition software

WIRED: To succeed, voice recognition needs to be more emotionally satisfying

IN THE wake of Steve Jobs’s death, the two Apple launches that bracketed it have received a relatively muted response.

The day before the announcement of his death, his colleagues at Apple unveiled the iPhone 4S. On Wednesday this week, the company finally released iOS 5, the operating system upgrade for the iPad and iPhone that includes Apple’s new cloud service, iCloud.

Neither is, on its own, revolutionary. It is generally felt that iCloud is Apple’s belated and tentative attempt to fix two of its least admired services – MobileMe and iTunes sync. Press pundits, meanwhile, have expressed disappointment with the lack of new features on the new iPhone (although pre-orders remained high).

Nonetheless, it is worth paying closer attention to both launches, since they indicate the beginnings of two intriguing directions, not just for Apple but for the industry as a whole.

In particular, the single new software addition in the iPhone 4S, called Siri, may herald a new wave of innovation, or at least hype, in an area of user interface previously viewed as rather moribund.

Siri is the iPhone 4S’s speech-recognition service. Apple is pitching it as an intelligent agent, a sort of butler for your phone.

Siri accepts queries such as “what are my meetings for Monday?” and “what is the weather in Dublin today?” spoken in a normal voice into the iPhone’s microphone. It provides detailed results in both visual and spoken form.

The new iPhone can also take dictation, converting speech into text for messaging and e-mails.

Speech recognition is nothing new on smartphones. Android users have been able to enter free-form text and conduct Google searches using spoken commands for years. Even simple phones have been able to recognise speed-dial requests via their microphones for almost a decade. On the desktop, simple voice recognition has been built into the Mac OS and Windows operating systems for a long time.

Far from being anything new, basic speech recognition is almost ubiquitous on computers. You wouldn’t know it though; almost no one uses it unless they have to.

Despite the exciting sci-fi depictions in Star Trek, voice recognition in the real world has developed a terrible reputation, both for accuracy and for any ability to get things done in a reasonable amount of time.

This perception isn’t entirely fair. Transcription is now surprisingly precise in the most advanced voice-recognition systems. With a decent microphone and a few minutes’ training, sophisticated recognition engines such as Dragon Dictate can interpret and correctly transcribe even casually spoken sentences in noisy environments.

(As an experiment, I’m dictating this entire column into Dragon Dictate. It really is having little trouble understanding what I’m trying to say, though I will confess it is tricky to develop anything but an oddly staccato prose style when you are barking at a computer screen.)

Accuracy is no longer the biggest hurdle for free-form voice recognition. The stumbling block remains that nobody has been able to construct a voice user interface that is anything but persistently frustrating. We expect that a machine that can understand speech should also be able to make reasonable guesses when it doesn’t quite understand a request. When it cannot, negotiating with a voice interface becomes as emotionally satisfying and speedy as giving orders to a manservant with limited English, poor hearing and a bad attitude.

Apple has a profound organisational distaste for ignoring such problems, partly born from Jobs’s notoriously high demands for physically and emotionally satisfying interfaces in his company’s products, and partly from Apple’s own mixed results when attempting to exploit novel data-entry environments.

Apple established itself in the modern computer age thanks to its obsession with refining and perfecting the mouse and windows interface that we now take for granted. It almost lost that reputation when Apple’s first digital assistant, the Newton, was devastated in the marketplace after bad reviews of its free-form handwriting system.

So it seems a little strange that Apple should release Siri with the disclaimer that it is a beta, or trial, service, and limit its availability to iPhone 4S purchasers. One gets the impression that Apple is a little hesitant to put much weight on a still imperfect environment.

If anyone can get voice recognition right, it will be Apple. If anyone is paranoid and obsessive enough to refuse to release a half-baked voice-recognition system, it is Apple. And if anyone can undo years of subtly bad publicity about speech recognition, it is Apple’s almost disturbingly good marketing artillery.

To succeed, voice recognition does not need to be technically better – it needs to be unambiguously more emotionally satisfying. If even Apple, the master of interface and hype, cannot pull this off, we may have to wait another decade until someone else can.