It is rare for computer people to get excited by domestic appliances. But one of the big attractions at this year's CeBIT trade show in Hanover was Hermine, the talking washing machine.
Hermine understands a few hundred words and even has a basic sense of humour. It squeals in mock horror when told to wash a baby's nappies, and suggests switching to beer when asked to clean red wine stains.
The device is a prototype, built by Siemens and German company Speech Experts. But it demonstrates how speech recognition (SR) offers an attractive answer to the problem of data entry on embedded devices without keyboards, such as handheld computers, mobile phones or in-car computers.
It is particularly attractive in Asia, where character sets are difficult to cram on to small keyboards. Already more than three million PDAs have been sold in Asia that use speech recognition from ScanSoft, the US-based SR software company, to help people learn languages.
Voice-command systems for cars are also available, allowing drivers to retune the radio, control the air conditioning or make a phone call without taking their hands off the wheel.
However, SR is difficult to deploy on many embedded devices and is also extremely greedy in its consumption of computing power and battery resources. One answer is to cut down the size of the program by reducing the number of functions. Many devices have basic SR built into them, and simple SR software for existing PDAs is widely available.
But these tend to be limited to basic "command and control" functions, such as opening programs or starting voice calls, rather than full-scale dictation.
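To illustrate why "command and control" is so much lighter than dictation, here is a minimal sketch in Python. It assumes the recogniser has already turned the audio into text, and the command phrases and action names are hypothetical, chosen only for illustration. The point is that the software only has to match against a small, fixed list of phrases rather than an open vocabulary.

```python
# Hypothetical command set; a real device would define its own phrases.
COMMANDS = {
    "open calendar": ("launch", "calendar"),
    "open contacts": ("launch", "contacts"),
    "call home": ("dial", "home"),
    "call office": ("dial", "office"),
}


def interpret(utterance):
    """Map recognised text to an action, or None if it is outside the command set."""
    # Normalise case and whitespace so "Call  Home" matches "call home".
    phrase = " ".join(utterance.lower().split())
    return COMMANDS.get(phrase)


if __name__ == "__main__":
    print(interpret("Call Home"))
    print(interpret("please take a letter"))
```

Anything outside the list, such as a dictated sentence, is simply rejected, which is what keeps the program small.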
ScanSoft is planning to introduce specialised dictation systems for doctors and lawyers, which would recognise only basic vocabulary and the associated technical jargon, fitting the program into the available space by cutting down the number of words it can recognise.
IBM has tried to find a way round the computing power problem by putting dedicated speech accelerators on some of its PowerPC processors for handheld computers. Mr Michael McGinnis, strategic marketing manager at IBM Microelectronics, estimates that hardware speech accelerators can contribute a 20-30 per cent speed improvement and power saving. "But as the software gets more efficient, at some point it will become unnecessary," he says.
A different answer is to do the processing elsewhere. Any device that can make a phone call can access SR applications based on remote servers. For example, mobile phone operator Orange has, for some years, been offering voice-powered access to contact details, news and sport through its Wildfire portal.
This approach has many advantages. The technology for server-based SR is more mature and the hardware is much more powerful. Many applications and services are already available.
This is the approach that many technology vendors seem to favour for the moment. Microsoft, for example, is promoting telephony standards such as Salt (Speech Application Language Tags), rather than building SR into its smart phone and PDA operating systems.
However, a mobile network is not necessarily an ideal environment for doing speech processing, as the call quality can often deteriorate to the point where SR is impossible. Distributed Speech Recognition (DSR) seeks to combine the best of both worlds.
The early stages of the SR process are done by a small piece of software on the device. The results are then sent to the server as data over a 2.5G or 3G network, which is less liable to corruption than a 2G voice call.
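The division of labour in DSR can be sketched roughly as follows. This is an illustration under stated assumptions, not the Aurora front end: the frame sizes, the 8 kHz sample rate, the two toy features (log energy and zero-crossing count) and all function names are assumptions made for the example, and a real front end would compute richer spectral features. The idea it shows is that the handset reduces the audio to a compact stream of numbers, and only those numbers travel over the data network.

```python
import math
import struct

FRAME_LEN = 200   # samples per frame: 25 ms at an assumed 8 kHz sample rate
FRAME_STEP = 80   # hop between frames: 10 ms at the same rate


def extract_features(samples):
    """Device side: reduce each frame to (log energy, zero-crossing count)."""
    features = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_STEP):
        frame = samples[start:start + FRAME_LEN]
        energy = sum(s * s for s in frame) / FRAME_LEN
        log_energy = math.log10(energy + 1.0)
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        features.append((log_energy, crossings))
    return features


def pack_features(features):
    """Device side: quantise each feature to one byte for the data channel."""
    payload = bytearray()
    for log_energy, crossings in features:
        q_energy = min(255, int(log_energy * 25))  # coarse fixed-point value
        q_cross = min(255, crossings)
        payload += struct.pack("BB", q_energy, q_cross)
    return bytes(payload)


def unpack_features(payload):
    """Server side: recover the quantised features for the recogniser."""
    return [
        (payload[i] / 25, payload[i + 1]) for i in range(0, len(payload), 2)
    ]


if __name__ == "__main__":
    # One second of a synthetic 440 Hz tone standing in for 16-bit speech.
    audio = [
        int(10000 * math.sin(2 * math.pi * 440 * n / 8000))
        for n in range(8000)
    ]
    packet = pack_features(extract_features(audio))
    print(len(audio) * 2, "bytes of raw audio ->", len(packet), "bytes sent")
```

Because the payload is ordinary data rather than a voice signal, it can be protected by the network's normal error handling, which is the property the DSR approach relies on.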
A DSR trial involving SpeechWorks produced a 15 per cent improvement in recognition accuracy on a call made over a weak signal. "We expect to have handsets with a couple of manufacturers available late this year or early next," says Mr Steve Chambers, SpeechWorks's chief marketing officer.
For DSR to take off, standards are needed to ensure handsets and servers from different suppliers can work together. SpeechWorks is using the Aurora standard, which has been in development for nearly 10 years.
Others seem less enthusiastic, notably Nokia, the handset market leader. "I think these applications are some years off," says Dr Petri Haavisto, vice-president of architecture and roadmapping at Nokia Mobile Software.
Microsoft is also hedging its bets. "We do participate in the development of the Aurora standard," says Mr XD Huang, general manager of Microsoft Speech Technologies. "But we have no official position on it."
Ultimately, as processor speeds get faster and software gets more efficient, it will become cheaper and easier to build devices with full-scale SR, and more and more speech applications will run on the device itself.
In the end, the user will probably not know how or where their speech is being processed, as long as they have speech-based applications that are effective and easy to use.