Machine learning has come a long way in the last few years, and it is now integrated into our lives in ways we hardly notice. In a development that marks a major milestone for the technology, researchers at Microsoft have claimed that their speech recognition system has reached ‘human parity’.
This effectively means that the speech recognition system can recognise words from a conversation as well as humans do. It should be noted that ‘human parity’ doesn’t mean error-free recognition, as even professional transcriptionists make errors and don’t recognise every word perfectly in conversation. The term essentially implies that the error rate of the recognition system is now on par with the error rate of humans performing a similar task.
During their tests, researchers at Microsoft reported that the speech recognition system achieved a word error rate of 5.9 percent. “The 5.9 percent error rate is about equal to that of people who were asked to transcribe the same conversation, and it’s the lowest ever recorded against the industry standard Switchboard speech recognition task,” the Redmond-based company said in its official blog post.
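For readers curious how a figure like 5.9 percent is computed: word error rate is conventionally measured as the word-level edit distance (substitutions, insertions, and deletions) between a system's transcript and a reference transcript, divided by the number of words in the reference. The sketch below is a minimal, generic implementation of that standard metric, not Microsoft's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table: d[i][j] = edit distance between
    # the first i reference words and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,          # deletion
                d[i][j - 1] + 1,          # insertion
                d[i - 1][j - 1] + sub_cost,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)


# One missed word out of six gives a WER of about 16.7 percent.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A score of 0.059 on this metric means roughly 6 word-level mistakes for every 100 reference words.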
“We’ve reached human parity,” the company’s Chief Speech Scientist Xuedong Huang was quoted as saying in the blog.
This advancement in speech recognition can have wide implications for both business and consumer products. It can improve the user experience on an entertainment console like the Xbox, or be used in speech-to-text transcription software. Microsoft’s digital assistant Cortana is also likely to benefit from the improvement.
“This will make Cortana more powerful, making a truly intelligent assistant possible,” the company’s Head of Artificial Intelligence Harry Shum said.
In order to improve speech recognition to this level, the researchers made use of neural language models, which group similar words together for better recognition.
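The idea of "grouping similar words together" is commonly realised through word embeddings: words are mapped to numeric vectors so that words used in similar contexts end up close together, and the model can generalise between them. The toy sketch below illustrates the concept with made-up two-dimensional vectors and cosine similarity; real systems learn vectors with hundreds of dimensions, and these particular numbers are purely illustrative, not from Microsoft's model.

```python
import math

# Hypothetical 2-D word embeddings, hand-picked for illustration only.
# Learned embeddings would place "fast" and "quick" close together
# because they appear in similar contexts.
embeddings = {
    "fast":   [0.90, 0.10],
    "quick":  [0.85, 0.20],
    "banana": [0.10, 0.90],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Near-synonyms score much higher than unrelated words.
print(cosine_similarity(embeddings["fast"], embeddings["quick"]))
print(cosine_similarity(embeddings["fast"], embeddings["banana"]))
```

In a recogniser, this closeness lets the language model assign sensible probabilities to word sequences it has never seen verbatim, because a sentence containing "quick" provides evidence about sentences containing "fast".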
Even though Microsoft’s speech recognition system has achieved a major milestone by reducing the error rate significantly, the researchers still have a long way to go to make the technology usable in real-life situations. These include situations with background noise or multiple people talking at once, and the challenge of getting devices like our phones to understand the context of a conversation the way humans do.