Amazon announced a big push into the AI space at re:Invent 2016, an annual conference hosted by Amazon Web Services (AWS). Three new features form the crux of Amazon’s AI push – Rekognition (image recognition), Polly (text-to-speech), and Lex (conversational technology).
You can now add these features to any of your apps or websites that run on AWS. Amazon Polly converts text to speech in any of 47 voices across 24 languages. Indian English is among them, which is good news if your target audience is Indian.
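For a sense of what a Polly request might look like from Python, here is a minimal sketch using the AWS SDK for Python (boto3). The helper names, the default region, and the choice of "Raveena" as the Indian English voice are assumptions made for illustration, not details from Amazon's announcement, and the network call needs valid AWS credentials.

```python
def build_polly_request(text, voice_id="Raveena", output_format="mp3"):
    """Assemble keyword arguments for Polly's SynthesizeSpeech operation.

    The default voice is an assumption: "Raveena" is used here as the
    Indian English voice. Swap in any voice your account supports.
    """
    if not text:
        raise ValueError("text must not be empty")
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}


def synthesize_to_file(text, path, region_name="us-east-1"):
    """Send text to Polly and write the returned audio stream to `path`.

    Hypothetical helper: requires boto3 and AWS credentials at call time.
    """
    import boto3  # imported lazily so build_polly_request works without the SDK

    polly = boto3.client("polly", region_name=region_name)
    response = polly.synthesize_speech(**build_polly_request(text))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


# Example call (not executed here because it contacts AWS):
# synthesize_to_file("The train to Pune departs from platform four.", "notice.mp3")
```

Keeping the request-building step separate from the network call makes it easy to check the parameters without spending any of the free quota.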
Why would you need Amazon Polly? Text-to-speech technologies have plenty of uses, ranging from reaching visually challenged people or those who have trouble reading to making public announcements at railway stations or airports.
Creating text-to-speech software has its own challenges. Words such as live and read, and some abbreviations such as St., have multiple pronunciations. Polly expands abbreviations, including state names (such as WA) and measurement units, and, according to Amazon, knows how to pronounce words based on the context of the sentence.
In a blog post, chief evangelist for AWS Jeff Barr writes, “In order to do this, we worked with professional, native speakers of each target language. We asked each speaker to pronounce a myriad of representative words and phrases in their chosen language, and then disassembled the audio into sound units known as diphones.”
Barr also says that Polly can be programmed to understand multiple languages used in the same sentence. It goes without saying that these have to be supported languages.
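One way mixed-language text might be expressed is through SSML, the speech markup Polly accepts, using its lang tag to mark the foreign portion. The sketch below only builds the markup. The function name and the sample sentence are illustrative assumptions, and whether this matches the mechanism Barr had in mind is not confirmed by the blog post quoted here.

```python
from xml.sax.saxutils import escape


def mixed_language_ssml(segments):
    """Build an SSML document from (text, language_code) pairs.

    A language_code of None leaves the text in the voice's own language;
    otherwise the text is wrapped in an SSML <lang> element.
    """
    parts = []
    for text, language_code in segments:
        safe_text = escape(text)  # escape &, < and > so the markup stays valid
        if language_code is None:
            parts.append(safe_text)
        else:
            parts.append(
                '<lang xml:lang="{}">{}</lang>'.format(language_code, safe_text)
            )
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = mixed_language_ssml([
    ("The French word for cat is", None),
    ("chat", "fr-FR"),
])
```

The resulting string would be sent to Polly with the text type set to SSML rather than plain text.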
Amazon Rekognition offers something entirely different – the ability to read images. You can use this to index your photos or categorise them based on the content of these images. Amazon says that you can even use this service to enable a technology like face unlock – authenticate user identity based on their face.
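Indexing photos by content could look something like the following sketch. It assumes the response shape of Rekognition's DetectLabels operation (a "Labels" list whose entries carry a "Name" and a "Confidence"). The helper names, the 80 percent threshold, and the sample data are made up for illustration.

```python
from collections import defaultdict


def labels_from_response(response, min_confidence=80.0):
    """Return label names from a DetectLabels-style response dict,
    keeping only those at or above the confidence threshold."""
    return [
        label["Name"]
        for label in response.get("Labels", [])
        if label.get("Confidence", 0.0) >= min_confidence
    ]


def build_label_index(responses_by_photo, min_confidence=80.0):
    """Map each label name to the list of photos it was detected in."""
    index = defaultdict(list)
    for photo, response in responses_by_photo.items():
        for name in labels_from_response(response, min_confidence):
            index[name].append(photo)
    return dict(index)


def detect_labels_for_file(path, region_name="us-east-1"):
    """Hypothetical helper: asks Rekognition to label one local image.
    Requires boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the indexing helpers work offline

    client = boto3.client("rekognition", region_name=region_name)
    with open(path, "rb") as image_file:
        return client.detect_labels(Image={"Bytes": image_file.read()}, MaxLabels=10)


# Hand-written sample responses standing in for real API output.
sample = {
    "lake.jpg": {"Labels": [{"Name": "Lake", "Confidence": 97.1},
                            {"Name": "Boat", "Confidence": 55.0}]},
    "hills.jpg": {"Labels": [{"Name": "Hill", "Confidence": 91.4},
                             {"Name": "Lake", "Confidence": 88.0}]},
}
photo_index = build_label_index(sample)
```

With the sample data, both photos end up filed under "Lake", while the low-confidence "Boat" label is dropped.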
Amazon Lex is powered by the same deep learning tech as the Alexa virtual assistant. If you want to create your own personal assistant in your app, you can use Lex to do that. During re:Invent 2016, Amazon demoed how these three technologies can come together. You could ask the virtual assistant to book flight tickets to some place with lots of lakes and hills. The assistant can then show you a few photographs of such places and, if you select one of these, book a flight to the nearest airport.
The key selling point of these features is that if you already use AWS, they are very easy to add to your app or website. However, none of these features is free beyond an introductory tier, and some aren’t available everywhere. Amazon Rekognition lets you analyse 5,000 images and store metadata for 1,000 faces for free for a year. After that you’ll have to pay up to $1 (roughly Rs. 69) per 1,000 images. This feature is available in US East, US West, and EU regions of AWS.
Amazon Polly lets you convert 5 million characters per month for free for the first year and charges up to $4 (roughly Rs. 280) per 1 million characters processed. It’s available in all AWS regions. Lex costs $0.004 per voice request and $0.00075 per text request.
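To see what those Polly and Lex rates add up to, here is a rough estimator built only from the figures quoted above. It treats the "up to $4" Polly rate as a ceiling, so the result is an upper bound rather than a bill, and the function names are invented for this sketch.

```python
# Rates as quoted in this article (USD); treat them as assumptions that may change.
POLLY_FREE_CHARS_PER_MONTH = 5_000_000   # free allowance during the first year
POLLY_USD_PER_MILLION_CHARS = 4.00       # quoted as "up to", so an upper bound
LEX_USD_PER_VOICE_REQUEST = 0.004
LEX_USD_PER_TEXT_REQUEST = 0.00075


def polly_monthly_cost(characters, in_free_tier=True):
    """Upper-bound monthly Polly cost for a given number of characters."""
    free_allowance = POLLY_FREE_CHARS_PER_MONTH if in_free_tier else 0
    billable = max(0, characters - free_allowance)
    return billable / 1_000_000 * POLLY_USD_PER_MILLION_CHARS


def lex_monthly_cost(voice_requests, text_requests):
    """Monthly Lex cost from counts of voice and text requests."""
    return (voice_requests * LEX_USD_PER_VOICE_REQUEST
            + text_requests * LEX_USD_PER_TEXT_REQUEST)


polly_estimate = polly_monthly_cost(7_000_000)   # 2 million billable characters
lex_estimate = lex_monthly_cost(1_000, 1_000)    # 1,000 requests of each kind
```

Under these assumptions, 7 million characters in a free-tier month costs at most $8 on Polly, and 1,000 voice plus 1,000 text requests cost about $4.75 on Lex.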