Voice Search in Underrepresented Languages
10th November 2010
Google, mirabile dictu, actually says what it means by ‘underrepresented’.
We define underrepresented languages as those which, while spoken by millions, have little presence in electronic and physical media, e.g., webpages, newspapers and magazines. Underrepresented languages have also often received little attention from the speech research community. Their phonetics, grammar, acoustics, etc., haven’t been extensively studied, making the development of ASR (automatic speech recognition) voice search systems challenging.
We believe that the speech research community needs to start working on many of these underrepresented languages to advance progress and build speech recognition, translation and other Natural Language Processing (NLP) technologies. The development of NLP technologies in these languages is critical for enabling information access for everybody. Indeed, these technologies have the potential to break language barriers.
This is a fairly pure example of ‘business that thinks it’s more than a business’, an outgrowth of the Crustian ‘social responsibility’ kick. Such businesses aren’t content to just make some money for their investors; they have this grandiose notion that they exist ‘to be a force for good’. How well this works out in practice is left as an exercise for the reader.