Artificial intelligence (AI) has many use cases, and one of them is giving people access to digital services in their native languages. In a country as vast as India, where people speak over 121 languages, making digital services available in all of those languages is a daunting task.
The government is building language datasets through Bhashini, an AI-led language translation system that is creating open-source datasets in local languages for building AI tools, with the aim of delivering more services digitally.
AI’s role in bringing languages online
Notably, only a few of these 121 languages are covered by natural language processing (NLP), the branch of artificial intelligence that enables computers to understand text and spoken words. This means that hundreds of millions of Indians are excluded from useful information.
“For AI tools to work for everyone, they need to also cater to people who don’t speak English or French or Spanish,” news agency Reuters quoted Kalika Bali, principal researcher at Microsoft Research India, as saying.
“But if we had to collect as much data in Indian languages as went into a large language model like GPT, we’d be waiting another 10 years. So what we can do is create layers on top of generative AI models such as ChatGPT or Llama,” Bali said.
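To illustrate what such a "layer" could look like in practice, here is a minimal, hypothetical sketch that adapts an existing instruction-tuned generative model to an Indian-language task through prompting rather than retraining. The checkpoint name and prompt are assumptions made for illustration; they are not part of Bali's or Microsoft's work.

```python
# Minimal sketch only: prompt-based "layering" of a translation task on top of an
# existing generative model. The checkpoint below is an assumption (access-gated);
# any instruction-tuned model could be substituted.
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

def translate_to_hindi(sentence: str) -> str:
    # Wrap the base model with a task-specific prompt instead of retraining it.
    prompt = f"Translate the following English sentence into Hindi:\n{sentence}\nHindi:"
    output = generator(prompt, max_new_tokens=64, do_sample=False)
    # The pipeline returns the prompt plus the generated continuation.
    return output[0]["generated_text"][len(prompt):].strip()

print(translate_to_hindi("Where is the nearest railway station?"))
```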
How AI models are trained
AI models are trained on large datasets, typically of written text. However, many Indian languages have a mainly oral tradition, which means textual records are scarce, making it difficult to collect data in less common languages.
This is where Bhashini comes in: it includes a crowdsourcing initiative that lets people contribute sentences in various languages, validate audio or text transcribed by others, translate texts and label images.
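As a rough illustration (not Bhashini's actual schema), a single crowdsourced contribution could be represented as a small record covering the four task types described above:

```python
# Hypothetical sketch of one crowdsourced contribution; field names are invented
# for illustration and do not reflect Bhashini's real data format.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Contribution:
    task: str                          # "sentence", "validation", "translation" or "image_label"
    language: str                      # e.g. "hi" for Hindi, "bho" for Bhojpuri
    text: Optional[str] = None         # contributed or translated sentence
    audio_path: Optional[str] = None   # recording to be transcribed or validated
    source_text: Optional[str] = None  # original sentence for translation tasks
    validated: bool = False            # set True once another contributor confirms it

# Example: a volunteer confirms someone else's Hindi transcription of an audio clip.
record = Contribution(task="validation", language="hi",
                      text="निकटतम रेलवे स्टेशन कहाँ है?",
                      audio_path="clips/hi_000123.wav", validated=True)
```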
“The government is pushing very strongly to create datasets to train large language models in Indian languages, and these are already in use in translation tools for education, tourism and in the courts,” Pushpak Bhattacharyya, head of the Computation for Indian Language Technology Lab in Mumbai, was quoted as saying.
Meta’s SeamlessM4T model
Earlier this year, Meta CEO Mark Zuckerberg announced SeamlessM4T, an AI-powered speech translation model that can translate and transcribe speech in up to 100 languages. Zuckerberg said the model can handle speech-to-text, text-to-speech, speech-to-speech and text-to-text translation, as well as speech recognition.
The model can help people communicate and understand information in languages they don't know, especially languages that don't have a widely used writing system or that lack enough written text to train AI models.
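SeamlessM4T was released publicly and is available through the Hugging Face transformers library; the short sketch below shows text-to-text translation from English to Hindi with it. The checkpoint name and decoding steps follow the library's published usage, but treat this as an illustrative sketch rather than a definitive recipe.

```python
# Sketch of English-to-Hindi text translation with Meta's SeamlessM4T, using the
# checkpoint published on Hugging Face (assumed downloadable in your environment).
from transformers import AutoProcessor, SeamlessM4TModel

processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium")

# Tokenize the English source sentence ("eng" and "hin" are SeamlessM4T language codes).
text_inputs = processor(text="Where is the nearest hospital?", src_lang="eng", return_tensors="pt")

# generate_speech=False requests text output; omitting it would return a Hindi waveform instead.
output_tokens = model.generate(**text_inputs, tgt_lang="hin", generate_speech=False)
translated_text = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
print(translated_text)
```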