Despite a proliferation of multilingual AI language models in recent years, their abilities are still skewed towards performing better in English, a new study shows.
The research, led by the UAE's Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) alongside other organisations, focused on the cross-lingual abilities of the models.
In the context of AI, cross-lingual ability refers to how well a language model can use knowledge gained in one language to answer questions in another.

“The gaps between English and many other languages are quite big,” said Zixiang Xu, a visiting student at MBZUAI.
Among the models tested for the study were Alibaba's Qwen, Anthropic's Claude Sonnet, Google's Gemma, Meta's Llama and OpenAI's ChatGPT.
Researchers at Abu Dhabi's AI university said Claude Sonnet and ChatGPT scored the highest marks for cross-lingual abilities, but stressed there was still room for improvement.
“Models had lower performance in low-resource languages like Amharic, Arabic and Yoruba on questions about science and technology,” the report stated.
Bengali, Chinese, French, German, Hebrew, Hindi, Italian, Japanese and Korean were among the languages studied for cross-lingual AI performance.
The results, according to MBZUAI, showed that even influential, widely spoken languages suffered a performance drop compared with English.
“Even on Chinese, a language for which there is a huge amount of training data, model accuracy dropped by an average of nearly 60 per cent,” the report said.
Since the debut of OpenAI's ChatGPT in 2022, concern and criticism have been plentiful about AI tools' potential bias towards English, which could skew answers or leave other languages and cultures by the wayside.
Some languages, such as Arabic, initially posed a challenge to AI developers and researchers because of their diverse dialects and nuances, but steady progress has since been made.

In 2024, a UAE-developed language model, Jais 70B, was said to deliver Arabic-English bilingual capabilities at an “unprecedented size and scale”. It was trained on what was described as the largest Arabic data set “ever used to train an open-source foundational model”.
MBZUAI said the research on AI and language performance was carried out in the hope of eventually creating a better method for exposing the “cross-lingual weaknesses” of AI models.
Researchers said fine-tuning the models could efficiently improve their performance and accuracy.
The university described the findings as “an important step towards building language models that are useful to people who speak a wide array of languages”.
The study was recently presented at the 63rd Annual Meeting of the Association for Computational Linguistics in Vienna, Austria.
The researchers and analysts released their data so others can work to improve AI models' cross-lingual capabilities.
MBZUAI, one of the world's first universities dedicated to AI, opened in 2020 and has since expanded its offerings and partnerships.