The advent of AI platforms built on large language models, such as ChatGPT, has ushered in a new era of AI-powered interactions. While interest in and development of AI technology have surged in the Middle East, Arabic-language models have often lagged behind. However, a collaboration between Abu Dhabi’s Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Silicon Valley-based Cerebras Systems, and UAE-based AI company G42 has unveiled a groundbreaking AI tool tailored for Arabic speakers, called “Jais.” This development not only addresses the needs of Arabic speakers but also holds the potential to advance large language models for other languages that are underrepresented in the AI landscape.
The challenge of Arabic language models
Although existing language models like ChatGPT and Meta’s LLaMA have some Arabic language capabilities, they were predominantly trained on English data from the internet. According to Timothy Baldwin, acting provost and professor of natural language processing at MBZUAI, Jais took a different approach. It utilized a combination of English and Arabic datasets, with a strong emphasis on content from the Middle East. This unique training approach has enabled Jais to achieve a level of understanding and proficiency in Arabic that sets it apart from its counterparts.
The dominance of Latin-alphabet languages
Languages that use the Latin alphabet, with English at the forefront, dominate the internet, which has led to larger datasets for these languages. Mohammed Soliman, director of strategic technologies and the cybersecurity program at the Middle East Institute, highlights that limiting access to AI tools to speakers of specific languages could disadvantage various sections of society. Language models trained primarily on English data often lack cultural awareness and understanding of diverse backgrounds, which can adversely affect user experiences.
Cultural nuances and dialects in Arabic
Arabic, the sixth most spoken language globally, presents a unique challenge due to its rich diversity of dialects. Modern Standard Arabic is typically used for official documents and formal writing, while local dialects are prevalent in blogs and social media. Jais, with its diverse training data, can navigate between these dialects and understand cultural nuances, making it more versatile and applicable across different industries.
Expanding the possibilities
As Jais continues to evolve, the development team is looking to expand its capabilities beyond text-based interactions. They plan to incorporate the ability to work with images, graphs, or tabular data, opening up possibilities for applications in interpreting medical scans, analyzing investment data, or processing satellite data.
Responsibility in AI development
Jais, like other generative AI models, uses instruction tuning to limit the generation of harmful or toxic content. It is designed to adhere to local rules and customs, so that responses are in line with ethical and cultural norms. The development process of Jais involved dialogue with the UAE government and other institutions to ensure responsible AI deployment.
Regional developments in the UAE
The United Arab Emirates has been at the forefront of developing generative AI systems. In 2017, it became the first country in the world to appoint a Minister of AI. Notably, the region’s largest generative AI model, Falcon, was unveiled by Abu Dhabi’s Advanced Technology Research Council and the Technology Innovation Institute (TII). Although Falcon is currently only available in English, it boasts 180 billion parameters and surpasses competitors in reasoning, coding, and knowledge tests. Both Falcon and Jais are open-source, making their code accessible for anyone to utilize or modify.
AI’s potential impact on the Middle East
According to a 2018 report by PwC, the Middle East stands to gain up to $320 billion in benefits from AI by 2030. The region is keen on building its capabilities in the AI domain to harness the full potential of this technology. Ali Hosseini, PwC’s Middle East chief digital officer, highlights that some of the best open-source AI models have been developed in the region, citing Falcon and Jais as prime examples.
The introduction of Jais, an AI tool tailored for Arabic speakers, is a significant step toward closing the language gap within the AI landscape. It not only serves the needs of Arabic speakers but also sets a precedent for the development of language models for underrepresented languages worldwide. With its ability to understand dialects and cultural nuances, Jais is poised to have a profound impact across various industries, furthering the evolution of AI in the Middle East and beyond. As AI technology continues to advance, inclusivity in language models is crucial to ensure that the benefits of AI are accessible to diverse populations around the world.