Arabic is spoken by more than 450 million people, yet artificial intelligence has never truly understood it. Global models stumble over dialects, flatten nuance and miss cultural context.
That gap is pushing countries across the region to build their own large language models, from the UAE’s Falcon, developed by the Technology Innovation Institute in Abu Dhabi, to Egypt’s Intella and Saudi Arabia’s recently announced Humain Chat, created by Humain with backing from the Public Investment Fund. Each is a contender in a race to ensure the future of AI reflects Arab voices and identities.
Humain Chat, launched last month and currently accessible to Saudi users in beta mode, is the kingdom’s first home-grown Arabic LLM.
Developed with support from the Saudi sovereign wealth fund PIF, it is positioned as a secure, Arabic-first alternative to global systems, aimed at sectors such as government, education and business services.
The platform will be rolled out across the world in phases, according to the company.
A language AI has never mastered
Arabic functions differently from more uniform languages such as English. Nour Al Hassan, founder of Arabic.ai, explained that Arabic isn’t just one language; it’s a family of dialects layered over a deep, classical base. This means each dialect could have a different word to express the same thing.
“The morphology is complex: one root can produce dozens of forms, and words often bundle multiple meanings into a single token,” she told The National. She added that this complexity is compounded by “diversity of dialects, code switching between English, French and Arabizi, and the lack of standardised spelling”.
For example, the word “بس” or “bas” can mean “only” in Egypt, “but” in the Levant, or “enough” in the Gulf, differences that can completely change a sentence. Arabizi, meanwhile, is the informal practice of writing Arabic with Latin letters and numbers, such as “3” for ع or “7” for ح, which adds another layer of inconsistency for AI systems to process.
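The Arabizi convention described above can be sketched in a few lines of Python. This is a minimal illustration, not how any of the models in this article work: the function name and the rule for spotting an Arabizi-like token are assumptions made for the example, and the mapping covers only the two digits mentioned here. Real Arabizi uses more substitutions, and usage varies by writer and region.

```python
import re

# Only the two digit substitutions mentioned in the article.
# Assumption: a real system would need a far larger, dialect-aware mapping.
ARABIZI_DIGITS = {"3": "ع", "7": "ح"}

# Runs of ASCII letters and digits are treated as candidate tokens.
_TOKEN = re.compile(r"[A-Za-z0-9]+")


def replace_arabizi_digits(text: str) -> str:
    """Swap mapped digits for Arabic letters inside tokens that also contain letters."""

    def fix(match: re.Match) -> str:
        token = match.group(0)
        # Plain numbers such as "2030" contain no letters and are left alone.
        if not any(ch.isalpha() for ch in token):
            return token
        return "".join(ARABIZI_DIGITS.get(ch, ch) for ch in token)

    return _TOKEN.sub(fix, text)


if __name__ == "__main__":
    print(replace_arabizi_digits("mar7aba, 3arabi 2030"))
```

Even this toy version shows the difficulty: the digits are resolved, but the Latin letters around them still have no single agreed spelling, so the output remains a mix of scripts rather than clean Arabic text.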
For AI to truly understand Arabic, Ms Al Hassan said, it must learn “the rhythm and nuance of how people actually speak and write across the region, not just formal Arabic in textbooks”. That challenge is what Egypt’s Intella was founded to address.
Chief executive Nour Taher told The National that Arabic’s difficulty for AI “isn’t just its complexity, but its duality”. She explained that Arabic takes many forms: the formal, written Modern Standard Arabic, and then the way people actually speak, which she described as “a rich, diverse spectrum of dialects”.
Most global models fail, she explained, because they rely on labelled data sets, which she said “don't exist in the case of dialectal Arabic”. Instead, Intella spent 18 months building one of the most diverse data sets in the world, curated and annotated by native speakers. Its conversational agent Ziila is already being used in banks, telecoms and government services.
Ms Taher said the company focuses on the application layer, building industry-specific small language models or fine-tuning existing ones, with a particular strength in its proprietary dialectal text-to-speech and speech-to-text engines. “We win by being the most accurate and effective solution for specific business problems, not by trying to be a generalist tool,” she said.
Missing ingredient: real-life Arabic data
If language complexity is the first barrier, data scarcity is the second. Ms Al Hassan called it “the single biggest bottleneck”. The problem, she said, is not just volume. “It’s about quality, balance and rights,” she explained.
“Too much of our Arabic data is either scraped news or religious text. What’s missing are everyday conversations, dialect-rich speech, and domain-specific content.”
She argued that progress depends on sovereign rights, cleared data sets and large-scale Arabic preference training with native raters, people who are proficient in a language and are tasked with evaluating, or rating, language use. “That’s how we close the gap between models that can translate and models that can actually reason and engage in Arabic,” Ms Al Hassan said.
AI as sovereignty and strategy
In the UAE, the motivation for developing Falcon goes beyond language. Dr Hakim Hacid, chief researcher at the Artificial Intelligence and Digital Science Research Centre at the Technology Innovation Institute, said open sourcing Falcon was a deliberate choice “to accelerate innovation, build trust and ensure broad accessibility”.

“We didn’t open source because we had to,” he told The National. “We did it because it works – technically, strategically and ethically.” Falcon Arabic was trained on high-quality native Arabic data, covering both Modern Standard Arabic and regional dialects.
Dr Hacid said this allowed the model “to capture not only the structure of the language but also the nuance, tone, and cultural context that are often missing in generic multilingual models”. Ensuring AI reflects the richness of Arabic, he added, is “not just a technical goal, it is essential for inclusion and cultural relevance”.
On the UAE’s push for AI sovereignty, Dr Hacid explained that it isn't just about building models. “It involves having visibility into and ownership over the entire stack: data, infrastructure, algorithm, training and deployment,” he said.
Falcon, he said, gave the UAE hands-on experience in building a high-performance model from the ground up. “Falcon shows that this region can lead technically and contribute meaningfully to the global AI ecosystem,” he said.
While Falcon has performed strongly on global benchmarks, Dr Hacid said the priority is real-world application. “Our focus is on building models that are not only globally competitive, but also efficient, adaptable, and relevant to real-world use,” he said.
He added that if a model performs well in a lab but cannot be deployed responsibly or efficiently, “then it does not serve its purpose”.
Billions fuelling the Arabic AI race
The push is also being driven by money. Prosus Ventures, which recently led a $12.5 million Series A round in Intella, sees Arabic AI as a major opportunity. Robin Voogd, head of Middle East investments at the firm, said Arabic is the fifth-most spoken language in the world, yet Arabic AI models “severely underperform, particularly across dialects”.
This, he told The National, creates both “a huge gap and a major opportunity: whoever builds the best models for Arabic will gain a strategic data advantage in a massive underserved market”.
Fadi Ghandour, executive chairman of the investment company Wamda, said investor appetite is immense.

“Sovereign wealth funds and government-backed entities have already committed billions to AI infrastructure, particularly in the UAE and Saudi Arabia,” he told The National. “These investments include large-scale data centres and strategic partnerships with companies like Nvidia, because without computer power, AI doesn’t happen.”
The business stakes are clear. According to Grand View Research's January 2024 report, the Mena AI market was valued at $11.9 billion in 2023 and is projected to reach $166.3 billion by 2030, growing at nearly 45 per cent annually.
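As a rough check on that growth figure, the annual rate implied by the two market values can be worked out with the standard compound growth formula. The sketch below assumes seven compounding years, from 2023 to 2030; the report's own base year and method may differ, so the result is indicative only. The function name is chosen for this example.

```python
def implied_annual_growth(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that takes start_value to end_value over `years`."""
    return (end_value / start_value) ** (1 / years) - 1


# Mena AI market values quoted above, in billions of dollars.
# Assumption: growth compounds over the seven years from 2023 to 2030.
mena_rate = implied_annual_growth(11.9, 166.3, years=2030 - 2023)

if __name__ == "__main__":
    print(f"Implied annual growth: {mena_rate:.1%}")
```

Under that assumption the implied rate lands in the mid-40s per cent, in line with the annual growth cited. The same function can be applied to any other pair of start and end values.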
In the UAE alone, the market is expected to grow from $3.5 billion in 2023 to $46.3 billion by 2030, according to a February report by Trends Research & Advisory, an independent research institution. Most of the momentum is in the Gulf, while the Levant plays a quieter role.
Mr Ghandour described Jordan and Lebanon as important sources of talent. “Jordan and Lebanon have exceptional AI engineers and data scientists, many of whom are already contributing to Arabic LLMs,” he said.
He noted that many are being recruited into Gulf companies or working in hubs in Amman and Irbid. This reflects how the Levant supports the growth of Arabic AI indirectly, even if the flagship projects have their headquarters elsewhere.
Real or hyped?
As with any emerging technology, the risk of hype is ever-present. Mr Ghandour acknowledged it, but said the region was at a turning point. “There’s always hype with new technology. But hype fades – and the serious players remain,” he said.
Ms Al Hassan stressed that Arabic LLMs are not hype if they are built on the right foundations. “They’re only as strong as the data and fine-tuning behind them,” she said.
Without curated corpora and alignment with cultural nuance, she warned, “Arabic LLMs risk being generic imitations.” But with the right investment in data and real use cases, “they become genuine breakthroughs”.
Ms Taher at Intella agreed that enterprises were already pushing beyond experimentation. She said her client “is leapfrogging the chatbot phase and moving directly to sophisticated conversational intelligence. This demonstrates a clear, top-down mandate to use AI as a core pillar of business strategy.”
The rise of Arabic LLMs is not just about catching up with Silicon Valley. It is about cultural relevance, digital sovereignty and economic opportunity.
Falcon, Intella and Humain each represent different answers to the same question: why should the region depend on others to build its technological future?
As Mr Ghandour put it, Arabic-focused LLMs are “not just about language – they’re about identity. The age of one-size-fits-all tech is behind us.”