AI researchers achieve landmark breakthrough in Sinhala AI with publication in prestigious IEEE Access journal
May 14, 2026 12:35 PM

While global tech giants spend billions training artificial intelligence that still struggles in Sinhala, a Sri Lankan research team has quietly built a model that does it properly — using just two GPUs.

 

In a blind evaluation, the new Sinhala language model scored 4.5 out of 5, compared with just 1 out of 5 for the base Meta Llama 3.1 model on the same Sinhala prompts. The team also cut the model’s perplexity, a standard measure of how well a language model predicts text (lower is better), by close to 90 percent, according to a statement.
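For readers unfamiliar with the metric, the sketch below shows how perplexity is conventionally computed from per-token log-probabilities and what a roughly 90 percent reduction looks like. The probability values are hypothetical and chosen only to make the arithmetic easy to follow; they are not figures from the paper, and the function name `perplexity` is illustrative.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-mean natural-log probability of the true tokens)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical per-token log-probabilities (illustrative only).
base_log_probs = [math.log(0.01)] * 4     # model gives each true token 1%
adapted_log_probs = [math.log(0.10)] * 4  # model gives each true token 10%

base_ppl = perplexity(base_log_probs)
adapted_ppl = perplexity(adapted_log_probs)

# Relative reduction in perplexity between the two hypothetical models.
reduction = 1 - adapted_ppl / base_ppl
print(f"base={base_ppl:.1f} adapted={adapted_ppl:.1f} reduction={reduction:.0%}")
```

In this toy case the model that assigns ten times more probability to each correct token has one tenth of the perplexity, which is what a 90 percent reduction means.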

 

In practical terms, the model can hold a natural conversation in Sinhala, answer questions, follow instructions, and stay coherent across long responses.

 

The research has been accepted for publication in IEEE Access, a peer-reviewed open-access journal of the Institute of Electrical and Electronics Engineers (IEEE), with a Journal Impact Factor of 3.6 in the 2024 Clarivate Journal Citation Reports and an h5-index above 200 on Google Scholar Metrics. 

 

IEEE Access operates a binary review policy — reviewers either accept or reject a manuscript in the form submitted, with no revision cycle — a quality bar few low-resource-language AI papers have cleared.

 

Why this matters for Sri Lanka

 

Sinhala is spoken by more than 20 million people, but it is barely represented in the training data of the AI systems everyone is now talking about. Ask ChatGPT, Claude, or Gemini something in Sinhala and the answers can be broken, repetitive, or nonsensical.

 

The deeper issue is sovereignty. Even when foreign AI tools do work in Sinhala, Sri Lanka has no control over them — the model weights, the training data, the safety rules, and ultimately the off-switch all sit with companies in the United States or China, the statement said. 

 

For a country where most government, healthcare, and education conversations happen in Sinhala, depending entirely on AI built and operated abroad poses a structural risk to data privacy, national security, cultural framing, and basic continuity of service whenever foreign policy, pricing, or licensing shifts.

 

A sovereign Sinhala LLM changes that equation. It can be hosted locally, audited locally, and fine-tuned for Sri Lankan contexts, and it can continue to operate regardless of what any foreign tech company decides next. That opens the door to Sinhala-speaking AI assistants for government services, educational tools for Sinhala-medium students, healthcare information for elderly and rural users, accessibility tools for citizens who do not speak English, and natural-sounding customer service for local businesses.

 

Built on a tight budget

 

Major AI labs in the United States use thousands of GPUs and spend hundreds of millions of dollars to train comparable systems. This team did it with two GPUs over a few weeks of training, and had to build its datasets from scratch because no large, clean Sinhala corpus existed. 

 

The team scraped Sinhala news sites, books, and online sources, and used Hindi datasets as a starting point, since Hindi and Sinhala share Indo-Aryan roots. The final dataset comprises around 3.6 million question-answer pairs and 4 billion tokens. It is one of the largest public Sinhala AI datasets and is now freely available on Hugging Face, the statement said.

 

The team also redesigned how the model reads Sinhala. The original Llama tokenizer needed an average of 91 tokens per Sinhala sentence and failed on 97.5 percent of Sinhala characters at the byte level. After adding around 35,000 Sinhala-specific tokens, that dropped to 23 tokens per sentence and zero byte-level failures.
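A rough sketch of why the tokenizer change matters: Sinhala code points sit in the Unicode block U+0D80 to U+0DFF, and each takes three bytes in UTF-8. If a tokenizer falls back to raw bytes for characters missing from its vocabulary (an assumption here about what the statement's byte-level figure refers to), every such character costs several tokens. The helper name `utf8_bytes_per_char` is illustrative, and the 91 and 23 figures are the ones quoted above.

```python
def utf8_bytes_per_char(text):
    """Average number of UTF-8 bytes per non-whitespace character."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        raise ValueError("text has no non-whitespace characters")
    return sum(len(c.encode("utf-8")) for c in chars) / len(chars)


# The word "Sinhala" written in Sinhala script, given as code points
# (sa, vowel sign i, anusvara, ha, la) so the source stays ASCII-safe.
sinhala_word = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"
english_word = "Sinhala"

sinhala_cost = utf8_bytes_per_char(sinhala_word)  # 3 bytes per code point
english_cost = utf8_bytes_per_char(english_word)  # 1 byte per ASCII letter

# Figures quoted in the article: average tokens per Sinhala sentence.
tokens_before, tokens_after = 91, 23
reduction = 1 - tokens_after / tokens_before
compression = tokens_before / tokens_after
print(f"{reduction:.1%} fewer tokens, about {compression:.1f}x shorter sequences")
```

Shorter token sequences mean more Sinhala text fits in the same context window and each training step covers more content, which helps explain how a two-GPU budget could go further.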

 

Who built it

 

The project was conducted at the Department of Electrical Engineering, University of Moratuwa, led by Sanjeewa Alwis, CEO of Decryptogen; Dr. Chathura Wanigasekara (Senior Member, IEEE) of the Institute of Maritime Technologies and Propulsion Systems at the German Aerospace Centre (DLR), Geesthacht; and Dr. Logeeshan Velmanickam (Member, IEEE), Senior Lecturer at the Department. Dr. Wanigasekara and Dr. Logeeshan are the corresponding authors.

 

The core engineering work — model training, dataset construction, tokenizer redesign, and evaluation — was carried out by P. K. Udith I. Sandaruwan, Nimesh M. A. Fonseka, and Pamith C. Salwathura (Student Member, IEEE), all University of Moratuwa graduates, working in collaboration with the Decryptogen R&D team. They earlier presented a preliminary version of the work at the IEEE AIIoT Congress in Seattle in 2025; the IEEE Access paper is the full, finalized version.

 

Sanjeewa Alwis has led Decryptogen into an international operation across Europe, the United States, and Australia, with a focus on decentralized large language model training and blockchain-integrated AI. He has long argued that emerging regions need to build their own sovereign AI capacity rather than wait for foreign tech companies to include them, the statement added.

 

What comes next

 

Next steps include longer training runs, larger and more diverse Sinhala datasets, and deployments in assistive technologies and conversational systems for Sinhala speakers. The full paper, “End-to-End Adaptation of LLMs for Low-Resource Languages,” will appear in IEEE Access under DOI 10.1109/ACCESS.2026.3693119. The datasets are publicly available on Hugging Face.

 

 

 
