JollofLab · Est. 2026 · The Gambia

Research toward
language equity in AI,
starting in West Africa.

JollofLab is an open research effort building parallel datasets and translation models for languages the major systems don't serve well — beginning with Mandinka, Wolof, Fula, Joola, Soninke, and Manjago.

01 Now

Latest release

What's shipping right now and what's next on the roadmap.

Live · jollof-mnk-en · v0.1

Mandinka ↔ English translation, first release

Fine-tuned from NLLB-200 on community-contributed parallel sentences. It is the first publicly available machine-translation model targeting Mandinka that can be used directly.

Open the live demo
  • Wolof data collection underway
    reviewers needed
  • Fula corpus alignment notes
    in design
  • Joola & Manjago — community partnerships
    outreach phase
02 About

Most of the world's languages are absent from the systems being built right now. We're trying to change that.

The current generation of large language and translation models works well for a few dozen languages. For thousands of others, spoken by hundreds of millions of people, the data needed to train such models simply doesn't exist in machine-readable form.

JollofLab builds the missing pieces: parallel sentence datasets, peer-review workflows for quality control, and open-weight translation models. Everything we ship is released under permissive licenses so other researchers can build on it.

Our work focuses first on the languages of Senegambia, where the founding team is rooted. The platform is designed to expand to any low-resource language community willing to contribute.

03 The Bantaba AI Initiative

A production-grade multilingual AI
ecosystem for Senegambia.

Bantaba AI is the first effort to assemble a complete artificial-intelligence stack for the languages of the Senegambia region — Mandinka, Wolof, Fula, Joola, Soninke, and Manjago. Together these languages are the daily means of communication for more than twenty million people across The Gambia, Senegal, and Guinea-Bissau.

Despite that reach, they remain almost entirely absent from modern AI systems. As artificial intelligence becomes the primary interface for services, information, and economic opportunity, that absence is no longer just a technical gap — it is a structural barrier to inclusion.

The initiative will deliver an integrated suite of technologies — machine translation, speech recognition, and text-to-speech — supported by large-scale curated datasets and deployed through open platforms and developer APIs.

The foundation is already operational. Our live Mandinka–English translation model demonstrates both technical feasibility and organizational capability. Bantaba AI is a strategic scale-up into regional infrastructure, not a proposal in the abstract.

04 Sample

A small sample of model output

Cycles through phrases the Mandinka model has been trained on. Switch target languages to see contributor-provided translations side by side.

sample · live
English · src

Hello, how are you?

Mandinka · tgt

06 Languages

Currently in scope

These are the languages with active or planned dataset work. The roster grows as new contributor communities come on board.

mnk   Mandinka   Senegambia         1.3M   Live
wol   Wolof      Senegal, Gambia    12M    Collecting
ful   Fula       West Africa        40M    Planned
dyo   Joola      Casamance          500K   Planned
snk   Soninke    Senegambia, Mali   2M     Planned
mfv   Manjago    Guinea-Bissau      350K   Planned

Language not listed? Propose one →

07 Publications

Selected reading

Foundational work informing our approach to low-resource translation. Our own technical reports will be added here as they're published.

Researcher or institution interested in collaborating? contact@jolloflab.com →

08 Get involved

The dataset doesn't exist yet.
You can help build it.

Contributors translate sentences in languages they speak fluently. Reviewers validate the work of others. Both roles are open and self-paced — the platform tracks your reputation as your contributions are accepted.
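
For readers curious how such a contribute-and-review loop could work, here is a minimal sketch in Python. It is illustrative only and does not describe the platform's actual code: the names `Contribution` and `ReviewQueue`, the two-approval threshold, the one-point reputation increment, and the example usernames are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Contribution:
    """One translated sentence pair awaiting review (hypothetical record shape)."""
    contributor: str
    source_text: str   # the sentence to translate, e.g. in English
    target_text: str   # the contributor's translation
    target_lang: str   # language code such as "mnk"
    approvals: set = field(default_factory=set)
    accepted: bool = False


class ReviewQueue:
    """Tracks approvals and per-user reputation. Every rule here is an assumption."""

    def __init__(self, approvals_needed: int = 2):
        self.approvals_needed = approvals_needed
        self.reputation: dict[str, int] = {}

    def approve(self, item: Contribution, reviewer: str) -> bool:
        """Record one approval and return True once the item is accepted."""
        if reviewer == item.contributor:
            raise ValueError("contributors cannot review their own work")
        if item.accepted:
            return True
        # A set means a repeated approval by the same reviewer counts once.
        item.approvals.add(reviewer)
        if len(item.approvals) >= self.approvals_needed:
            item.accepted = True
            # Reputation grows only when a contribution is accepted.
            self.reputation[item.contributor] = (
                self.reputation.get(item.contributor, 0) + 1
            )
        return item.accepted


# Example usage with placeholder names and a placeholder translation.
queue = ReviewQueue(approvals_needed=2)
item = Contribution("awa", "Hello, how are you?", "<Mandinka translation>", "mnk")
queue.approve(item, "lamin")   # first approval: not yet accepted
queue.approve(item, "fatou")   # second approval: accepted, "awa" gains reputation
```

Requiring more than one independent approval is one simple way to keep a single mistaken review from admitting a bad sentence pair; a real workflow might also weight reviewers by their own reputation.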