Our Story
From automation obsession to sovereign Moroccan AI.
One person. Massive data pipelines. Custom models. This is the timeline of how Sawalni — the first large language model built for Moroccan and North African languages — came into existence.
The Seeds (2000s–2022)
Childhood
First A* Algorithm
Omar implements his first pathfinding algorithm. Revelation: automated decision-making emerging from a few lines of code.
2012–2017
Early NLP Experiments
Prompted by the voice assistant craze (and the awkwardness of switching to English just to talk to Cortana), Omar experiments with spaCy for Moroccan Arabic. The time was not right.
November 2022
The ChatGPT Moment
ChatGPT launches. Omar spends a week sleeping very little, exploring its understanding of Moroccan Arabic. Envisions scenarios in which non-English speakers risk being left behind.
"LLMs need data. Lots of it. And I knew how to get data" — years of RPA and web scraping become a superpower.
Data & Language ID (Jan–Jun 2023)
Early 2023
The Data Pipeline
Implements massive data-gathering using Monitoro. Quickly discovers: filtering Moroccan Arabic from the web requires its own AI.
5B+
tokens collected
Spring 2023
Gherbal — "The Sieve"
Creates a language identification model for ~50 languages. Beats SOTA for Moroccan Arabic. Later acknowledged in Hugging Face's FineWeb2 paper.
First application of the Crescendo method: small seed → bootstrap → iterate → scale. Each model enables the next.
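The loop named above (small seed → bootstrap → iterate → scale) can be illustrated with a toy self-training sketch. This is not Gherbal's actual implementation: the character n-gram "model", the overlap-based confidence, the 0.5 threshold, and every function name here are assumptions chosen only to make the idea runnable.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of a space-padded, lowercased text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def train(labeled: list[tuple[str, str]]) -> dict[str, Counter]:
    """Toy 'model' (assumption): one n-gram profile per language label."""
    profiles: dict[str, Counter] = {}
    for text, label in labeled:
        profiles.setdefault(label, Counter()).update(char_ngrams(text))
    return profiles


def predict(profiles: dict[str, Counter], text: str) -> tuple[str, float]:
    """Return the best label and the share of the text's n-grams it covers."""
    grams = char_ngrams(text)
    total = sum(grams.values()) or 1  # avoid division by zero on empty text
    scores = {
        label: sum(count for gram, count in grams.items() if gram in profile) / total
        for label, profile in profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def crescendo(seed, pool, rounds: int = 3, threshold: float = 0.5):
    """Seed -> bootstrap -> iterate: grow the labeled set with confident guesses."""
    labeled = list(seed)
    remaining = list(pool)
    for _ in range(rounds):
        model = train(labeled)  # each round's model is built on the previous round's output
        confident, uncertain = [], []
        for text in remaining:
            label, confidence = predict(model, text)
            target = confident if confidence >= threshold else uncertain
            target.append((text, label))
        if not confident:  # nothing new learned, stop iterating
            break
        labeled.extend(confident)
        remaining = [text for text, _ in uncertain]
    return train(labeled), labeled
```

Starting from two labeled sentences, the loop pseudo-labels whatever the current model is confident about, retrains on the enlarged set, and leaves low-confidence text untouched for a later, stronger model or a human annotator.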
2023
Sawtone & Daktilo
Tackles the Darija transliteration problem: no two people write the same word the same way. Builds phonetic embeddings (Sawtone) and an LLM-based transliterator (Daktilo).
2023
Tarjamli — Translation Pipeline
Builds a complete translate → score → transliterate pipeline using NLLB-200 as seed. Creates instruction data at scale for the first time.
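The shape of a translate → score → transliterate pipeline can be sketched as three pluggable stages. The callables below are hypothetical stand-ins (for a seed translation model such as NLLB-200, a quality scorer, and a transliterator), and the `min_score` cut-off and the `Example` record are assumptions for illustration, not Tarjamli's actual design.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Example:
    source: str        # original sentence
    translation: str   # output of the seed translation model
    score: float       # quality estimate for the translation
    arabizi: str       # transliterated form of the translation


def build_examples(
    sources: Iterable[str],
    translate: Callable[[str], str],
    score: Callable[[str, str], float],
    transliterate: Callable[[str], str],
    min_score: float = 0.7,  # assumed threshold, tune per scorer
) -> Iterator[Example]:
    """Translate each source, keep only well-scored pairs, then transliterate."""
    for source in sources:
        candidate = translate(source)
        quality = score(source, candidate)
        if quality < min_score:
            continue  # drop low-quality translations before they reach training data
        yield Example(source, candidate, quality, transliterate(candidate))
```

Scoring before transliterating means the most expensive filtering decision is made once per sentence, and only pairs that pass it are carried forward into instruction data.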
The First Moroccan LLM (July 2023)
July 2023
Sawalni v1 — First in History
Only 8 months after ChatGPT, the first Moroccan LLM is born. Extremely basic — awkward conversations, funny recipe inventions — but unmistakably alive.
8 months
from ChatGPT to first Moroccan LLM
July 2023
Second Demo
A second early demo showing Sawalni responding in Darija Arabizi.
Public Momentum (2024)
Early 2024
Tarjamli.ma Launch
The first translation app for Moroccans, matching Google Translate's UX while supporting Darija in Arabizi for the first time.
Spring 2024
Academic Circuit
Presents at the International Conference of Moroccan Arabic (University of Navarra, Spain). Mentors at an AI hackathon at 1337 coding school.
Summer 2024
National TV Coverage
Sawalni v2 featured on Moroccan national television. Momentum building, but still far from a shipping product.
The Quality Leap (2024–2025)
2024–2025
Sawalni v4 — The Personality Model
First version to supersede Tarjamli and Daktilo. Testers grow attached to its personality, but poor tool calling and inconsistent instruction following demand a new approach.
2025
Custom Tokenizer & Knowledge Distillation
Technical breakthroughs: instruction residuals enable affordable pretraining, and a custom tokenizer delivers a large quality boost.
"LLMs are simply big balls of math. The same algebra I studied in secondary school, except now vectors map into something almost palpable."
2025
Wikilangs
Sponsored by Featherless, Wikilangs bootstraps AI basics for 300+ languages — so future Sawalni-like projects have a head start.
300+
languages via Wikilangs
The Current Era (2026)
2026
Sawalni v5/v6 — Sovereign, Agentic, Multilingual
Full language support across Darija, Hassaniya, Tachelhit, Tarifit, Central Atlas Tamazight, MSA, French, English, Spanish and more. Agentic with 200+ tool types. Live at sawalni.com.
11+
language varieties supported
2026
Published Research
Paper on phonetic embeddings published (doi: 10.14746/linpo.2025.67.1.8). Gherbal work presented at TIM'24, University Hassan II.
The Crescendo Method
Each model enables the next. Small seed → bootstrap → iterate → scale.
What's Next
- Scale the Sawalni formula to a much larger model to reach frontier-grade performance.
- Graduate from technology demonstrator to daily driver for millions of Moroccans.
- Eliminate “translationese” — currently limited by single-annotator scale.
- Explore voice support.
- Improve Amazigh language quality; more native annotators are needed.