South Africa, with its rich diversity of eleven official languages, is seen as a potential emerging market where language technology (LT) applications can contribute to the promotion of multilingualism and language development, and as such have a positive impact on the South African community. One of the fundamental resources required for the development of a large number of core language technologies (LTs) and LT applications is a wordnet. A wordnet is a lexical database consisting of words that are grouped into sets of synonyms called synsets. Various conceptual-semantic and lexical relations are indicated between the synsets contained in a wordnet.
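The structure described above can be illustrated with a minimal sketch. It is not the data model of any existing wordnet: the class names `Synset` and `Wordnet`, the identifiers `car-1` and `vehicle-1`, and the English example entries are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Synset:
    """A set of synonymous words plus typed links to other synsets."""
    synset_id: str                      # hypothetical identifier scheme
    lemmas: List[str]                   # the words that share this meaning
    gloss: str = ""                     # short definition
    relations: Dict[str, List[str]] = field(default_factory=dict)


class Wordnet:
    """A toy lexical database: synsets indexed by their identifier."""

    def __init__(self) -> None:
        self.synsets: Dict[str, Synset] = {}

    def add(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset

    def relate(self, source_id: str, relation: str, target_id: str) -> None:
        # Record a conceptual-semantic relation (e.g. hypernym) between synsets.
        self.synsets[source_id].relations.setdefault(relation, []).append(target_id)

    def synonyms(self, lemma: str) -> Set[str]:
        # Collect every other word that shares at least one synset with `lemma`.
        found: Set[str] = set()
        for synset in self.synsets.values():
            if lemma in synset.lemmas:
                found.update(synset.lemmas)
        found.discard(lemma)
        return found


# Illustrative entries only; not taken from any published wordnet.
wn = Wordnet()
wn.add(Synset("vehicle-1", ["motor vehicle"], "a self-propelled wheeled vehicle"))
wn.add(Synset("car-1", ["car", "automobile", "motorcar"], "a four-wheeled motor vehicle"))
wn.relate("car-1", "hypernym", "vehicle-1")
```

In this sketch, `wn.synonyms("car")` yields the other members of the same synset, while the `hypernym` entry on `car-1` records that a car is a kind of motor vehicle.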
Wordnets are not only useful, but also indispensable components of large automatic language understanding systems being developed and tested in academia and industry. Adding several South African languages to the wordnet web enables many such applications for each of these languages in isolation. Moreover, linking the South African wordnets to one another and to the many global wordnets makes cross-linguistic information retrieval and question answering possible, and significantly aids machine translation. This is an important contribution to the empowerment of the African languages within the newly established National Centre for South African Digital Language Resources.
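How linking enables cross-linguistic lookup can be sketched as follows, under the assumption that wordnets for different languages share a common concept identifier for equivalent synsets. The identifier `c-dog`, the function `translate`, and the three sample entries are hypothetical and are not drawn from the wordnets described here.

```python
from typing import Dict, List

# language code -> shared concept identifier -> lemmas in that language.
# All entries are illustrative assumptions, not data from the actual wordnets.
linked_wordnets: Dict[str, Dict[str, List[str]]] = {
    "eng": {"c-dog": ["dog", "domestic dog"]},
    "zul": {"c-dog": ["inja"]},
    "tsn": {"c-dog": ["ntšwa"]},
}


def translate(lemma: str, source: str, target: str) -> List[str]:
    """Return target-language lemmas that share a concept with `lemma`."""
    results: List[str] = []
    for concept_id, lemmas in linked_wordnets[source].items():
        if lemma in lemmas:
            # Follow the shared identifier into the target-language wordnet.
            results.extend(linked_wordnets[target].get(concept_id, []))
    return results
```

Because every language points at the same concept identifier, a query word in one language can be mapped to candidate equivalents in any other linked language without a dedicated bilingual dictionary for each language pair.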
Wordnets for African languages were introduced in 2007 with a training workshop for linguists, lexicographers and computer scientists, presented by international experts. Since then, wordnets for five African languages, namely Setswana (tsn), isiXhosa (xho), isiZulu (zul), Sesotho sa Leboa (nso) and Tshivenda (ven), have grown to roughly 10 000 synsets each, while the wordnets for the other four official African languages, namely Sesotho (sot), Xitsonga (tso), isiNdebele (nde) and Siswati (ssw), each contain 1 000 synsets.