Mitigating Dialect Bias
Can Large Language Models be developed and deployed in a socially responsible way that prevents harm and ensures digital equity for diverse linguistic communities?
The global deployment of Large Language Models (LLMs) is a top priority for technology leaders and governments. However, their widespread integration into society poses critical safety risks for millions. For the more than 140 million speakers of Nigerian Pidgin English, these advanced AI systems can lead to censorship, discrimination, and digital exclusion.
Because of systems-level barriers, such as the vast underrepresentation of marginalized dialects in training data, the lack of culturally aware evaluation tools, and underdeveloped regulatory frameworks, it has been difficult to scale safe and inclusive AI systems from a single language to entire global populations. Communities whose dialects are not prioritized in AI development are less likely to benefit from technological advances and more likely to be harmed by biased algorithms, creating what can be termed a “cycle of digital exclusion.”
Dismantling barriers to safe and equitable AI requires concerted effort from developers, researchers, and policymakers. These efforts are hampered, however, by the lack of analytic tools that can accurately identify and mitigate the population-wide risks of dialect bias, in which a user's language variety is systematically misinterpreted as toxic, inappropriate, or harmful.
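To make the notion of dialect bias concrete, here is a minimal sketch of the kind of disparity check such an analytic tool might perform: scoring paired benign sentences in Nigerian Pidgin and Standard English with a toxicity classifier and comparing how often each dialect is wrongly flagged. The `toxicity_score` function and the example pairs below are hypothetical placeholders for illustration only, not tools or data from this project.

```python
# Illustrative sketch of a dialect bias disparity check.
# Everything here is a hypothetical placeholder, not the
# Solution Network's actual benchmark, model, or data.

def toxicity_score(text: str) -> float:
    """Stand-in for a real toxicity classifier.

    This placeholder crudely mimics a known failure mode: models
    trained mostly on Standard English can assign higher toxicity
    scores to benign dialect features and non-standard spellings.
    """
    dialect_markers = {"dey", "wetin", "abeg", "una", "wahala"}
    words = set(text.lower().split())
    return 0.7 if words & dialect_markers else 0.1


# Paired benign sentences: same intent in Nigerian Pidgin and
# Standard English (toy examples, not a curated benchmark).
benign_pairs = [
    ("How you dey?", "How are you?"),
    ("Wetin you wan chop?", "What do you want to eat?"),
    ("Abeg, help me small.", "Please, help me a little."),
]

THRESHOLD = 0.5  # score above which a text is flagged as toxic

pidgin_flags = sum(toxicity_score(p) > THRESHOLD for p, _ in benign_pairs)
english_flags = sum(toxicity_score(e) > THRESHOLD for _, e in benign_pairs)

# False-positive rate gap: how much more often benign Pidgin
# text is flagged as toxic compared with its English counterpart.
gap = (pidgin_flags - english_flags) / len(benign_pairs)
print(f"Benign Pidgin flagged:   {pidgin_flags}/{len(benign_pairs)}")
print(f"Benign English flagged:  {english_flags}/{len(benign_pairs)}")
print(f"False-positive rate gap: {gap:.0%}")
```

A benchmark built on this idea would replace the placeholder scorer with the model under evaluation and the toy pairs with community-validated parallel sentences; a persistent false-positive gap is one measurable signature of the dialect bias described above.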
This Solution Network will focus on the socially responsible, citizen-centered co-creation of dialect bias benchmarks, mitigation tools, and policy solutions, working with a diverse Canadian-African team and directly with Nigerian Pidgin-speaking communities.
Founded
2025
Supporters
International Development Research Centre, CIFAR
CIFAR Contact
Gagan Gill
Associate Director, AI Safety