By: Krista Davidson
19 May, 2026
The role of AI in democracy has become something of a Catch-22. While AI has the potential to enhance democracy through interactive elections, civic participation, and streamlined government services, it also poses significant risks, including the perpetuation of bias and misinformation. Perhaps one of the most serious risks is the immense power it provides to a few elite, high-resource companies.
This is the impetus for the research of Colin Raffel, a Canada CIFAR AI Chair and associate director at the Vector Institute. He is also an associate professor of Computer Science at the University of Toronto and a faculty researcher at Hugging Face, an online platform that enables developers to collaborate on models, datasets and applications.
Raffel’s research is focused on mitigating the risks of AI, decentralizing it and making it easier for others to develop large-scale AI without having to wrangle endless amounts of unnecessary and often unauthorized data for training purposes.
“I was motivated by a growing concern around concentration of power among resource-rich companies. This motivated our focus on decentralized methods for developing large-scale AI. More recently, as AI has gotten more capable, my worries have broadened, and my group has increased its focus on mitigating risks,” says Raffel.
Raffel’s work is particularly timely amid concerns of a looming data shortage and a growing number of lawsuits over the use of unlicensed, copyrighted data to train large language models (LLMs), advanced AI systems that can understand, process and generate language and carry out complex tasks much like humans can.
While many companies claim that using unauthorized data is the only way to enhance the quality of leading AI models and continue progress in the rapidly growing field, Raffel and his collaborators recently invalidated this claim through a project called The Common Pile.
Using a large-scale dataset of openly licensed and public domain text, the team trained a series of models that performed comparably to those trained on unlicensed data. The Common Pile draws on content from 30 sources across diverse domains, including research papers, code, books, audio transcripts and more. The research was regarded as a first step towards training models ethically and responsibly.
What’s more, the research makes it easier for others to advance and refine AI systems.
“We developed methods that made it possible for individual contributors to share model updates with one another, and combined them continuously to improve models in a decentralized fashion,” Raffel explains.
Despite these technical successes, Raffel remains wary of society’s growing overreliance on these tools. He says that delegating ever more labor and cognitive tasks to LLMs could, if the trend is extrapolated, lead to existential issues, and notes that the technical problems underlying existential and societal risks are often identical.
From music to machine learning
Raffel’s foray into the field was sparked by his love of music. “I got into research because I was a musician and wanted to develop new software tools for musicians,” he recalls. This passion led him to music information retrieval, an interdisciplinary field that extracts complex information from music and underpins song-matching apps and recommender algorithms.
The work led to an interest in machine learning, particularly algorithms that learn from limited labeled data.
Raffel credits the Canada CIFAR AI Chairs program, the Vector Institute and the University of Toronto for cultivating a unique, collaborative ecosystem.
The Canada CIFAR AI Chairs program, a cornerstone of the Pan-Canadian AI Strategy, provides the long-term stability and funding that allow elite AI researchers to focus on high-impact work. To date, the program has attracted more than 140 researchers to Canada.
“There is immense value in being outside the ‘gravitational pull’ of Silicon Valley,” Raffel adds. “It allows us to think more critically and independently about the future of AI.”