De-risking and de-mystifying large-scale AI

By: Krista Davidson

19 May, 2026

Colin Raffel with illustrative representation of data nodes in the background

The role of AI in democracy has become something of a Catch-22. While AI has the potential to enhance democracy through interactive elections, civic participation, and streamlined government services, it also poses significant risks, including the perpetuation of bias and misinformation. Perhaps one of the most serious risks is the immense power it provides to a few elite, high-resource companies.

This is the impetus for the research of Colin Raffel, a Canada CIFAR AI Chair and associate director at the Vector Institute. He is also an associate professor of Computer Science at the University of Toronto and a faculty researcher at Hugging Face, an online platform that enables developers to collaborate on models, datasets and applications.

Raffel’s research is focused on mitigating the risks of AI, decentralizing it and making it easier for others to develop large-scale AI without having to wrangle endless amounts of unnecessary and often unauthorized data for training purposes.

“I was motivated by a growing concern around concentration of power among resource-rich companies. This motivated our focus on decentralized methods for developing large-scale AI. More recently, as AI has gotten more capable, my worries have broadened, and my group has increased its focus on mitigating risks,” says Raffel.

Raffel’s work is particularly timely amid concerns about a looming data shortage and a growing number of lawsuits over the use of unlicensed, copyrighted data to train Large Language Models (LLMs), advanced AI systems that can understand, process and generate language and carry out complex tasks much like humans can.

Many companies claim that using unauthorized data is the only way to enhance the quality of leading AI models and continue progress in the rapidly growing field. Raffel and his collaborators recently invalidated this claim through a project called The Common Pile.

Using a large-scale dataset of openly licensed and public domain text, the team trained a series of models that performed comparably to those trained on unlicensed data. The Common Pile draws on content from 30 sources spanning diverse domains, including research papers, code, books, audio transcripts and more. The research was regarded as a first step towards training models ethically and responsibly.

What’s more, the research makes it easier for others to advance and refine AI systems.

“We developed methods that made it possible for individual contributors to share model updates with one another, and combined them continuously to improve models in a decentralized fashion,” Raffel explains.

Despite these technical successes, Raffel remains wary of society’s growing overreliance on these tools. He states that delegating more labor and cognitive tasks to LLMs could lead to existential issues if extrapolated, noting that the technical problems underlying both existential and societal risks are often identical.

From music to machine learning

Raffel’s foray into the field was sparked by his love of music. “I got into research because I was a musician and wanted to develop new software tools for musicians,” he recalls. This passion led him to music information retrieval, an interdisciplinary field that extracts complex information from music — the same technology that powers song-matching apps and recommender algorithms.

The work led to an interest in machine learning, particularly algorithms that learn from limited labeled data.

Raffel credits the Canada CIFAR AI Chairs program, the Vector Institute and the University of Toronto for cultivating a unique, collaborative ecosystem.

The Canada CIFAR AI Chairs program, a cornerstone of the Pan-Canadian AI Strategy, provides the long-term stability and funding necessary to allow elite AI researchers to focus on high-impact research. To date, the program has attracted more than 140 researchers to Canada to pursue their research.

“There is immense value in being outside the ‘gravitational pull’ of Silicon Valley,” Raffel adds. “It allows us to think more critically and independently about the future of AI.”
