By: Justine Brooks
1 Oct, 2025
For many people, particularly older populations, taking multiple medications at the same time is a necessary part of everyday life. This practice, known as polypharmacy, poses a significant risk to patients because it can result in adverse interactions between drugs.
Audrey Durand, a Canada CIFAR AI Chair at Mila and associate professor at Université Laval, is applying reinforcement learning techniques to healthcare data to uncover these dangerous interactions and develop a system that could automatically alert pharmacists to potential risks.
CIFAR spoke with Durand about her research, how it could improve patient safety and the impact of AI on healthcare.
Audrey Durand: Large quantities of data have been collected over the years (and are still being collected) by the public healthcare system, logging all medications prescribed to people over time. This motivates the development of models capable of predicting the risk of drug combinations to help avoid potentially inappropriate prescribing (PIP). However, computing the risk for all drug combinations that appear simultaneously is extremely costly in terms of time and compute. Moreover, since most combinations carry low risk, they would contribute little to training such a predictive model. Our method is able to select a subset of drug combinations for efficiently building a predictive model.
We leveraged neural bandit strategies (a specific class of reinforcement learning methods) to sequentially decide which drug combinations to evaluate, i.e., compute the risk based on the pre-collected data, and to update the predictive model. This allowed us 1) to identify PIP currently contained in the data, and 2) to build a machine learning model capable of predicting the risk of any drug combination (even those not contained in the dataset).
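The selection loop described above can be sketched in miniature. This is not Durand's actual method: it is a toy, assumed setup in which drug combinations are binary feature vectors, the "risk lookup" is simulated by a hidden weight vector, an online linear model stands in for the neural network, and an epsilon-greedy rule stands in for the neural bandit strategy. The point it illustrates is that the model is trained by evaluating only a chosen subset of combinations, not the whole space.

```python
import random

# Hypothetical toy setup: each drug combination is a binary vector over
# N_DRUGS drugs; its "true" risk comes from a hidden weight vector that
# stands in for risk computed from pre-collected healthcare data.
random.seed(0)
N_DRUGS = 8
HIDDEN = [random.uniform(-0.5, 1.0) for _ in range(N_DRUGS)]

def true_risk(combo):
    # Costly risk evaluation against historical data (simulated here).
    return sum(w * x for w, x in zip(HIDDEN, combo))

# All pairwise combinations: a small slice of the exponential space.
combos = [[int(i in (a, b)) for i in range(N_DRUGS)]
          for a in range(N_DRUGS) for b in range(a + 1, N_DRUGS)]

# Online linear model (a stand-in for the neural predictive model).
weights = [0.0] * N_DRUGS

def predict(combo):
    return sum(w * x for w, x in zip(weights, combo))

EPSILON, LR = 0.2, 0.1
evaluated = set()
for step in range(40):
    # Epsilon-greedy stand-in for the bandit strategy: mostly evaluate
    # the combination the model currently considers riskiest (to surface
    # PIP), occasionally explore a random one.
    if random.random() < EPSILON:
        idx = random.randrange(len(combos))
    else:
        idx = max(range(len(combos)), key=lambda i: predict(combos[i]))
    combo = combos[idx]
    risk = true_risk(combo)  # only evaluated combinations incur this cost
    evaluated.add(idx)
    # Online gradient step on squared prediction error.
    err = predict(combo) - risk
    for i in range(N_DRUGS):
        weights[i] -= LR * err * combo[i]

print(f"evaluated {len(evaluated)} of {len(combos)} combinations")
```

Because the model generalizes across the shared drug features, it can also score combinations it never evaluated, mirroring point 2) above.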
AD: Simulations are often required for validating our methods, i.e., for convincing ourselves and others that an approach works. We need to conduct experiments in a controlled setting where the “truth” is known, so that we can see to what extent the solution is recovered correctly. Simulations are also useful for characterizing the proposed methods, for example by showing how they scale (in sample efficiency and compute) with problem complexity. This helps us evaluate their potential for real-world deployment.
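A minimal sketch of this validation idea, under assumed values: in a simulator the ground-truth risk is fixed by us, so we can measure directly how estimation error shrinks as the method sees more samples, which is exactly the kind of sample-efficiency characterization mentioned above.

```python
import random
import statistics

# Toy simulation: the ground-truth risk of a combination is known by
# construction, so recovery error can be measured exactly.
random.seed(1)
TRUE_RISK = 0.7  # hypothetical known ground truth

def observe(n):
    # n noisy observations of the true risk (Gaussian noise assumed).
    return [TRUE_RISK + random.gauss(0, 0.3) for _ in range(n)]

# Characterize sample efficiency: error versus number of samples.
for n in (10, 100, 1000):
    est = statistics.mean(observe(n))
    print(f"n={n:5d}  estimate={est:.3f}  error={abs(est - TRUE_RISK):.3f}")
```

With real healthcare data no such ground truth exists, which is why this controlled check has to happen in simulation first.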
AD: A good predictive model of PIP could be used as a decision-support tool for pharmacists by flagging potentially dangerous drug combinations. This could help prevent the adverse effects resulting from unknown/undetermined interactions between different drugs.
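As a rough illustration of the decision-support idea, the check below flags a prescription for pharmacist review when the model's predicted risk crosses a threshold. The function name, alert fields, and the cutoff value are all assumptions for the sketch; in practice such a threshold would be set by clinical policy.

```python
RISK_THRESHOLD = 0.8  # assumed cutoff; in reality set by clinical policy

def flag_for_review(drugs, predicted_risk, threshold=RISK_THRESHOLD):
    """Return an alert dict if the drug combination warrants review,
    otherwise None. `predicted_risk` comes from the predictive model."""
    if predicted_risk >= threshold:
        return {"drugs": drugs,
                "risk": predicted_risk,
                "action": "pharmacist review"}
    return None

print(flag_for_review(["drug_A", "drug_B"], 0.93))  # flagged
print(flag_for_review(["drug_A", "drug_C"], 0.12))  # not flagged
```

The key design point is that the tool supports, rather than replaces, the pharmacist: it surfaces candidates for review instead of making the prescribing decision.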
AD: By developing our methods using custom-made simulators, we are not exposed to sensitive data during the main course of the project, only at deployment. Our goal is to develop general methods that can be applied to a variety of problems that share the same challenges. Using a simulator allows us to develop such general methods instead of having the model memorize the specifics of a single use case (application). However, this does prevent us from developing methods specifically tailored to the use case; that is our tradeoff.
AD: The support from the Canada CIFAR AI Chair award offers a lot of freedom, enabling researchers to undertake calculated risks often discouraged by conventional, short-term funding. Such flexibility is especially crucial when pursuing multidisciplinary collaborations. Unfortunately, many of these interdisciplinary avenues find themselves sidelined by traditional funding mechanisms, but CIFAR funding enables and supports this kind of exploratory interdisciplinary research.
AD: Recent progress in large language models has opened the door to the rapid development of fully automated, sequential decision-making agents, including agents that learn by interacting with their environment. Yet, there are still many open questions regarding how the interactive learning strategies of these agents may influence one another, and how they can influence the collective optimal behaviours that can be learned. We may need to rethink how agents conduct exploration under such conditions in order to prevent the emergence of unwanted behaviours, such as bias and collusion.
In terms of health-related applications, I am getting quite excited about molecule discovery (e.g., drug discovery), where I think that methods with smart exploration capabilities have the potential for high impact.
Overall, I believe that understanding how to efficiently (and safely) explore complex environments is still a missing (or at least incomplete) piece of the puzzle that we should consider to enable impactful, yet safe, AI applications.