David Krueger
About
Appointed Canada CIFAR AI Chair – 2025
David Krueger is a Canada CIFAR AI Chair at Mila and an assistant professor at the University of Montreal. He works on reducing the risk of human extinction from artificial intelligence (AI x-risk) through technical research as well as education, outreach, governance and advocacy. His research focuses on understanding and addressing the risks of advanced AI systems, especially AI agents and LLMs. His past research spans many areas of deep learning, AI alignment, AI safety, AI ethics and AI governance, including alignment failure modes, algorithmic manipulation, interpretability, robustness and understanding how AI systems learn and generalize. Krueger was previously an assistant professor at the University of Cambridge, and a Research Director at the UK AI Security Institute.
Relevant Publications
- Krasheninnikov, D., Krasheninnikov, E., Mlodozeniec, B., Maharaj, T., & Krueger, D. (2024). Implicit meta-learning may lead language models to trust more reliable sources. International Conference on Machine Learning.
- Chan, A., Salganik, R., He, Z., Burden, J., Duan, Y., Rismani, S., Markelius, A., Collins, K., Molamohammadi, M., Pang, C., Langosco, L., Voudouris, K., Zhao, W., Krasheninnikov, D., Lin, M., Mayhew, A., Bhatt, U., Weller, A., Krueger, D., & Maharaj, T. (2023). Harms from increasingly agentic algorithmic systems. ACM Conference on Fairness, Accountability, and Transparency.
- Skalse, J., Howe, N., Krasheninnikov, D., & Krueger, D. (2022). Defining and characterizing reward gaming. Neural Information Processing Systems.
- Langosco, L. D., Koch, J., Sharkey, L., Pfau, J., & Krueger, D. (2022). Goal misgeneralization in deep reinforcement learning. International Conference on Machine Learning.
- Krueger, D., Caballero, E., Jacobsen, J.-H., Zhang, A., Binas, J., Zhang, D., Le Priol, R., & Courville, A. (2021). Out-of-distribution generalization via risk extrapolation (REx). International Conference on Machine Learning.