
Outsmarting humans at their own game

By: Krista Davidson
December 9, 2019

In a game that has more possible configurations than the number of atoms in the universe, how do you predict the winning move?

Csaba Szepesvári is one of the brains behind an algorithm that helped a computer program accomplish the mathematically difficult feat of outwitting professional player Lee Sedol at the ancient game of Go.

He joins the ranks of other prestigious researchers as a Canada CIFAR AI Chair. The Canada CIFAR AI Chairs Program, a cornerstone of the Pan-Canadian AI Strategy, provides dedicated research funding for Canada’s leading AI researchers. The appointment means Szepesvári will continue his groundbreaking research in reinforcement learning as a fellow at the Alberta Machine Intelligence Institute (Amii), a professor in the Department of Computing Science at the University of Alberta, and a senior staff research scientist at DeepMind.

Using AI to outsmart humans at Go

Szepesvári’s research has strongly influenced the development of two popular AI techniques: Monte Carlo tree search and bandit algorithms. His work has helped computers outsmart professional human players at Go, a board game that, like chess, requires a long sequence of strategic moves by each opponent.

Monte Carlo tree search, a term coined by his colleague Rémi Coulom in 2006, is an algorithm that predicts winning moves by randomly sampling possible ways of continuing a game.
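To make the idea of random sampling concrete, here is a minimal Python sketch of flat Monte Carlo move evaluation, the playout step that Monte Carlo tree search builds on (it does not yet grow a search tree). Go is far too large for a short example, so the sketch assumes a hypothetical toy game: players alternately remove 1, 2 or 3 stones from a pile, and whoever takes the last stone wins. The function names and the playout count are illustrative choices, not anything from Szepesvári’s work.

```python
import random


def legal_moves(stones):
    """Hypothetical toy game: remove 1, 2 or 3 stones; taking the last stone wins."""
    return [k for k in (1, 2, 3) if k <= stones]


def random_playout(stones, rng):
    """Finish the game with uniformly random moves, opponent to move first.

    Returns True if we (the player who just moved, leaving `stones`)
    end up taking the last stone.
    """
    if stones == 0:
        return True  # our move already emptied the pile
    our_turn = False  # the opponent moves next
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return our_turn  # whoever just moved took the last stone
        our_turn = not our_turn


def monte_carlo_move(stones, playouts_per_move=2000, seed=0):
    """Flat Monte Carlo: pick the move whose random playouts win most often."""
    rng = random.Random(seed)  # fixed seed so the estimates are reproducible
    best_move, best_rate = None, -1.0
    for move in legal_moves(stones):
        wins = sum(random_playout(stones - move, rng) for _ in range(playouts_per_move))
        rate = wins / playouts_per_move
        if rate > best_rate:
            best_move, best_rate = move, rate
    return best_move
```

For a pile of 6 stones, the exact win rates under uniformly random play are 4/9, 2/3 and 1/2 for taking 1, 2 or 3 stones, so with 2,000 playouts per move `monte_carlo_move(6)` should choose 2, which here also matches perfect play (always leave a multiple of 4).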


In the same year, Szepesvári and Levente Kocsis refined the initial algorithm by adjusting its value predictions with upper confidence bounds, a method known as UCT (Upper Confidence bounds applied to Trees). This removed an inconsistency in the original version: given enough simulations, UCT is guaranteed to converge to the best move. A variant of this modification was critical to the success of Google DeepMind’s AlphaGo and AlphaZero programs. In October 2015, AlphaGo became the first computer program to defeat a human professional Go player.
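The UCT idea can be sketched on a similar toy game. This is a minimal, textbook-style version, assuming a hypothetical game in which players remove 1, 2 or 3 stones and whoever takes the last stone wins, plus the common UCB1 exploration constant c = √2. Each iteration selects children by mean reward plus an upper-confidence bonus, expands one new node, runs a random playout, and backs the result up the tree, flipping it at each level because the players alternate. It is not DeepMind’s implementation; AlphaGo’s variant also used neural networks to guide the search, which this sketch omits.

```python
import math
import random


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones  # stones left for the player to move here
        self.parent = parent
        self.move = move      # move that led here from the parent
        self.children = []
        self.untried = [k for k in (1, 2, 3) if k <= stones]
        self.visits = 0
        self.wins = 0.0       # wins for the player who made `move`


def uct_score(child, parent_visits, c):
    """UCB1 applied to a tree node: mean reward plus an exploration bonus."""
    mean = child.wins / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def rollout(stones, rng):
    """Random playout: 1 if the player to move at `stones` takes the last stone."""
    first_player_moving = True
    while stones > 0:
        stones -= rng.choice([k for k in (1, 2, 3) if k <= stones])
        if stones == 0:
            return 1 if first_player_moving else 0
        first_player_moving = not first_player_moving
    return 0  # facing an empty pile means the opponent already won


def uct_search(stones, iterations=3000, c=math.sqrt(2), seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by UCT score.
        while not node.untried and node.children:
            parent = node
            node = max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))
        # 2. Expansion: add one untried child, if any remain.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: the player who moved into `node` wins if the
        #    player to move at `node` loses the random playout.
        reward = 1 - rollout(node.stones, rng)
        # 4. Backpropagation: flip the reward at each level (players alternate).
        while node is not None:
            node.visits += 1
            node.wins += reward
            reward = 1 - reward
            node = node.parent
    # Recommend the most visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

Returning the most visited root move rather than the highest raw score is a common convention. For piles of 5, 6 and 7 stones the perfect-play moves are 1, 2 and 3 (leave a multiple of 4), which is what the search should converge to.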

“DeepMind’s achievement took the world, and even the experts in the field, by surprise. It is a wonderful demonstration of the power of reinforcement learning algorithms combined with search. I am very happy to have witnessed this milestone event,” says Szepesvári.

Monte Carlo tree search uses randomness to tackle deterministic problems that are difficult or impossible to solve with other approaches. It balances exploration and exploitation: exploring possible moves, and exploiting the path that has yielded the greatest reward so far. The purest setting in which this exploration-exploitation dilemma arises is the bandit problem, an area of great interest to Szepesvári and one of his main areas of expertise.
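The exploration-exploitation dilemma is easiest to see in a multi-armed bandit. The sketch below implements UCB1, the standard upper-confidence rule that UCT applies at every tree node: play each arm once, then always pull the arm with the highest average reward plus an exploration bonus that shrinks as that arm is tried more often. The Bernoulli arms, their success probabilities and the round count are made-up illustration values.

```python
import math
import random


def ucb1(arm_probs, rounds=5000, seed=0):
    """Run UCB1 on Bernoulli arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(arm_probs)
    pulls = [0] * k       # times each arm has been played
    totals = [0.0] * k    # summed rewards per arm
    for t in range(rounds):  # t = number of plays made so far
        if t < k:
            arm = t  # play every arm once to initialise its estimate
        else:
            # Exploit (average reward) plus explore (confidence bonus).
            arm = max(
                range(k),
                key=lambda i: totals[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i]),
            )
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls
```

With arms that pay off 20%, 50% and 80% of the time, `ucb1([0.2, 0.5, 0.8])` should spend most of its 5,000 pulls on the last arm while still sampling the others occasionally.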

His contributions to bandit-based Monte Carlo planning earned him and Levente Kocsis the Test of Time Award at ECML/PKDD 2016, the leading European conference on machine learning and data mining.

World-class talent remains in Canada

Szepesvári was introduced to reinforcement learning as a PhD student. “I’ve always been interested in intelligence and creating intelligent agents. I thought the framework provided by reinforcement learning was a perfect fit for modeling intelligence,” says Szepesvári.

Szepesvári, who is originally from Hungary and completed his PhD at Attila József University, has been with the University of Alberta since 2006.


He is widely regarded as an expert on the convergence of reinforcement learning algorithms, Monte Carlo tree search, and exploration in bandit problems. His contributions to the field led to his joining DeepMind in 2017, where he leads the Foundations team. He has authored more than 140 conference publications, 40 journal publications and three books.

His first two books are Performance of Nonlinear Approximate Adaptive Control (Wiley, 2003), which addresses theoretical guarantees on the performance of adaptive control designs, and Algorithms for Reinforcement Learning (Morgan & Claypool Publishers, 2010), which addresses the fundamental theoretical and algorithmic issues of reinforcement learning and is considered required reading for researchers new to the field.

A third book, Bandit Algorithms, co-authored with Tor Lattimore, is due to be published by Cambridge University Press in early 2020.

His Canada CIFAR AI Chair award means he will continue to conduct groundbreaking research in Canada.

“It’s not an overstatement to say that Canada is a leader in reinforcement learning. Machine learning can contribute to many positive changes in the world, but we have a better chance of doing so if we address the challenges that arise in reinforcement learning. It’s all about learning from feedback through interaction with an environment,” says Szepesvári.

“There are many exciting new developments in the field of reinforcement learning. It’s a good time to lead the way with new advances.”


The Canada CIFAR AI Chairs Program is the cornerstone program of the CIFAR Pan-Canadian AI Strategy. A total of $86.5 million over five years has been earmarked for this program to attract and retain world-leading AI researchers in Canada. The Canada CIFAR AI Chairs announced to date are conducting research in a range of fields, including machine learning for health, autonomous vehicles, artificial neural networks and climate change.
