Canada CIFAR AI Chair Aishwarya Agrawal is pioneering Visual Question Answering (VQA), which could revolutionize how machines understand the content of images.
Aishwarya Agrawal is a leader in the development of VQA, a complex and challenging task that requires machines to combine an understanding of vision, language and knowledge with common-sense reasoning. VQA can assist artificial intelligence (AI) in visual perception and natural language communication.
VQA enables machines to answer complex questions in a way that’s accessible to humans, such as “What is the man in blue shirt holding?” To answer the question, machines first need to identify the region of the image where there is a person in a blue shirt (called language grounding). Second, they need to understand the meaning of the word “holding”, i.e., they need to look at the person’s hands, even though “hands” aren’t mentioned in the question. Finally, they need to identify the name of the object in the person’s hands.
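The three steps above can be illustrated with a small Python sketch. It is only a toy under stated assumptions: real VQA systems learn these steps from data instead of following hand-written rules, and the `Person` record, the `SCENE` list and the function names are hypothetical stand-ins for what a vision model would extract from an image.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    """Hypothetical annotation for one person detected in an image."""
    kind: str                # e.g. "man"
    shirt_color: str         # e.g. "blue"
    in_hands: Optional[str]  # name of the object in the person's hands, if any


# Hand-written scene standing in for the output of a vision model (assumption).
SCENE: List[Person] = [
    Person("man", "blue", "umbrella"),
    Person("woman", "red", "phone"),
    Person("man", "green", None),
]


def ground(scene: List[Person], kind: str, shirt_color: str) -> Optional[Person]:
    """Step 1 (language grounding): find the person the question refers to."""
    for person in scene:
        if person.kind == kind and person.shirt_color == shirt_color:
            return person
    return None


def answer_holding(scene: List[Person], kind: str, shirt_color: str) -> str:
    """Answer 'What is the <kind> in <shirt_color> shirt holding?'."""
    person = ground(scene, kind, shirt_color)
    if person is None:
        return "unknown"
    # Step 2: "holding" means looking at the hands, which the question never names.
    held = person.in_hands
    # Step 3: report the name of the object found there.
    return held if held is not None else "nothing"


print(answer_holding(SCENE, "man", "blue"))
```

The sketch separates grounding from the rest so that a question about a person who is not in the scene falls back to "unknown" instead of guessing.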
Agrawal is excited about VQA because it has potential applications that could improve the quality of life for the visually impaired, support children’s educational development and enhance the user experience with virtual assistants such as Siri and Alexa.
“There is something to solving the question of intelligence itself and what this technology could mean for our future,” she says.
Building the world’s first and largest open-ended dataset for VQA
VQA first trickled onto the AI scene in 2014 and has gained significant interest, in large part due to the work of Agrawal and her colleagues following the publication of a paper at the International Conference on Computer Vision (ICCV) the following year.
The team, which included researchers from Virginia Tech and Microsoft Research, collected and publicly released the first and largest free-form and open-ended VQA dataset. They also started an annual VQA challenge to push machine performance on VQA.
Each year, the competition presents a set of images and natural language questions, such as “What kind of cheese is on the pizza?” and “Does this person have 20/20 vision?”, inviting researchers and students from around the world to provide natural language answers. To date, the dataset contains about 250,000 images, 760,000 questions and 10 million answers. In the span of four years, Agrawal and her team have received over 1,300 citations, over 800 downloads of the dataset and a best poster award at the Workshop on Object Understanding for Interaction at ICCV 2015.
Canada has one of the ‘best AI environments in the world’
Agrawal joins Mila and the Université de Montréal’s department of computer science and operations research as an assistant professor in 2020. She completed her PhD at Georgia Institute of Technology. Agrawal says she chose Canada to pursue her research for its vibrant and collaborative research environment.
“Right now, climate change poses a big problem, and for me particularly, coming from a small town in India where we face many problems with primary education and healthcare, solving the AI challenge could help us address the problems of climate change, education and healthcare,” she says.
“I believe the environment around you can play an important role in shaping the kind of research you do. And to me, Montréal, and Canada in general, seem to have one of the best AI environments in the world. There are brilliant AI researchers here who are pursuing important research directions, and there is a lot of support from institutions and government for long-term research. And less common is that Canada has a very healthy ecosystem in which industries and universities collaborate.”
Agrawal will use her time as a Canada CIFAR AI Chair to further develop VQA. “Training models based on large datasets can potentially result in biased AI systems. For example, if a system recognizes that most of the people in images are holding briefcases, it may make the assumption that all questions that ask what a person is holding should be answered with ‘briefcases’,” she says.
“It’s very challenging to train models to overcome dataset biases and to answer purely based on the evidence presented by the image,” says Agrawal, but it is a challenge she’s determined to address during her term as a Canada CIFAR AI Chair.
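The briefcase example can be made concrete with a short Python sketch of an image-blind baseline, a model that never looks at the image and always returns the most frequent training answer for a question. The tiny `TRAIN` and `TEST_SHIFTED` lists and the function names are hypothetical, chosen only to show the effect; they are not drawn from the VQA dataset.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

QA = Tuple[str, str]  # (question, answer); the image is deliberately ignored

QUESTION = "What is the person holding?"

# Hypothetical training data in which briefcases dominate the answers.
TRAIN: List[QA] = [
    (QUESTION, "briefcase"),
    (QUESTION, "briefcase"),
    (QUESTION, "briefcase"),
    (QUESTION, "umbrella"),
]

# Hypothetical test data whose answers follow a different distribution.
TEST_SHIFTED: List[QA] = [
    (QUESTION, "phone"),
    (QUESTION, "phone"),
    (QUESTION, "umbrella"),
    (QUESTION, "briefcase"),
]


def fit_prior(pairs: List[QA]) -> Dict[str, str]:
    """Memorize the most frequent answer for each question string."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for question, answer in pairs:
        counts[question][answer] += 1
    return {q: c.most_common(1)[0][0] for q, c in counts.items()}


def accuracy(prior: Dict[str, str], pairs: List[QA]) -> float:
    """Fraction of pairs where the memorized answer matches the true answer."""
    correct = sum(1 for question, answer in pairs if prior.get(question) == answer)
    return correct / len(pairs)


prior = fit_prior(TRAIN)
print(prior[QUESTION])                # the biased default answer
print(accuracy(prior, TRAIN))         # looks good on data that shares the bias
print(accuracy(prior, TEST_SHIFTED))  # drops once the answers are distributed differently
```

In this toy setup the blind baseline scores 0.75 on the training pairs and 0.25 on the shifted pairs, which is why a model that relies on such priors can look accurate without using any evidence from the image.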
The Canada CIFAR AI Chairs Program is the cornerstone program of the CIFAR Pan-Canadian AI Strategy. A total of $86.5 million over five years has been earmarked for this program to attract and retain world-leading AI researchers in Canada. The Canada CIFAR AI Chairs that have been announced to date are conducting research in a range of fields, from machine learning for health to autonomous vehicles, artificial neural networks, climate change and more.