By: Johnny Kung
3 Sep, 2020
Every day, telescopes and other astronomical instruments produce a deluge of data about our universe. Astronomers and cosmologists are making use of powerful computer algorithms, including machine learning methods, to analyze this noisy data and pinpoint specific phenomena of interest. These tools can be, and indeed have been, applied to tackle similarly complex data in biomedicine, from genomics to medical imaging.
On July 9th, 2020, CIFAR convened a virtual roundtable that brought together Fellows in CIFAR’s Gravity & the Extreme Universe program with CIFAR AI chairs and other international experts in academia and industry to discuss the advances in algorithms used for astronomical research and how they can be adapted for biomedicine, and vice versa. Through short presentations and facilitated discussion, this meeting explored the use of artificial intelligence (AI) for common challenges across the two fields, including the identification and classification of phenomena of interest and the treatment of noisy data. This roundtable served as a starting point for sustained cross-sectoral conversation and collaboration that may spur technological innovation as well as contribute to fundamental research.
Impacted Stakeholders
Observational astronomers and cosmologists
Academic, clinical and industry specialists in microscopy, radiology and medical imaging
Computer scientists in machine learning and computer vision
Developers of hardware (optics, electronics) and software for telescopes, microscopes and other imaging instruments
Key Insights
Astronomers aim to use machine learning (ML) for three aspects of data/image analysis: detection (finding things that haven’t been seen before), classification (separating astronomical objects into different categories), and inference (building astrophysical models and making predictions). Similarly, biomedical scientists are using ML tools for multiple tasks in image analysis: detection, classification, segmentation (correctly identifying the morphology and size of objects of interest), registration (combining images from different modalities, such as CT and MRI, for the same patient or sample, or from the same modality for different patients/samples), and generation (of high resolution images from low resolution data).
Compared to ML methods for classifying common everyday objects (such as cars, cats, or human faces), image analysis in astronomy and biology faces a number of similar challenges: the objects or regions of interest usually occupy only a very small part of each image; two different examples of an everyday object often differ less than two images of the same astronomical or biological object taken by different instruments or at different times; and even experts can vary considerably in their judgements about the presence, identity and features of an astronomical or biological object.
Moreover, compared to the thousands or even millions of images of everyday objects available in databases such as ImageNet and the Cityscapes Dataset, ML algorithms for astronomical and biomedical imaging have much smaller databases on which to train, and generating such datasets through expert annotation is time consuming. A subset of the actual observation data often needs to be set aside as the training dataset; otherwise, researchers must generate simulated or “fake” data for training. To account for the potential effects of inter-instrument variability, ML models trained on data from one instrument may also need to be retrained and tested on a subset of data from another. Given some of the similarities between images in astronomy and microscopy, it may be feasible to use datasets from one field, whether simulated or actual, to pre-train algorithms in the other.
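The pre-train-then-adapt strategy described above can be illustrated in miniature. The sketch below is purely hypothetical: a simple logistic-regression classifier (standing in for the deep networks actually used in these fields) is trained on plentiful data from a simulated "instrument A", then fine-tuned on a small labelled sample from an "instrument B" whose measurements carry a systematic offset. All names, parameters, and the toy data are illustrative, not from any project discussed at the roundtable.

```python
import numpy as np

def make_data(n, shift, rng):
    """Toy two-class data; `shift` mimics an instrument-dependent offset.
    A constant bias feature is appended so the model can adapt to shifts."""
    y = rng.integers(0, 2, n)
    X = rng.normal(0.0, 1.0, (n, 2)) + np.where(y[:, None] == 1, 1.5, -1.5) + shift
    return np.hstack([X, np.ones((n, 1))]), y

def train(X, y, w=None, lr=0.1, epochs=200):
    """Logistic regression fit by gradient descent; passing an existing
    weight vector continues training from it (i.e. fine-tunes)."""
    w = np.zeros(X.shape[1]) if w is None else w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted class-1 probability
        w -= lr * X.T @ (p - y) / len(y)   # gradient of the log loss
    return w

def accuracy(w, X, y):
    return float(((X @ w > 0).astype(int) == y).mean())

rng = np.random.default_rng(0)
# "Pre-train" on plentiful data from simulated instrument A...
Xa, ya = make_data(2000, 0.0, rng)
w = train(Xa, ya)
# ...then fine-tune on a small labelled sample from instrument B,
# whose data carry a systematic offset the pre-trained model never saw.
Xb, yb = make_data(40, 0.8, rng)
w_ft = train(Xb, yb, w=w, epochs=300)
# Evaluate the adapted model on held-out instrument-B data.
Xb_test, yb_test = make_data(500, 0.8, rng)
```

The point of the sketch is the workflow, not the model: the fine-tuning step starts from the pre-trained weights rather than from scratch, which is what makes a small instrument-B sample sufficient.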
In the sky, millions of astronomical events are occurring across the electromagnetic spectrum (from X-rays, to visible light, to radio waves) at any given moment, and a new generation of large telescopes, such as the Vera C. Rubin Observatory (formerly LSST) in Chile, will survey the whole sky at a level of resolution comparable to the Hubble Space Telescope. Astronomers must therefore discard a proportion of their observations so that a manageable amount of data can be stored and followed up. Deep-learning-based classifiers have been used to filter the data and weed out likely “bogus” signals.
A key task in astronomical observation is to identify “transient” objects or phenomena of interest that occur for only a brief period of time, such as an exploding star or a collision between two bodies. Traditionally, transients are identified by obtaining a “difference” image between observations from different nights, which are often made under significantly different conditions that result in images with different properties; an evaluation must then be made about whether the observed difference actually represents an astronomical transient. Some astronomers have developed autoencoders based on convolutional neural networks (CNNs) to speed up this process. A similar problem arises for biologists using fluorescence microscopy to analyze neuronal activity in real time: the short bursts of signal that represent neuronal events of interest occur randomly in space and time, and constitute very small parts of the entire field of view (as little as 0.01%). Biologists are also developing CNN-based tools to detect and label such events in images or videos of neuronal activity.
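The classical difference-imaging step can be sketched in a few lines of numpy. This is a minimal illustration, not any pipeline discussed at the roundtable: a simple 5-sigma cut on the difference image stands in for both the CNN-based evaluation and a realistic noise model, and the injected “transient” is synthetic.

```python
import numpy as np

def find_transients(ref, new, n_sigma=5.0):
    """Subtract a reference image from a new observation and flag pixels
    that deviate from the difference image's mean by more than n_sigma
    times its standard deviation (a crude stand-in for a real noise model)."""
    diff = new.astype(float) - ref.astype(float)
    sigma = diff.std()
    mask = np.abs(diff - diff.mean()) > n_sigma * sigma
    return diff, mask

# Toy example: two noisy observations of a flat sky, with one
# artificial transient injected into the second image.
rng = np.random.default_rng(0)
ref = rng.normal(100.0, 1.0, size=(64, 64))
new = rng.normal(100.0, 1.0, size=(64, 64))
new[32, 32] += 50.0  # injected "transient"
diff, mask = find_transients(ref, new)
```

Real difference imaging must also align the images and match their point-spread functions before subtracting, which is precisely where conditions that differ between nights make the problem hard.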
Another task shared between the two fields is segmenting and classifying objects of interest in images, whether features on moons or planets, or different types of cells or cellular structures. In both fields, objects in the same class can have large differences in shape and size, as well as boundaries that are hard to define or are fuzzy due to image resolution or light diffraction. Astronomers and biologists have developed a variety of ML tools to speed up and automate these tasks, identify missed objects, and increase the reliability and precision of classification.
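A basic pipeline of this kind — threshold the image, label connected regions, then characterize each region by a property such as size — can be sketched as follows. This is a toy numpy illustration (the labelling routine is a hand-rolled stand-in for standard tools like `scipy.ndimage.label`), not any of the learned segmentation models mentioned above.

```python
import numpy as np
from collections import deque

def label_regions(mask):
    """4-connected component labelling of a boolean mask, by flood fill.
    Returns an integer label image and the number of regions found."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for i, j in zip(*np.nonzero(mask)):
        if labels[i, j]:
            continue  # pixel already belongs to an earlier region
        current += 1
        labels[i, j] = current
        queue = deque([(i, j)])
        while queue:
            y, x = queue.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    queue.append((ny, nx))
    return labels, current

# Toy image: two bright "objects" of different sizes on a dark background.
img = np.zeros((32, 32))
img[5:8, 5:8] = 1.0      # small object, 3x3 = 9 pixels
img[20:28, 10:20] = 1.0  # large object, 8x10 = 80 pixels
labels, n = label_regions(img > 0.5)
sizes = [(labels == k).sum() for k in range(1, n + 1)]
```

The fuzzy, diffraction-limited boundaries described above are exactly what makes the fixed threshold in this sketch inadequate in practice, and why both fields have moved toward learned segmentation models.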
Priorities and Next Steps
While current techniques do well on “clean” data with strong signals, much astronomical observation has low signal-to-noise ratios as well as complicated noise models (the probability distribution of the noise in the data). Microscopy images often face a similar challenge with background noise such as autofluorescence (where certain cellular structures or molecules naturally emit light, obscuring the signal from the chemically labelled molecules of interest). It remains a challenge to develop algorithms that work well with this kind of noisy data. One possible approach is to train algorithms on simulated “dirty” data, but doing so requires a good understanding of the noise model so that the generated noise is comparable to that in real data.
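Generating such simulated “dirty” training data might look like the sketch below, under an assumed noise model: a constant background level (a crude stand-in for sky glow or autofluorescence), Poisson shot noise on the photon counts, and Gaussian detector read noise. The function name, parameters, and default values are all illustrative assumptions, not a model from any instrument discussed at the roundtable.

```python
import numpy as np

def add_instrument_noise(clean, background=5.0, gain=1.0, read_sigma=2.0, rng=None):
    """Corrupt a clean image with an assumed noise model:
    - a constant additive background (e.g. sky glow or autofluorescence),
    - Poisson shot noise on the total photon count,
    - Gaussian read noise from the detector electronics."""
    if rng is None:
        rng = np.random.default_rng()
    photons = rng.poisson((clean + background) * gain) / gain
    return photons + rng.normal(0.0, read_sigma, clean.shape)

# Toy example: a faint square source on an empty field, corrupted into a
# realistic-looking "dirty" training image.
rng = np.random.default_rng(1)
clean = np.zeros((64, 64))
clean[30:34, 30:34] = 20.0  # faint source
noisy = add_instrument_noise(clean, rng=rng)
```

The caveat in the paragraph above applies directly here: if the assumed background, gain, or read-noise values do not match the real instrument, a model trained on this simulated data may not transfer to real observations.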
Computer scientists working in fields such as AI/ML and computer vision have much to contribute to developing algorithms for analyzing complex images in astronomy and biomedicine, but will have to become more comfortable dealing with issues of noise and uncertainty. They also face a steep learning curve with the language and data types of these fields. Better documentation of their data and tools by astronomers and biologists would help, but these researchers often lack the training or incentive to provide it: such work takes time and is unlikely to result in publications. A variety of initiatives such as competitions, open datasets, and shared journal clubs or conferences may help increase mutual understanding, promote a culture of openness, and drive collaborations.
Moving forward, the development of a shared “dictionary” that maps the terms used in the two fields (for something as seemingly straightforward as “noise”) would help facilitate more fruitful crosstalk between the fields. A working group may also be set up to connect the two fields in a joint initiative for high throughput quantitative image analysis, especially for time lapse imaging in microscopy and astronomy.
Roundtable Participants
Timothée Bernard, Machine Learning Developer, Imagia
Katie Bouman, Assistant Professor, Caltech
Cole Clifford, Technical Product Manager, Dessa
Joseph Paul Cohen, Postdoctoral Fellow, Mila / Université de Montréal
Salehe Erfanian Ebadi, Postdoctoral Fellow, University of Alberta / MEDO.ai
Christian Gagné, Professor, Université Laval / Canada CIFAR AI Chair, Mila
Daryl Haggard, Associate Professor and Canada Research Chair in Multi-messenger Astrophysics, McGill University / Azrieli Global Scholar, Gravity & the Extreme Universe program, CIFAR
Renée Hložek, Assistant Professor, University of Toronto / Azrieli Global Scholar, Gravity & the Extreme Universe program, CIFAR
Elizabeth Huynh, Medical Physicist, Brigham and Women’s Hospital / Harvard Medical School
Flavie Lavoie-Cardinal, Assistant Professor, Université Laval
Pippin Lee, Software Engineer, Dessa
Adrian Liu, Assistant Professor, McGill University / Azrieli Global Scholar, Gravity & the Extreme Universe program, CIFAR
John Ruan, Assistant Professor, Bishop’s University
David Ruhe, Graduate Student, University of Amsterdam
Kendrick Smith, Faculty and Research Chair, Perimeter Institute for Theoretical Physics / Fellow, Gravity & the Extreme Universe program, CIFAR
Nicholas Vieira, Graduate Student, McGill University
Christopher Williams, Medical Physicist, Brigham and Women’s Hospital / Harvard Medical School
Further Reading
CIFAR resources:
AI for Astronomy and Health (research brief)
Untangling the cosmos (symposium brief)
A repeating fast radio burst (research brief)
AI research enables astronomy breakthrough (CIFAR news)
Other resources:
, by Spyridon Bakas et al.
, by Tianshi Cao et al.
Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis, by Veronika Cheplygina et al.
A machine learning approach for online automated optimization of super-resolution optical microscopy, by Audrey Durand et al.
U-Net: deep learning for cell counting, detection, and morphometry, by Thorsten Falk et al.
Deep learning from 21-cm tomography of the cosmic dawn and reionization, by Nicolas Gillet et al.
Classifying and segmenting microscopy images with deep multiple instance learning, by Oren Kraus et al.
Machine-learning-based brokers for real-time classification of the LSST alert stream, by Gautham Narayan et al.
Statistical classification techniques for photometric supernova typing, by James Newling et al.
Effective image differencing with convolutional neural networks for real-time transient hunting, by Nima Sedaghat et al.
Lunar crater identification via deep learning, by Ari Silburt et al.
A deep CFHT optical search for a counterpart to the possible neutron star–black hole merger GW190814, by Nicholas Vieira et al.
For more information, contact
Fiona Cunningham
Director, Innovation
CIFAR
fiona.cunningham@cifar.ca