A new data-processing approach created by scientists at the University of Michigan Life Sciences Institute offers a simpler, faster path through the data generated by cryo-electron microscopy instruments, removing a barrier to wider adoption of this powerful technique.
Cryo-EM enables scientists to determine the 3-D shape of cellular proteins and other molecules that have been flash-frozen in a thin layer of ice. Advanced microscopes beam high-energy electrons through the ice while capturing thousands of videos. These videos are then averaged to create a 3-D structure of the molecule.
By uncovering the precise structures of these molecules, researchers can answer important questions about how the molecules function in cells and how they might contribute to human health and disease. For example, researchers recently used cryo-EM to reveal how a spike protein on the virus that causes COVID-19 enables it to gain entry into host cells.
Recent advances in cryo-EM technology have rapidly opened this field to new users and increased the rate at which data can be collected. Despite these improvements, however, researchers still face a substantial hurdle in accessing the full potential of this technique: the complex data processing landscape required to turn the microscope's terabytes of data into a 3-D structure ready for analysis.
Before researchers can begin analyzing the 3-D structure they want to study, they have to complete a series of preprocessing steps and make a number of subjective decisions. Currently, these steps must be supervised by humans. And because researchers use cryo-EM to analyze a huge variety of molecule types, scientists thought it was nearly impossible to create a general set of guidelines that all researchers could follow for these steps, said Yilai Li, a Willis Life Sciences Fellow at the LSI who led the development of the new program.
"If we can create an automated pipeline for those preprocessing steps, the whole process could be much more user-friendly, especially for newcomers to the field," Li said.
Using machine learning, Li and his colleagues in the lab of LSI assistant professor Michael Cianfrocco have developed just such a pipeline. The program was published April 14 as part of a study in the journal Structure.
The new program connects several deep-learning and image-analysis tools with the data preprocessing algorithms of preexisting software to narrow enormous data sets down to the information that researchers need to begin their analysis.
"This pipeline takes the knowledge that experienced users have gained and puts it into a program that improves accessibility for users from a range of backgrounds," said Cianfrocco, who is also an assistant professor of biological chemistry at the U-M Medical School. "It really streamlines the processing stage so that researchers can jump in and focus on what's important: the scientific questions they want to ask and answer."