3D Structure Prediction in the era of AI
The different tools and platforms available out there, and the best positioned companies in the era of AI
The advent of artificial intelligence and machine learning, and their use in tools such as DeepMind's AlphaFold, has revolutionized protein structure prediction, making it more accurate and efficient. This breakthrough has also led to a surge of interest in software for visualizing 3D structures, including proteins and chemical compounds, making those structures more accessible and mainstream.
3D Structure Prediction
DeepMind's AlphaFold version 2 is considered the most widely used software for predicting the 3D structures of proteins and protein complexes, thanks to its high accuracy and speed. AlphaFold is implemented with the JAX library and takes multiple sequence alignments (MSAs) as input, which lets it integrate evolutionary and biophysical information when predicting a structure. It is a deep neural network trained on a large dataset of known protein structures and protein sequences, and it predicts the 3D structure of a given protein sequence.
Since the release of AlphaFold, other tools have emerged that outperform it in some aspects of 3D structure prediction. For example, Uni-Fold, based on PyTorch and MSAs, can be better than AlphaFold at predicting protein complexes. Uni-Fold uses a hierarchical neural network architecture that incorporates both local and global information to predict the structure of protein complexes.
OpenFold is another tool that is openly available to researchers (an open-source implementation of AlphaFold2), and it has the potential to provide valuable insights into the mechanisms of protein folding. By training models with OpenFold, researchers can learn more about the physics and chemistry of protein folding, which could lead to new discoveries in the field of protein science.
One of the most exciting aspects of OpenFold is its ability to generalize beyond protein structure prediction. This means that the algorithms trained on OpenFold data could potentially be used to predict the structures of other biomolecules, such as RNA and DNA. Additionally, these models could be used to predict the behavior of biological systems beyond just the 3D structure of individual molecules, such as protein-protein interactions or even whole cellular processes.
Furthermore, OpenFold's ability to generalize could also enable the development of AlphaFold-like systems for other areas of science where data is sparse, such as drug discovery. By making the most of the data that is available in these fields and training machine learning models to make predictions, one could gain valuable insights into complex systems that were previously difficult to study.
ColabFold is a platform that has helped compile and refine the Google Colab notebooks available for 3D structure prediction. It is a community-driven project that provides a set of notebooks for predicting the 3D structure of proteins with AlphaFold and other state-of-the-art methods. ColabFold has been instrumental in democratizing access to powerful computational resources, making it possible for researchers worldwide to run protein structure prediction with the latest methods.
Facebook (now Meta) has also invested in this field: ESMFold is one of the fastest methods for predicting the structure of monomeric proteins, with an API service that can take around 10 seconds to predict a 500 amino-acid monomer.
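As a rough illustration of how such an API service might be called, the sketch below posts a raw sequence to what is assumed to be the public ESM Atlas folding endpoint and returns the PDB text from the reply. The URL, the helper names (build_fold_request, fold_sequence), and the timeout are assumptions made for illustration; check the service's current documentation and usage limits before relying on any of them.

```python
import urllib.request

# ASSUMPTION: endpoint of the public ESM Atlas folding service.
# It may change or be rate-limited; verify it before use.
ESMFOLD_URL = "https://api.esmatlas.com/foldSequence/v1/pdb/"

# One-letter codes of the 20 standard amino acids.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def build_fold_request(sequence: str, url: str = ESMFOLD_URL) -> urllib.request.Request:
    """Build a POST request whose body is the cleaned amino-acid sequence."""
    cleaned = "".join(sequence.split()).upper()
    unknown = set(cleaned) - VALID_RESIDUES
    if not cleaned or unknown:
        raise ValueError(f"not a plain protein sequence: {sorted(unknown)}")
    return urllib.request.Request(url, data=cleaned.encode("ascii"), method="POST")


def fold_sequence(sequence: str, timeout: float = 120.0,
                  opener=urllib.request.urlopen) -> str:
    """Send the sequence to the service and return the reply as PDB text.

    `opener` is injectable so the function can be exercised offline.
    """
    request = build_fold_request(sequence)
    with opener(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fold_sequence("MKTAYIAKQR") would, under these assumptions, return a string that can be saved with a .pdb extension and opened in any of the viewers discussed later.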
One of the novel methods in ColabFold is protein hallucination, which is based on generative AI methods. This approach involves generating new protein structures that do not exist in nature, which can be used to explore the conformational space of proteins and predict novel structures. Protein hallucination has the potential to revolutionize drug discovery by enabling the design of new proteins with specific functions.
Some of the notebooks available in ColabFold include:
AlphaFold2: Protein Structure Prediction: This notebook uses AlphaFold2 to predict the 3D structure of a protein from its amino acid sequence.
ESMFold
RoseTTAFold for Protein Structure Prediction: This notebook uses RoseTTAFold, a deep learning method from the developers of the Rosetta software suite, to predict the 3D structure of a protein from its amino acid sequence.
OmegaFold
These notebooks and others available in ColabFold provide a powerful platform for researchers to perform protein structure prediction and explore the conformational space of proteins using the latest methods.
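All of the notebooks above start from an amino acid sequence. A minimal sketch of cleaning a sequence and writing it in FASTA format, a common plain-text input for such tools, might look like the following. The function name to_fasta, the restriction to the 20 standard residue letters, and the 60-character line width are choices made here for illustration; individual notebooks may accept other input conventions.

```python
# One-letter codes of the 20 standard amino acids.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record for `sequence`, wrapped at `width` characters."""
    # Drop whitespace and line breaks, then normalize to upper case.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("empty sequence")
    unknown = sorted(set(cleaned) - STANDARD_RESIDUES)
    if unknown:
        raise ValueError(f"non-standard residue codes: {unknown}")
    # Split the sequence into fixed-width lines under a '>' header line.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"
```

Rejecting unknown letters early is a deliberate choice in this sketch: a stray character is easier to catch before a prediction job has been queued than after it fails.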
FastFold, HelixFold, MEGA-Fold, RoseTTAFold, AFsample, EMBER3D, IgFold, OmegaFold, EquiFold, and RGN2 are other tools that have emerged in recent years and have shown promising results in protein structure prediction.
These tools use different approaches, including deep learning, physics-based modeling, and evolutionary information, to predict the structure of proteins and protein complexes. While some tools may be better suited to certain types of proteins or applications, AlphaFold remains the most widely used and, in general, the most accurate tool for predicting the 3D structure of proteins and protein complexes.
Visualization tools
One platform for this is Benchling, a cloud-based Electronic Lab Notebook (ELN) and Laboratory Information Management System (LIMS) that provides a user-friendly interface, based on MolStar, for viewing and manipulating 3D structures. Like other tools these days, Benchling incorporates AlphaFold prediction capabilities: one simply defines the protein sequence, clicks the ‘Predict with Alphafold’ button, and a few minutes later the PDB file is available in the system. Via the MolStar plugin, it can display molecular structures in different representations, including stick, ball-and-stick, and space-filling models. Benchling also allows users to highlight specific amino acids or atoms in the molecule and view them in detail.
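To make the idea of a PDB file and of picking out specific amino acids more concrete, here is a small sketch that reads the fixed-width ATOM and HETATM records of a PDB file and selects the atoms of one residue. It is not how Benchling or MolStar work internally; the names Atom, parse_atoms, and select_residue are hypothetical, and only the column positions come from the PDB format itself.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA"
    res_name: str  # three-letter residue name, e.g. "MET"
    chain: str     # chain identifier, e.g. "A"
    res_seq: int   # residue sequence number
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Extract atoms from the fixed-width ATOM/HETATM lines of PDB text."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain=line[21],                # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
            ))
    return atoms


def select_residue(atoms: List[Atom], chain: str, res_seq: int) -> List[Atom]:
    """Return the atoms belonging to one residue of one chain."""
    return [a for a in atoms if a.chain == chain and a.res_seq == res_seq]
```

A viewer does something similar when a residue is highlighted: it filters the atom list down to the selection and then draws those atoms in a different style.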
Dotmatics Geneious is another popular tool used for visualizing 3D structures. It offers a range of viewing modes, including cartoon, ribbon, and surface representations, and allows users to rotate and zoom in on structures. It also provides a range of analytical tools, such as measuring distances and angles between atoms.
CDD Vault is a cloud-based software that allows users to visualize and annotate molecular structures. It offers a range of viewing modes, including cartoon, space-filling, and surface representations, and allows users to customize the colors and styles of the molecule. CDD Vault also provides tools for measuring distances and angles, as well as creating custom labels and annotations.
OpenEye Scientific is another powerful software suite for visualizing and analyzing molecular structures. It includes a range of visualization tools, including 2D and 3D representations, and allows users to customize the style and color of the molecule. OpenEye Scientific also provides a range of analytical tools, including molecular docking and virtual screening.
PyMOL, developed as an open-source package but distributed by Schrödinger, Inc., is widely used software for visualizing and analyzing molecular structures. It provides a range of viewing modes, including cartoon, surface, and stick representations, and allows users to customize the color and style of the molecule. PyMOL also provides tools for measuring distances and angles, as well as creating animations and videos of molecular structures. A free version of PyMOL is also available for academic use, although it is more involved to install than the licensed version distributed by Schrödinger, Inc.
UCSF Chimera is another popular software package for visualizing and analyzing molecular structures. It provides a range of viewing modes, including cartoon, stick, and surface representations, and allows users to customize the color and style of the molecule. UCSF Chimera also provides a range of analytical tools, including molecular docking and virtual screening. Its successor, ChimeraX, has recently incorporated the ability to call AlphaFold on a Google Colab notebook and bring the resulting PDB file back into ChimeraX for visualization and manipulation.
MolStar is a web-based tool for visualizing molecular structures in 3D. It allows users to upload PDB files and view them in different representations, including cartoon, stick, and surface models. MolStar also provides a range of analytical tools, including measuring distances and angles between atoms. Many platforms incorporate MolStar as their visualization software for biological 3D structures.
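The distance and angle measurements that several of these tools offer come down to simple vector geometry on atomic coordinates. The sketch below shows the calculation on plain (x, y, z) tuples; the function names are hypothetical and it does not reproduce any particular viewer's implementation.

```python
import math
from typing import Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points, in the units of the input."""
    return math.dist(a, b)


def angle_degrees(a: Sequence[float], b: Sequence[float],
                  c: Sequence[float]) -> float:
    """Angle a-b-c in degrees, measured at the middle point b."""
    # Vectors pointing from the vertex b to each neighbor.
    ba = [ai - bi for ai, bi in zip(a, b)]
    bc = [ci - bi for ci, bi in zip(c, b)]
    norm = math.hypot(*ba) * math.hypot(*bc)
    if norm == 0.0:
        raise ValueError("angle is undefined when two points coincide")
    cosine = sum(u * v for u, v in zip(ba, bc)) / norm
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))
```

With PDB coordinates, which are given in ångströms, distance returns ångströms directly; angle_degrees takes the three atoms in bonded order, for example N, CA, and C of one residue.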
In summary, there are a variety of tools available for visualizing 3D structures, including proteins and chemicals, ranging from user-friendly interfaces to more advanced analytical suites. With the increased accuracy of protein structure prediction provided by AI and DeepMind's AlphaFold, these visualization tools have become more mainstream and accessible to a wider audience.
Which are the best positioned companies in 3D structure prediction using AI?
Recent advances in artificial intelligence (AI) methodologies, particularly in the fields of deep learning and machine learning, have allowed for the development of new tools and technologies for drug discovery, biomarker identification, and other areas of biotechnology. These advances have enabled the processing of vast amounts of data and the development of new insights and models that were previously impossible to achieve with traditional methods.
As a result, a new wave of TechBio companies has emerged, with a focus on leveraging AI and machine learning to drive innovation in biotechnology and healthcare. Companies such as Exscientia, Insitro, Recursion Pharmaceuticals, and Atomwise have raised hundreds of millions of dollars in venture capital funding to develop new drug candidates, identify biomarkers, and accelerate drug discovery and development.
These companies use a range of AI methodologies and tools to process and analyze large data sets, such as high-throughput screening data, genetic data, and clinical data, in order to identify new targets, design new drugs, and optimize drug development processes. The use of cloud computing and high-performance computing has also enabled these companies to scale their operations and perform complex computations quickly and efficiently.
Behind the curtain, we will describe each of these companies, including Exscientia ($EXAI), Insitro, Recursion ($RXRX), Atomwise, Numerate, and TandemAI: their pros and cons, main selling points, and future work…