
The use of artificial intelligence in omics to predict patient survival

19/12/2024

In recent years, artificial intelligence (AI) has been a highly publicized field due to its transformative impact on society. AI influences various professions and extends to multiple fields of application, including bioinformatics. This article highlights the contributions of AI in a specific context: predicting patient survival using multi-omics data.

Introduction to artificial intelligence and deep learning

AI is based on the development of algorithms and statistical models that enable computers to perform tasks that normally require human capabilities, such as problem-solving, learning, understanding natural language, recognizing patterns, and decision-making. By learning from previously observed data, computers can predict answers to new questions of interest. AI has transformed many sectors, from finance to medicine, by enabling the analysis and interpretation of large amounts of data with high precision and efficiency.

Among the various AI approaches, deep learning stands out as particularly noteworthy. This machine learning method relies on artificial neural networks designed to mimic the brain’s information processing. These networks consist of layers of neurons, each processing a portion of the information before passing the result to the next layer. This structure allows the network to learn and recognize complex patterns in large datasets. For example, in image recognition, a neural network can learn to distinguish between images of dogs and cats after being exposed to thousands of labeled examples. Deep neural networks are characterized by architectures with multiple layers (hence the term “deep”). Figure 1 illustrates a network with two hidden layers.

Figure 1: Diagram of an MLP (Multi-Layer Perceptron) neural network.
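The layer-by-layer processing described above can be sketched in a few lines of NumPy. This is only a minimal illustration of a forward pass through a network with two hidden layers: the layer sizes, the ReLU activation, and the random (untrained) weights are all assumptions chosen for the example, not details taken from the article.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: keeps positive values, zeroes out the rest."""
    return np.maximum(0.0, x)

def mlp_forward(x, weights, biases):
    """Pass x through each layer; ReLU on hidden layers, linear output layer."""
    h = x
    last = len(weights) - 1
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = h @ W + b                      # weighted sum computed by one layer
        h = z if i == last else relu(z)    # hidden layers apply the activation
    return h

rng = np.random.default_rng(0)
layer_sizes = [4, 8, 8, 1]  # input, two hidden layers, output (illustrative)
weights = [rng.normal(0.0, 0.5, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

x = rng.normal(size=(3, 4))   # 3 samples with 4 input features each
y = mlp_forward(x, weights, biases)
print(y.shape)
```

In a real model the weights would be learned from labeled examples rather than drawn at random.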

AI and omics data: personalized medicine

In the context of omics data – which includes genomic, transcriptomic, proteomic, and metabolomic information – deep learning provides a powerful approach to uncover hidden processes. By analyzing patterns in these data, deep learning models can predict clinical outcomes, such as patient survival, offering valuable tools for biomedical research and disease treatment.

In healthcare, personalized medicine increasingly relies on AI to tailor treatments to individual patient characteristics and predict their responses to therapies. This approach enables doctors to propose personalized care pathways based on patients’ clinical and molecular features. Clinical characteristics include factors like smoking habits or family history, while molecular features, referred to as omics, encompass data derived from various technologies.

The different types of omics include:

Genomics: Sequencing nucleotides (e.g., SNP [Single Nucleotide Polymorphism] data).
Transcriptomics: Gene expression (e.g., RNA-seq or microarray data).
Proteomics: Protein abundance.
Metabolomics: Study of small molecules in biological systems.

The multi-omics approach combines these datasets to better understand biological systems, providing a holistic view of complex biological processes. This approach has gained significant traction and popularity in biomedical sciences and healthcare since the 2010s. It enables the identification of molecular interactions at various biological levels.

Transforming omics data

Integrating different omics data presents several challenges and requires powerful computational tools. Traditional analysis methods have been developed for this purpose, such as:

Correlation analysis: Identifying correlations between different omics levels to understand how changes at one level impact another.
Pathway analysis: Identifying major molecular functions to understand interactions between molecular components and their roles in biological processes.
Network analysis: Constructing molecular interaction networks to identify key mechanisms and their impact on various biological functions.
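As an illustration of the first method in the list, correlation analysis between two omics levels can be sketched as a per-gene Pearson correlation between transcript and protein measurements. The data here are synthetic, and the assumed link between the two levels (protein abundance partly following mRNA expression) is built in purely for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_patients, n_genes = 30, 5

# Synthetic data (assumption): protein abundance partly tracks mRNA expression.
rna = rng.normal(size=(n_patients, n_genes))
protein = 0.8 * rna + 0.2 * rng.normal(size=(n_patients, n_genes))

def columnwise_pearson(a, b):
    """Pearson correlation between matching columns of a and b."""
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    num = (a_c * b_c).sum(axis=0)
    den = np.sqrt((a_c ** 2).sum(axis=0) * (b_c ** 2).sum(axis=0))
    return num / den

r = columnwise_pearson(rna, protein)
for gene, value in enumerate(r):
    print(f"gene {gene}: r = {value:.2f}")
```

On real data, genes with weak or negative transcript-protein correlation are often the interesting ones, since they can point to regulation beyond the transcript level.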

The emergence of deep learning brings new opportunities for more efficient and accurate analyses of large multi-omics datasets. One challenge in integrating multi-omics data is the high number of variables. For instance, genomic and proteomic data can include nearly 50,000 variables, while patient cohorts are often small because recruiting participants is difficult. This creates a high-dimensional problem, where the number of variables far exceeds the sample size. The goal is therefore to reduce the dimensionality of the data by extracting the most relevant variables, yielding a more compact representation.
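To see why having far more variables than patients is a problem, consider the following sketch. It fits a linear model to purely random data in which the outcome is unrelated to the variables. The sizes (20 patients, 500 variables) are arbitrary assumptions, much smaller than real omics datasets.

```python
import numpy as np

rng = np.random.default_rng(2)
n_patients, n_features = 20, 500   # far more variables than samples

X = rng.normal(size=(n_patients, n_features))  # random "omics" matrix
y = rng.normal(size=n_patients)                # outcome unrelated to X

# Minimum-norm least-squares fit: with p > n it can match the training data exactly.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
train_error = np.mean((X @ coef - y) ** 2)

# Fresh data from the same (signal-free) process exposes the overfit.
X_new = rng.normal(size=(n_patients, n_features))
y_new = rng.normal(size=n_patients)
test_error = np.mean((X_new @ coef - y_new) ** 2)

print(f"training error: {train_error:.2e}")
print(f"error on new patients: {test_error:.2f}")
```

The model reproduces the training outcomes almost perfectly even though there is nothing to learn, and it fails on new patients. Dimensionality reduction is one way to limit this kind of overfitting.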

Auto-encoders (AEs) are a type of neural network in deep learning specifically designed for unsupervised learning. Their operation can be simplified into three key steps (see Figure 2):

Encoding: The network compresses input data (e.g., a genomic dataset) into a smaller, compact representation, similar to compressing a large file.
Latent layer: This layer stores the compressed version of the data, capturing its essence while filtering out noise and unnecessary details.
Decoding: The auto-encoder reconstructs the data from its compressed form, aiming to produce an output as close as possible to the original input. This allows the network to learn to retain the most important information.

In multi-omics analysis, auto-encoders distill complex and large datasets into more manageable information. For example, they can reduce thousands of genes measured in a genomic study to a smaller set of significant features, making analysis more efficient and revealing hidden patterns crucial for understanding diseases or predicting clinical outcomes. In summary, auto-encoders help transform large amounts of omics data into simpler and more informative formats, facilitating analysis and the extraction of valuable information.

Figure 2: Diagram of an AE (Auto-Encoder) neural network.
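The three steps (encoding, latent layer, decoding) can be sketched with a very small auto-encoder written directly in NumPy. This is a toy example under stated assumptions: synthetic data with 20 features driven by 3 hidden factors, a single tanh encoding layer, a linear decoder, and plain gradient descent on the squared reconstruction error. Real multi-omics models are typically deeper and built with a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, n_features, n_latent = 100, 20, 3

# Synthetic stand-in for an omics matrix: 3 hidden factors drive 20 features.
factors = rng.normal(size=(n_samples, n_latent))
X = factors @ rng.normal(size=(n_latent, n_features))
X += 0.1 * rng.normal(size=X.shape)
X = (X - X.mean(axis=0)) / X.std(axis=0)

params = {
    "W_enc": 0.1 * rng.normal(size=(n_features, n_latent)),
    "b_enc": np.zeros(n_latent),
    "W_dec": 0.1 * rng.normal(size=(n_latent, n_features)),
    "b_dec": np.zeros(n_features),
}

def encode(X, p):
    """Encoding: compress each sample into n_latent values."""
    return np.tanh(X @ p["W_enc"] + p["b_enc"])

def decode(H, p):
    """Decoding: reconstruct the original features from the latent values."""
    return H @ p["W_dec"] + p["b_dec"]

learning_rate = 0.005
losses = []
for _ in range(2000):
    H = encode(X, params)
    err = decode(H, params) - X
    losses.append(np.mean(np.sum(err ** 2, axis=1)))  # reconstruction error
    # Backpropagate the reconstruction error through decoder and encoder.
    g_out = 2.0 * err / n_samples
    g_pre = (g_out @ params["W_dec"].T) * (1.0 - H ** 2)
    grads = {
        "W_dec": H.T @ g_out, "b_dec": g_out.sum(axis=0),
        "W_enc": X.T @ g_pre, "b_enc": g_pre.sum(axis=0),
    }
    for name in params:
        params[name] -= learning_rate * grads[name]

latent = encode(X, params)  # the compact representation of each sample
print(latent.shape, round(losses[0], 2), round(losses[-1], 2))
```

After training, `latent` holds 3 values per sample in place of the original 20 features. These are the reduced variables that downstream analyses would use.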

Practical application: predicting survival in breast cancer

In recent years, researchers have proposed using auto-encoders to predict survival from multi-omics data. The principle involves learning latent (hidden) representations from each dataset by processing them in separate layers. These latent representations then become new “deep” variables, which are easily concatenated or connected to feed into a new neural network for survival prediction. This principle is illustrated in Figure 2.
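The concatenation step and the survival objective can be sketched as follows. The article does not say which loss the survival network uses. The negative Cox partial log-likelihood shown here is one common choice and is an assumption of this example, as are the random stand-ins for the latent representations and the single linear layer standing in for the survival network.

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    Assumes no tied event times. `event` is 1 for an observed death and
    0 for a censored patient.
    """
    order = np.argsort(-time)              # longest follow-up first
    risk, event = risk[order], event[order]
    # log of the summed exp(risk) over everyone still at risk at each time
    log_risk_set = np.logaddexp.accumulate(risk)
    return -np.sum((risk - log_risk_set) * event) / max(event.sum(), 1.0)

rng = np.random.default_rng(4)
n_patients = 40

# Stand-ins for latent representations produced by two per-omics encoders.
z_rna = rng.normal(size=(n_patients, 4))
z_prot = rng.normal(size=(n_patients, 3))

# The "deep" variables are concatenated into one feature matrix.
deep_features = np.concatenate([z_rna, z_prot], axis=1)

# Hypothetical one-layer risk model (untrained): one risk score per patient.
w = rng.normal(size=deep_features.shape[1])
risk = deep_features @ w

time = rng.exponential(scale=5.0, size=n_patients)          # follow-up times
event = (rng.random(n_patients) < 0.7).astype(float)        # 1 = death observed

loss = cox_neg_log_partial_likelihood(risk, time, event)
print(deep_features.shape, round(float(loss), 3))
```

Training the survival network would mean adjusting its weights to lower this loss, so that patients with shorter survival receive higher risk scores.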

Survival prediction in breast cancer illustrates the practical application of AI in omics. Breast cancer is a complex disease with multiple subtypes, each with a different prognosis. Using multi-omics data combined with AI offers valuable insights for treatment and for predicting clinical outcomes. In such studies, diverse omics data are collected from breast cancer patients, including genomic, transcriptomic, and proteomic data, representing a range of information from genetic sequences to protein expression levels.

These data are then processed using a deep learning model, such as an auto-encoder, to identify significant patterns and biomarkers related to survival. The model is trained to learn the complex relationships between the different types of data and patient survival. The analysis can reveal specific biomarkers strongly correlated with favorable or unfavorable prognoses. For example, the expression of certain genes or the levels of certain proteins may be associated with an increased risk of recurrence.

These findings enable doctors to personalize treatments for patients. By identifying high-risk patients, more aggressive treatment strategies can be adopted, while low-risk patients can avoid unnecessarily heavy treatments. This approach is not limited to breast cancer and can be extended to other types of cancer or diseases, where combining multi-omics data with AI opens new avenues for more precise diagnoses and targeted treatments.

Limitations and challenges of AI in omics

While applying AI to omics data analysis presents revolutionary possibilities, it is important to recognize and discuss its limitations.

The first limitation is the limited interpretability of AI models. For example, deep neural networks are often perceived as “black boxes” because they can produce accurate predictions without offering a clear view of how those predictions were reached. This opacity can be problematic, especially in medicine, where understanding mechanisms is crucial.

The second limitation is that the performance of AI models depends on the quality and quantity of the data provided. Omics data are often incomplete, biased, or of variable quality, which can lead to inaccurate or non-generalizable predictions.

Thirdly, the large scale of omics data and their integration pose significant challenges. While the multi-omics approach offers a comprehensive view of biological systems, it requires sophisticated strategies to overcome differences in scale, type, and quality of data, which can be a major technical hurdle.
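One simple way to handle differences in scale between omics blocks is to transform and standardize each block separately before combining them. The sketch below uses synthetic data. The log transform for count data and the z-score standardization are common choices, used here as assumptions rather than as the article's method.

```python
import numpy as np

def zscore_block(block, eps=1e-8):
    """Standardize each feature of one omics block to mean 0 and std 1."""
    return (block - block.mean(axis=0)) / (block.std(axis=0) + eps)

rng = np.random.default_rng(5)
n_patients = 25

# Two synthetic blocks on very different scales (illustrative only).
rna_counts = rng.poisson(lam=200, size=(n_patients, 6)).astype(float)
protein = rng.normal(loc=0.5, scale=0.05, size=(n_patients, 4))

rna_log = np.log2(rna_counts + 1.0)   # compress the range of count data

combined = np.concatenate(
    [zscore_block(rna_log), zscore_block(protein)], axis=1
)
print(combined.shape)
print(combined.mean(axis=0).round(6), combined.std(axis=0).round(3))
```

In a predictive setting, the means and standard deviations should be estimated on the training patients only and then reused for new patients, so that no information leaks from the evaluation data.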

Furthermore, the ethical and regulatory implications of using AI in omics must be addressed. The use of AI in omics raises concerns about data privacy and patient consent, and regulatory questions must be handled proactively to ensure responsible use of AI in research and clinical settings.

Lastly, effectively implementing AI in omics requires specialized technical expertise that not all researchers and clinicians possess, potentially limiting access to this technology.

By openly recognizing and discussing these limitations, the scientific community can better direct future efforts to improve AI applications in omics, with a focus on transparency, data quality, and ethics.

Conclusion

This article has highlighted the contributions of AI in predicting survival using multi-omics data. AI, particularly neural networks, offers a promising initial response to multi-omics analysis by enabling reduced and informative data representation. However, to maximize AI’s impact in this field, it is essential to overcome challenges related to model interpretability, data quality, and ethical considerations. By continuing research and improving current techniques, AI could significantly transform personalized medicine and biomedical research.

Need help?

Our data center experts are available to help you in the following areas:

Integration and analysis of multi-omics data (transformation and reduction of data dimensionality, application of machine and deep learning techniques, including auto-encoders, to extract meaningful patterns).
Development of customized predictive models (creation of models adapted to clinical studies, validation and optimization of models).
Methodological and technical support (assistance in processing and integrating genomic, transcriptomic, proteomic, and metabolomic data, training in machine and deep learning techniques).
Strategic support (definition of analytical pipelines, answers to specific questions via our dedicated hotline for fast and accurate support).

Contact our experts to discuss your projects or get personalized support at onedt@efor-group.com.
