Development of a bioinformatics algorithm to identify gene markers associated with survival and application to breast cancer and colorectal cancer.

Alberto Berral González

Bioinformatics and Functional Genomics Group, CIC (USAL-CSIC)

Date: 07/11/2024
Time: 12:30
CIC Lecture Hall
Host: Javier De las Rivas

The burden of cancer in Europe is increasing every year: current estimates stand at 2.7 million new cases and 1.3 million deaths, and 3.25 million cases are projected by 2040. Breast, colorectal, cervical, prostate, lung, and gastric cancers account for more than half of all new cases and deaths. Spain ranks fourth in Europe for cancer cases, with a projected 30% increase in incidence by 2040. The most common cancers in Spain are breast cancer (BRCA) and colorectal cancer (CRC) in women, and prostate and colorectal cancer in men. Even so, Spain has one of the lowest cancer mortality rates in the EU. Because of their high incidence and mortality, the analysis of cancer data is essential, particularly for colorectal and breast cancer.

Survival analysis is a family of statistical methods for modelling the time until an event occurs, such as a patient's recurrence or death. It is widely used in cancer research to stratify patients according to their survival outcomes. Non-parametric methods, such as the Kaplan-Meier estimator, are commonly used to estimate survival curves from censored data, and the log-rank test compares these curves between groups. Together, these methods are critical for identifying genes or gene signatures associated with survival, which in turn helps to discover new therapeutic targets and improve patient outcomes.
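As an illustration of the Kaplan-Meier idea described above, the following is a minimal Python sketch, not the R implementation presented in the talk. The function name `kaplan_meier` and the toy follow-up data are hypothetical. At each distinct event time, the running survival probability is multiplied by the fraction of at-risk patients who did not have the event; censored patients leave the risk set without lowering the curve.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    events_at = Counter(t for t, e in zip(times, events) if e)
    leaving_at = Counter(times)   # events and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at[t]
    return curve


# Toy data (hypothetical): five patients, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print([(t, round(s, 2)) for t, s in curve])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
```

The patient censored at time 3 still counts as at risk at time 3 but contributes no drop in the curve, which is how the estimator makes use of incomplete follow-up.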

For BRCA, clinical biomarkers and commercial platforms are available for risk analysis, but integrating them with a broader range of biomarkers could provide better results. For CRC, molecular subtypes and biomarkers such as microRNAs and gene expression profiles have been proposed; these can provide valuable prognostic information and guide therapeutic decisions. However, further research is needed to validate these biomarkers in clinical practice and to develop methods for finding more robust biomarkers that improve on the risk prediction of existing ones.

In this context, the present study explores the implementation of different functions to perform survival analysis, risk prediction, and risk-based patient stratification. These functions are integrated into an algorithm with three parts: (I) selection of genes related to a clinical variable, (II) robust assessment of a gene's ability to mark survival, and (III) prediction of patient risk given a set of informative genes. All three parts are implemented as R functions that support genomic expression data. The algorithm has also been applied to large BRCA and CRC datasets to estimate cancer patients' risk and to find gene markers of survival.
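To make the notion of a gene "marking" survival more concrete, here is a deliberately simplified Python sketch. It is not the algorithm presented in the talk, which is written in R and whose robust assessment procedure is not detailed in this abstract. As an assumption made only for illustration, patients are split at the median expression of one gene and the two groups are compared with a two-group log-rank test. The names `logrank_test` and `gene_logrank` and all data are hypothetical.

```python
import math
from statistics import median


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp = 0.0   # observed minus expected events in group A
    variance = 0.0
    for t in sorted({u for u, e, _ in data if e}):
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the events in group A at time t.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def gene_logrank(expression, times, events):
    """Split patients at the median expression (an illustrative choice)
    and compare the high and low groups with the log-rank test."""
    cutoff = median(expression)
    high = [i for i, x in enumerate(expression) if x > cutoff]
    low = [i for i, x in enumerate(expression) if x <= cutoff]
    return logrank_test(
        [times[i] for i in high], [events[i] for i in high],
        [times[i] for i in low], [events[i] for i in low],
    )


# Toy data (hypothetical): high expression coincides with early events.
expression = [9, 8, 7, 6, 1, 2, 3, 4]
times = [1, 2, 3, 4, 10, 11, 12, 13]
events = [1] * 8
chi2, p = gene_logrank(expression, times, events)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A single median split is sensitive to the chosen cutoff, which is one reason a more robust assessment, as part (II) of the algorithm aims to provide, is desirable in practice.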