COMPARISON OF EMERGING METHODOLOGIES FOR DECONVOLUTION OF BULK GENETIC EXPRESSION
Thanks to advances in high-throughput sequencing (HTS), scientists can now sequence many molecules at once, producing very large gene expression data sets. Specifically, bulk RNA sequencing (bRNA-seq) measures average gene expression across a population of heterogeneous cells, whereas single-cell RNA sequencing (scRNA-seq) examines the gene expression patterns of individual cells. Tissue samples are still routinely processed in bulk because of the complexity of scRNA-seq methodology, which makes it critical to determine the cell-type composition of each sample and the gene expression profile (GEP) of each constituent cell type. This can be achieved through deconvolution, and computational deconvolution in particular holds promise for understanding the mechanisms underlying many biological processes in terms of cell-type composition. Since 2016, many statistical approaches have tried to solve this problem, and more recently deep-learning methods have also appeared. Given these circumstances, the main objective of this end-of-degree project (EDP) is to compare the performance of two of the most up-to-date bRNA-seq deconvolution methods in different scenarios and with different types of data, so that the evaluation is as thorough as possible. Accordingly, two algorithms are tested: the SCDC method, based on weighted non-negative least squares (W-NNLS), and the Tissue-AdaPtive auto-Encoder (TAPE) method, based on deep neural networks (DNNs). Both are tested with two types of data: pseudo-bulk data with a known solution, and real data from human pancreatic islets. For each experiment, figures and metrics are chosen according to the type of data, including the concordance correlation coefficient (CCC), mean absolute error (MAE), absolute error plots, Bland-Altman plots, the Wilcoxon signed-rank test, and linear regression.
After the completion of the EDP, it is concluded that, although SCDC performs better in the pseudo-bulk experiments, where TAPE performs poorly, TAPE is the more robust method and performs better in real-data scenarios. However, both methods were shown to still have some way to go in deconvolution across protocols. Based on the experiments conducted during this EDP, it is difficult to determine which method offers better deconvolution performance overall, since each has advantages and disadvantages depending on the protocol, size and cell types of the data.
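As an illustration of the two agreement metrics named above, the following is a minimal sketch of how CCC and MAE could be computed between known and estimated cell-type proportions in a pseudo-bulk experiment. It is not taken from the project itself: the function names and the example proportions are hypothetical, and the CCC uses the common population-variance form (Lin's definition), which is an assumption about the variant used.

```python
from statistics import fmean


def mean_absolute_error(truth, estimate):
    """Average absolute difference between paired proportions."""
    if len(truth) != len(estimate) or len(truth) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return fmean(abs(t - e) for t, e in zip(truth, estimate))


def concordance_correlation(truth, estimate):
    """Concordance correlation coefficient with 1/n variances.

    CCC = 2*cov / (var_t + var_e + (mean_t - mean_e)**2)
    """
    if len(truth) != len(estimate) or len(truth) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t, mean_e = fmean(truth), fmean(estimate)
    var_t = fmean((t - mean_t) ** 2 for t in truth)
    var_e = fmean((e - mean_e) ** 2 for e in estimate)
    cov = fmean((t - mean_t) * (e - mean_e) for t, e in zip(truth, estimate))
    denom = var_t + var_e + (mean_t - mean_e) ** 2
    if denom == 0:
        # Both inputs are constant and identical: CCC is undefined.
        return float("nan")
    return 2 * cov / denom


# Hypothetical proportions for three cell types in one pseudo-bulk sample.
true_props = [0.5, 0.3, 0.2]
estimated_props = [0.45, 0.35, 0.2]

mae = mean_absolute_error(true_props, estimated_props)
ccc = concordance_correlation(true_props, estimated_props)
print(f"MAE = {mae:.4f}, CCC = {ccc:.4f}")
```

A CCC close to 1 indicates that the estimates both correlate with and lie close to the true proportions, while MAE reports the average size of the error in proportion units. For this reason the two are often read together.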
Bachelor's thesis (Trabajo Fin de Grado) defended at the Universidad Rey Juan Carlos in the 2022/2023 academic year. Supervisors: José Luis Rojo Álvarez, Luis Bote Curiel