Browsing by Author "Claros, Gonzalo"

Showing 1 - 2 of 2

Obtaining the Most Accurate de novo Transcriptomes for Non-model Organisms: The Case of Castanea sativa
(SpringerLink, 2017) Espigares, Marina; Seoane, Pedro; Bautista, Rocío; Quintana, Julia; Claros, Gonzalo; Gomez, Luis
Gene expression analyses of non-model organisms must start with the construction of a highly accurate de novo transcriptome as a reference. The best way to determine the suitability of any de novo transcriptome assembly is to compare it with other well-known “reference” transcriptomes. In this study, we took six complete plant transcriptomes (Arabidopsis thaliana, Vitis vinifera, Zea mays, Populus trichocarpa, Triticum aestivum and Oryza sativa) and compared them using a series of metrics as input for a principal component analysis (PCA), which showed that A. thaliana and P. trichocarpa were the best references. This procedure has been automated using AutoFlow. Primary assemblies of short reads from the Illumina platform (50 nt, single reads) and of long reads from Roche/454 technology from Castanea sativa were performed separately using k-mers from 25 to 35 and different assemblers (Oases v2, SOAPdenovoTrans, RAY, MIRA4 and MINIMUS). The resulting contigs were then reconciled with the aim of obtaining the best transcriptome. Oases and SOAPdenovoTrans were used to assemble short reads, MIRA4 and MINIMUS to assemble long reads or to perform the reconciliations, and RAY, which can compute de novo transcript assemblies from heterogeneous (long and short reads) next-generation sequencing data, was included to avoid the reconciliation step. A total of 90 different assemblies were generated in a single run of the pipeline. Hierarchical clustering on principal components (HCPC) was implemented to identify the best assembly strategies automatically: the strategy with the shortest distance in HCPC to the two plant reference transcriptomes is selected. In this approach, reconciliation of Roche/454 long reads with Illumina contigs produces more complete and accurate gene reconstructions than other combinations. Surprisingly, reconstructions based only on Illumina reads and those created with RAY seem to be less accurate.
For this specific study, the most complete and accurate transcriptome corresponds to the Illumina contigs obtained with SOAPdenovoTrans and reassembled with 454 long reads using MIRA4. This is only one example of transcriptome building. Many other assemblies can be performed simply by changing parameters, k-mers, sequencing technology, assemblers, reference organisms, etc. The pipeline in AutoFlow is easily customizable for those purposes.
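The selection idea in this abstract (choose the assembly that lies closest to the reference transcriptomes in the space of quality metrics) can be illustrated with a minimal sketch. It is not the authors' AutoFlow pipeline and it does not perform the hierarchical clustering step. It relies on one simplification: a PCA that retains all components is only a rotation of the standardized data, so Euclidean distances computed on z-scored metrics equal distances in full PCA space, and the projection is skipped. The function names, the three metrics (transcript count, N50, fraction of full-length transcripts) and all numbers are hypothetical toy values, not results from the study.

```python
from math import dist
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each metric (column) to mean 0 and unit variance."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # A constant column has zero deviation; divide by 1.0 instead of 0.
    sds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def rank_by_reference_distance(assemblies, references):
    """Sort assemblies by Euclidean distance to their nearest reference.

    Both arguments map a name to a tuple of metric values in the same order.
    Assemblies and references are standardized together so they share a scale.
    """
    names = list(assemblies) + list(references)
    rows = list(assemblies.values()) + list(references.values())
    scaled = dict(zip(names, zscore_columns(rows)))
    ranking = [
        (name, min(dist(scaled[name], scaled[ref]) for ref in references))
        for name in assemblies
    ]
    return sorted(ranking, key=lambda item: item[1])


# Toy values invented for illustration only:
# (number of transcripts, N50 in nt, fraction of full-length transcripts)
assemblies = {
    "assembly_a": (30000, 1400, 0.60),
    "assembly_b": (90000, 500, 0.20),
    "assembly_c": (55000, 900, 0.40),
}
references = {
    "reference_1": (33000, 1500, 0.65),
    "reference_2": (40000, 1600, 0.70),
}

ranking = rank_by_reference_distance(assemblies, references)
for name, distance in ranking:
    print(f"{name}\t{distance:.3f}")
```

With these toy values the first entry of the ranking is the assembly whose metrics most resemble the references. A real evaluation would use many more metrics and could keep only the leading principal components, in which case the distances would differ from the ones computed here.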
TransFlow: a modular framework for assembling and assessing accurate de novo transcriptomes in non-model organisms
(BMC, 2018) Seoane, Pedro; Espigares, Marina; Carmona, Rosario; Polonio, Alvaro; Quintana, Julia; Cretazzo, Enrico; Bota, Josefina; Perez-Garcia, Alejandro; de Dios Alche, Juan; Gomez, Luis; Claros, Gonzalo
Background: Advances in high-throughput sequencing technologies are enabling more and more de novo assemblies of transcriptomes from many new organisms. Some degree of automation and evaluation is required to ensure reproducibility, repeatability and the selection of the best possible transcriptome. Workflows and pipelines are becoming an absolute requirement for this purpose, but the issue of assembly evaluation for de novo transcriptomes in organisms lacking a sequenced genome remains unsolved. An automated, reproducible and flexible framework called TransFlow is described to accomplish this task. Results: TransFlow, with its five independent modules, was designed to build different workflows depending on the nature of the original reads. This architecture enables different combinations of Illumina and Roche/454 sequencing data, and can be extended to other sequencing platforms. Its capabilities are illustrated with the selection of reliable plant reference transcriptomes and the assembly of six transcriptomes (three case studies for grapevine leaves, olive tree pollen, and chestnut stem, and another three for haustorium, epiphytic structures and their combination for the phytopathogenic fungus Podosphaera xanthii). Arabidopsis and poplar transcriptomes proved to be the best references. A common result regarding de novo assemblies is that Illumina paired-end reads of 100 nt in length assembled with OASES can provide reliable transcriptomes, while the contribution of longer reads is noticeable only when they complement a set of short, single reads. Conclusions: TransFlow can handle up to 181 different assembly strategies. Evaluation based on principal component analyses allows its self-adaptation to different sets of reads to provide a suitable transcriptome for each combination of reads and assemblers.
As a result, each case study has its own behaviour, prioritises its own evaluation parameters, and provides an objective and automated way of detecting the best transcriptome within a pool of them. Sequencing data type and quantity (preferably several hundred million reads of 2×100 nt or longer), assemblers (OASES for Illumina, MIRA4 and EULER-SR reconciled with CAP3 for Roche/454) and strategy (preferably scaffolding with OASES, and probably merging with Roche/454 when available) emerge as the most influential factors.