Obtaining the Most Accurate de novo Transcriptomes for Non-model Organisms: The Case of Castanea sativa

Espigares, Marina; Seoane, Pedro; Bautista, Rocío; Quintana, Julia; Claros, Gonzalo; Gomez, Luis

Files

2017_Espigares.pdf (3.39 MB)

Date

2017

Publisher

SpringerLink

URI

https://hdl.handle.net/10115/29201

DOI

https://doi.org/10.1007/978-3-319-56154-7_44

Abstract

Gene expression analyses of non-model organisms must start with the construction of a highly accurate de novo transcriptome to serve as a reference. The best way to assess the suitability of a de novo transcriptome assembly is to compare it with well-known “reference” transcriptomes. In this study, we took six complete plant transcriptomes (Arabidopsis thaliana, Vitis vinifera, Zea mays, Populus trichocarpa, Triticum aestivum and Oryza sativa) and compared them using a set of metrics as input to a principal component analysis (PCA), which identified A. thaliana and P. trichocarpa as the best references. This analysis has been automated using AutoFlow. Primary assemblies of short reads from the Illumina platform (50 nt, single reads) and of long reads from Roche/454 technology, both from Castanea sativa, were performed separately using k-mers from 25 to 35 and different assemblers (Oases v2, SOAPdenovoTrans, RAY, MIRA4 and MINIMUS). The resulting contigs were then reconciled with the aim of obtaining the best transcriptome. Oases and SOAPdenovoTrans were used to assemble the short reads; MIRA and MINIMUS were used to assemble the long reads or to perform the reconciliations; and RAY, which can compute de novo transcript assemblies from heterogeneous (long and short read) next-generation sequencing data, was included to avoid the reconciliation step. A total of 90 different assemblies were generated in a single run of the pipeline. A hierarchical clustering on the PCA components (HCPC) was implemented to automatically identify the best assembly strategies, selecting those at the shortest distance in the HCPC from the two plant reference transcriptomes. In this approach, reconciliation of Roche/454 long reads with Illumina contigs produces more complete and accurate gene reconstructions than other combinations. Surprisingly, reconstructions based only on Illumina reads and those created with RAY seem to be less accurate.
For this specific study, the most complete and accurate transcriptome corresponds to the Illumina contigs obtained with SOAPdenovoTrans and reassembled with 454 long reads using MIRA4. This is only one example of transcriptome construction. Many other assemblies can be performed simply by changing parameters, k-mers, sequencing technology, assemblers, reference organisms, etc. The AutoFlow pipeline is easily customizable for those purposes.
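
The selection criterion described above (choose the assembly closest to the reference transcriptomes) can be illustrated with a minimal Python sketch. It is not the authors' AutoFlow pipeline and does not reproduce HCPC. It assumes that each transcriptome is summarised by a vector of quality metrics, standardises those metrics, and ranks candidate assemblies by their mean Euclidean distance to the references. If every principal component were kept, PCA would be a pure rotation of the standardised data and would leave these distances unchanged, so the sketch skips the rotation step. The metric columns, the names and all numeric values are hypothetical and chosen only for illustration, and averaging over the references is likewise an assumption.

```python
import math
from statistics import mean, pstdev


def standardise(table):
    """Z-score every metric column across all transcriptomes in the table."""
    names = list(table)
    columns = list(zip(*(table[name] for name in names)))
    # Guard against constant columns by falling back to a unit scale.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return {
        name: [(value - m) / s for value, (m, s) in zip(table[name], stats)]
        for name in names
    }


def rank_by_reference_distance(table, references):
    """Rank non-reference rows by mean Euclidean distance to the references.

    Averaging over the references is an assumption of this sketch.
    """
    z = standardise(table)
    scores = {
        name: mean(math.dist(vector, z[ref]) for ref in references)
        for name, vector in z.items()
        if name not in references
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical metric vectors: (transcript count, N50, fraction of complete ORFs).
# These numbers are invented for illustration and are not results from the study.
toy_metrics = {
    "reference_1": [30000, 1800, 0.90],
    "reference_2": [32000, 1700, 0.88],
    "assembly_a": [31000, 1650, 0.85],
    "assembly_b": [60000, 700, 0.40],
    "assembly_c": [45000, 1100, 0.60],
}

ranking = rank_by_reference_distance(toy_metrics, {"reference_1", "reference_2"})
best_name, best_distance = ranking[0]
print(f"Closest to the references: {best_name} ({best_distance:.2f})")
```

With these toy values, the first entry of `ranking` is the candidate whose metrics lie nearest to the two references. In practice the metric set, the number of retained components and the clustering step would follow whatever the pipeline actually computes.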

Description

This work has been supported by co-funding from the ERDF (European Regional Development Fund) 2014-2020 “Programa Operativo de Crecimiento Inteligente” to the grant RTA2013-00068-C03 of the Spanish INIA and MINECO. The authors also gratefully acknowledge the computer resources and the technical support provided by the Plataforma Andaluza de Bioinformática of the University of Málaga.

Citation

Espigares, M., Seoane, P., Bautista, R., Quintana, J., Gómez, L., Claros, M.G. (2017). Obtaining the Most Accurate de novo Transcriptomes for Non-model Organisms: The Case of Castanea sativa. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_44

Collections

Conference Communications
