A big data approach to metagenomics for all-food-sequencing

dc.contributor.affiliation: Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías da Información [gl]
dc.contributor.affiliation: Universidade de Santiago de Compostela. Departamento de Electrónica e Computación [gl]
dc.contributor.area: Área de Enxeñaría e Arquitectura
dc.contributor.author: Kobus, Robin
dc.contributor.author: Abuín Mosquera, José Manuel
dc.contributor.author: Müller, André
dc.contributor.author: Hellmann, Sören Lukas
dc.contributor.author: Pichel Campos, Juan Carlos
dc.contributor.author: Fernández Pena, Anselmo Tomás
dc.contributor.author: Hildebrandt, Andreas
dc.contributor.author: Hankeln, Thomas
dc.contributor.author: Schmidt, Bertil
dc.date.accessioned: 2020-04-28T10:54:22Z
dc.date.available: 2020-04-28T10:54:22Z
dc.date.issued: 2020
dc.description.abstract: Background: All-Food-Sequencing (AFS) is an untargeted metagenomic sequencing method that allows for the detection and quantification of food ingredients including animals, plants, and microbiota. While this approach avoids some of the shortcomings of targeted PCR-based methods, it requires the comparison of sequence reads to large collections of reference genomes. The steadily increasing amount of available reference genomes establishes the need for efficient big data approaches. Results: We introduce an alignment-free k-mer based method for detection and quantification of species composition in food and other complex biological matters. It is orders-of-magnitude faster than our previous alignment-based AFS pipeline. In comparison to the established tools CLARK, Kraken2, and Kraken2+Bracken it is superior in terms of false-positive rate and quantification accuracy. Furthermore, the usage of an efficient database partitioning scheme allows for the processing of massive collections of reference genomes with reduced memory requirements on a workstation (AFS-MetaCache) or on a Spark-based compute cluster (MetaCacheSpark). Conclusions: We present a fast yet accurate screening method for whole genome shotgun sequencing-based biosurveillance applications such as food testing. By relying on a big data approach it can scale efficiently towards large-scale collections of complex eukaryotic and bacterial reference genomes. AFS-MetaCache and MetaCacheSpark are suitable tools for broad-scale metagenomic screening applications. They are available at https://muellan.github.io/metacache/afs.html (C++ version for a workstation) and https://github.com/jmabuin/MetaCacheSpark (Spark version for big data clusters). [gl]
dc.description.peerreviewed: Yes [gl]
dc.description.sponsorship: This work was partially supported by the Deutsche Forschungsgemeinschaft (DFG), Project HySim, the MINECO under award RTI2018-093336-B-C21, Xunta de Galicia under awards ED481B 2018/013 and ED431C 2018/19, the European Regional Development Fund, and by the Federal Office for Agriculture and Food [gl]
dc.identifier.citation: Kobus, R., Abuín, J.M., Müller, A. et al. A big data approach to metagenomics for all-food-sequencing. BMC Bioinformatics 21, 102 (2020) [gl]
dc.identifier.doi: 10.1186/s12859-020-3429-6
dc.identifier.essn: 1471-2105
dc.identifier.uri: http://hdl.handle.net/10347/21839
dc.language.iso: eng [gl]
dc.publisher: BMC [gl]
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-093336-B-C21/ES/TECNOLOGIAS PARA LA PREDICCION TEMPRANA DE SIGNOS RELACIONADOS CON TRASTORNOS PSICOLOGICOS
dc.relation.publisherversion: https://doi.org/10.1186/s12859-020-3429-6 [gl]
dc.rights: © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data [gl]
dc.rights.accessRights: open access [gl]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Next-generation sequencing [gl]
dc.subject: Metagenomics [gl]
dc.subject: Species identification [gl]
dc.subject: Eukaryotic genomes [gl]
dc.subject: Locality sensitive hashing [gl]
dc.subject: Big data [gl]
dc.title: A big data approach to metagenomics for all-food-sequencing [gl]
dc.type: journal article [gl]
dc.type.hasVersion: VoR [gl]
dspace.entity.type: Publication
relation.isAuthorOfPublication: 9ae70b97-c52b-415b-b4aa-0e8a7ff70d4c
relation.isAuthorOfPublication: db334853-753e-4afc-9f4f-ad847d0353a7
relation.isAuthorOfPublication: decb372f-b9cd-4237-8dda-2c0f5c40acbe
relation.isAuthorOfPublication.latestForDiscovery: 9ae70b97-c52b-415b-b4aa-0e8a7ff70d4c

Files

Original bundle

Name: 2020_bmc_kobus_big.pdf
Size: 1.04 MB
Format: Adobe Portable Document Format