RT Journal Article T1 A big data approach to metagenomics for all-food-sequencing A1 Kobus, Robin A1 Abuín Mosquera, José Manuel A1 Müller, André A1 Hellmann, Sören Lukas A1 Pichel Campos, Juan Carlos A1 Fernández Pena, Anselmo Tomás A1 Hildebrandt, Andreas A1 Hankeln, Thomas A1 Schmidt, Bertil K1 Next-generation sequencing K1 Metagenomics K1 Species identification K1 Eukaryotic genomes K1 Locality sensitive hashing K1 Big data AB Background: All-Food-Sequencing (AFS) is an untargeted metagenomic sequencing method that allows for the detection and quantification of food ingredients including animals, plants, and microbiota. While this approach avoids some of the shortcomings of targeted PCR-based methods, it requires the comparison of sequence reads to large collections of reference genomes. The steadily increasing number of available reference genomes establishes the need for efficient big data approaches. Results: We introduce an alignment-free k-mer based method for detection and quantification of species composition in food and other complex biological matter. It is orders of magnitude faster than our previous alignment-based AFS pipeline. In comparison to the established tools CLARK, Kraken2, and Kraken2+Bracken, it is superior in terms of false-positive rate and quantification accuracy. Furthermore, the usage of an efficient database partitioning scheme allows for the processing of massive collections of reference genomes with reduced memory requirements on a workstation (AFS-MetaCache) or on a Spark-based compute cluster (MetaCacheSpark). Conclusions: We present a fast yet accurate screening method for whole genome shotgun sequencing-based biosurveillance applications such as food testing. By relying on a big data approach, it can scale efficiently towards large-scale collections of complex eukaryotic and bacterial reference genomes. AFS-MetaCache and MetaCacheSpark are suitable tools for broad-scale metagenomic screening applications.
They are available at https://muellan.github.io/metacache/afs.html (C++ version for a workstation) and https://github.com/jmabuin/MetaCacheSpark (Spark version for big data clusters). PB BMC YR 2020 FD 2020 LK http://hdl.handle.net/10347/21839 UL http://hdl.handle.net/10347/21839 LA eng NO Kobus, R., Abuín, J.M., Müller, A. et al. A big data approach to metagenomics for all-food-sequencing. BMC Bioinformatics 21, 102 (2020) NO This work was partially supported by the Deutsche Forschungsgemeinschaft (DFG), Project HySim, the MINECO under award RTI2018-093336-B-C21, Xunta de Galicia under awards ED481B 2018/013 and ED431C 2018/19, the European Regional Development Fund, and by the Federal Office for Agriculture and Food DS Minerva RD 26 Apr 2026