Estimating Haplotype Frequency and Coverage of Databases

dc.contributor.affiliation: Universidade de Santiago de Compostela. Departamento de Ciencias Forenses, Anatomía Patolóxica, Xinecoloxía e Obstetricia, e Pediatría [gl]
dc.contributor.author: Egeland, Thore
dc.contributor.author: Salas Ellacuriaga, Antonio
dc.date.accessioned: 2020-06-06T20:02:54Z
dc.date.available: 2020-06-06T20:02:54Z
dc.date.issued: 2008
dc.description.abstract: A variety of forensic, population, and disease studies are based on haploid DNA (e.g. mitochondrial DNA or Y-chromosome data). For any set of genetic markers databases of conventional size will normally contain only a fraction of all haplotypes. For several applications, reliable estimates of haplotype frequencies, the total number of haplotypes and coverage of the database (the probability that the next random haplotype is contained in the database) will be useful. We propose different approaches to the problem based on classical methods as well as new applications of Principal Component Analysis (PCA). We also discuss previous proposals based on saturation curves. Several conclusions can be inferred from simulated and real data. First, classical estimates of the fraction of unseen haplotypes can be seriously biased. Second, there is no obvious way to decide on required sample size based on traditional approaches. Methods based on testing of hypotheses or length of confidence intervals may appear artificial since no single test or parameter stands out as particularly relevant. Rather the coverage may be more relevant since it indicates the percentage of different haplotypes that are contained in a database; if the coverage is low, there is a considerable chance that the next haplotype to be observed does not appear in the database and this indicates that the database needs to be expanded. Finally, freeware and example data sets accompany the methods discussed in this paper: http://folk.uio.no/thoree/nhap/. [gl]
dc.description.peerreviewed: SI [gl]
dc.description.sponsorship: Two grants from the Fundación de Investigación Médica Mutua Madrileña awarded to AS partially supported this project [gl]
dc.identifier.citation: Egeland T, Salas A (2008) Estimating Haplotype Frequency and Coverage of Databases. PLoS ONE 3(12): e3988. https://doi.org/10.1371/journal.pone.0003988 [gl]
dc.identifier.doi: 10.1371/journal.pone.0003988
dc.identifier.essn: 1932-6203
dc.identifier.uri: http://hdl.handle.net/10347/22890
dc.language.iso: eng [gl]
dc.publisher: Plos [gl]
dc.relation.publisherversion: https://doi.org/10.1371/journal.pone.0003988 [gl]
dc.rights: Copyright: © 2008 Egeland et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited [gl]
dc.rights.accessRights: open access [gl]
dc.rights.uri: https://creativecommons.org/licenses/by/2.0/
dc.title: Estimating Haplotype Frequency and Coverage of Databases [gl]
dc.type: journal article [gl]
dc.type.hasVersion: VoR [gl]
dspace.entity.type: Publication
relation.isAuthorOfPublication: 2badffc8-442d-4308-ab23-2eafbb77f6ba
relation.isAuthorOfPublication.latestForDiscovery: 2badffc8-442d-4308-ab23-2eafbb77f6ba
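
The abstract defines coverage as the probability that the next random haplotype is already contained in the database. As a rough illustration only, the sketch below computes one classical estimate of that quantity, the Good-Turing estimate (one minus the proportion of the sample made up of haplotypes seen exactly once). The abstract does not name this estimator, so its use here is an assumption, and the function name and toy haplotype labels are hypothetical. The abstract also cautions that classical estimates of the unseen fraction can be seriously biased, so the result should be read as indicative at best.

```python
from collections import Counter


def good_turing_coverage(haplotypes):
    """Good-Turing estimate of database coverage.

    Assumption: this estimator is an illustrative stand-in for the
    "classical methods" mentioned in the abstract, not the authors' method.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("the database must contain at least one haplotype")
    counts = Counter(haplotypes)
    # Haplotypes observed exactly once carry the information about unseen types.
    singletons = sum(1 for c in counts.values() if c == 1)
    return 1.0 - singletons / n


# Hypothetical toy database: 7 sampled individuals, 4 distinct haplotypes,
# 2 of which (H3, H4) were observed only once.
database = ["H1", "H1", "H1", "H2", "H2", "H3", "H4"]
coverage = good_turing_coverage(database)
print(f"estimated coverage: {coverage:.3f}")
```

A low value suggests a considerable chance that the next observed haplotype is missing from the database, which is the situation the abstract describes as a signal that the database needs to be expanded.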

Files

Original bundle

Name: 2008_plosone_egeland_haplotype.PDF
Size: 182.18 KB
Format: Adobe Portable Document Format