RT Journal Article
T1 Estimating Haplotype Frequency and Coverage of Databases
A1 Egeland, Thore
A1 Salas Ellacuriaga, Antonio
AB A variety of forensic, population, and disease studies are based on haploid DNA (e.g. mitochondrial DNA or Y-chromosome data). For any set of genetic markers, databases of conventional size will normally contain only a fraction of all haplotypes. For several applications, reliable estimates of haplotype frequencies, the total number of haplotypes and the coverage of the database (the probability that the next random haplotype is contained in the database) will be useful. We propose different approaches to the problem based on classical methods as well as new applications of Principal Component Analysis (PCA). We also discuss previous proposals based on saturation curves. Several conclusions can be inferred from simulated and real data. First, classical estimates of the fraction of unseen haplotypes can be seriously biased. Second, there is no obvious way to decide on the required sample size based on traditional approaches. Methods based on testing of hypotheses or on the length of confidence intervals may appear artificial, since no single test or parameter stands out as particularly relevant. Rather, the coverage may be more relevant, since it indicates the percentage of different haplotypes that are contained in a database; if the coverage is low, there is a considerable chance that the next haplotype to be observed does not appear in the database, and this indicates that the database needs to be expanded. Finally, freeware and example data sets accompany the methods discussed in this paper: http://folk.uio.no/thoree/nhap/.
PB PLoS
YR 2008
FD 2008
LK http://hdl.handle.net/10347/22890
UL http://hdl.handle.net/10347/22890
LA eng
NO Egeland T, Salas A (2008) Estimating Haplotype Frequency and Coverage of Databases. PLoS ONE 3(12): e3988. https://doi.org/10.1371/journal.pone.0003988
NO Two grants from the Fundación de Investigación Médica Mutua Madrileña awarded to AS partially supported this project.
DS Minerva
RD 24 Apr 2026