
Monday, September 3, 2012

Next-generation sequence data suggests a "rapid" and "extreme" expansion of R1b across Europe during the Neolithic

Note the words used in these abstracts, which describe the spread of R1b as "rapid" and "extreme". This matters, because an explosive expansion probably explains why R1b hasn't been found in any ancient DNA samples from Europe dating to before the late Neolithic. In other words, to find it in Europe before or even during its early expansion, we need to test very specific remains, belonging to the cultures that facilitated this expansion.

Y-chromosomal insights from large-scale resequencing

Tyler-Smith C, Wei W, Ayub Q, Chen Y, Jostins L, McCarthy S, Hou Y, Carbone I, Durbin R, Xue Y

Next-generation sequencing technology now makes it possible to resequence whole genomes or targeted regions on a population scale, providing extensive sequence data from the Y chromosome. Coverage of the Y chromosome is lower than that of autosomes, and repeated sequences complicate mapping of reads to their correct location, but about 10 Mb of unique Y sequence is accessible to current technologies. We have explored the insights that can be obtained from two such datasets. Complete Genomics have released high coverage sequences of 35 diverse males, which we supplemented by sequencing an additional male belonging to haplogroup A. From these sequences, we identified about 6.6 thousand Y variants, which showed high validation rates. These variants were used to construct a maximum parsimony phylogenetic tree that recapitulated the known phylogeny and distinguished all individuals. Using a measured SNP mutation rate of 1×10⁻⁹ per bp per year, the ages of nodes of interest could be estimated. The TMRCA of the entire tree was ~115 KYA (thousand years ago), and of the lineages outside Africa ~60 KYA, both as expected. Additional insights included a rapid expansion of hg F ~40 KYA, and of R1b in Europe ~5-10 KYA. The archaeological counterpart of the former is unclear, but the latter is likely to represent a Neolithic expansion of this lineage. The second dataset consisted of low-coverage (~2x) sequence of 525 diverse males from the 1000 Genomes Project. About 18.7 thousand Y-SNPs were called, >98% of which validated, but the callset missed ~17% of SNPs because of the low coverage. A maximum likelihood tree was constructed that again recapitulated and refined the known phylogeny and distinguished all individuals. The expansions noted above were also seen, although estimating times was more complex because of the missing variants.
These explorations of large-scale Y resequencing illustrate the power and limitations of current technologies and also the need for the community to develop efficient ways to use such large datasets, including a nomenclature compatible with complete lineage resolution.
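
To get a feel for how a mutation rate turns into a node age, here is a rough sketch of the arithmetic. It assumes a strict molecular clock, uses the quoted rate of 1×10⁻⁹ per bp per year, and takes the "about 10 Mb" of accessible sequence as the callable length. The function names are made up for illustration and are not taken from the authors' pipeline.

```python
# Rough molecular-clock arithmetic (illustrative only, not the authors' method).
# Assumptions: a constant mutation rate and a fixed amount of callable sequence.

MUTATION_RATE = 1e-9       # per bp per year, as quoted in the abstract
CALLABLE_BP = 10_000_000   # "about 10 Mb" of unique Y sequence (approximate)


def expected_mutations(age_years, mutation_rate=MUTATION_RATE, callable_bp=CALLABLE_BP):
    """Expected number of mutations accumulated along one lineage over age_years."""
    return age_years * mutation_rate * callable_bp


def age_from_mutations(n_mutations, mutation_rate=MUTATION_RATE, callable_bp=CALLABLE_BP):
    """Age in years implied by n_mutations along a single lineage."""
    return n_mutations / (mutation_rate * callable_bp)


if __name__ == "__main__":
    # Under these assumptions a lineage picks up roughly one mutation per century.
    print(expected_mutations(115_000))  # mutations expected over ~115 KYA
    print(age_from_mutations(75))       # age implied by a hypothetical 75 mutations
```

Under these assumptions, a node with about 50 to 100 mutations above it would date to roughly 5-10 KYA, which is the range given for the R1b expansion.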

A calibrated human Y-chromosomal phylogeny based on resequencing

Wei W, Ayub Q, Chen Y, McCarthy S, Hou Y, Carbone I, Xue Y, Tyler-Smith C

We have analysed a dataset of 36 complete Y-chromosomal sequences, 35 released by Complete Genomics and an additional sequence from a haplogroup A3b individual, in order to explore how effectively complete sequence data from the Y chromosome can be used to construct and calibrate a phylogeny. We identified unique-sequence regions of the chromosome where we expected variant identification from next-generation sequence data to be reliable, and developed additional filtering steps for the data. Validation rates of the resulting filtered genotype calls were >99%. In total, we identified 5,865 SNPs, 741 indels and 56 MNPs. 4,861 of the variants are new and 262 of them are recurrent even in this small sample. We constructed parsimony-based phylogenetic trees using PHYLIP incorporating all or different subsets of the variants, and estimated times for the entire tree and different clades of interest using GENETREE or the rho measure. The tree structure was consistent with literature data. The GENETREE TMRCA for the complete set of chromosomes examined was 105-125 KYA; times for the out-of-Africa movement were 62-79 KYA, a Paleolithic expansion 37-48 KYA, and the expansion of R1b in Europe 7-10 KYA; rho times were broadly similar. Our study identifies vast numbers of new variants, and explores the methodological steps necessary to obtain reliable biological insights from current next-generation sequence data. It also poses challenges such as how to develop a nomenclature system that can accommodate such extensive sequence information, or how to identify the archaeological counterparts of the male expansions detected.
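
For readers unfamiliar with the rho measure mentioned above, it is usually described as the average number of mutations separating each sampled chromosome from the founder (root) haplotype of a clade, converted to years with a mutation rate. The sketch below shows that idea only. The per-sample counts are invented, the 1×10⁻⁹ rate is borrowed from the first abstract, and the 10 Mb callable length is an assumption, since this abstract doesn't state either.

```python
# Minimal sketch of a rho-style age estimate (illustrative, hypothetical data).
# rho = mean number of mutations from the clade founder to each sampled lineage.

MUTATION_RATE = 1e-9       # per bp per year (assumed; taken from the first abstract)
CALLABLE_BP = 10_000_000   # callable Y sequence in bp (assumed)


def rho(mutations_to_founder):
    """Average mutation count between the founder haplotype and each sample."""
    if not mutations_to_founder:
        raise ValueError("need at least one sample")
    return sum(mutations_to_founder) / len(mutations_to_founder)


def rho_age_years(mutations_to_founder, mutation_rate=MUTATION_RATE, callable_bp=CALLABLE_BP):
    """Convert rho to years: rho divided by mutations expected per lineage per year."""
    return rho(mutations_to_founder) / (mutation_rate * callable_bp)


if __name__ == "__main__":
    # Hypothetical counts for five chromosomes in one clade (not data from the study).
    counts = [80, 85, 90, 75, 95]
    print(rho(counts))
    print(rho_age_years(counts))
```

A tight spread of counts around the mean, as in this made-up example, is what a star-like tree from a rapid expansion would tend to look like.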

Insight into human Y chromosome variation from low-coverage whole-genome resequencing data

Xue Y, Chen Y, McCarthy S, Ayub Q, Jostins L, Durbin R, Tyler-Smith C

Phase 1 of the 1000 Genomes Project has generated low-coverage whole-genome sequence data from 1,094 individuals from worldwide populations, including 528 males. SNP calls on the Y chromosome were made using SAMtools. In low coverage data, there are errors and uncertainty in the genotype calls. We developed a filtering strategy to reduce these, including restricting the analysis to 8.9 Mb of Y unique regions. We called a total of 18,692 Y-SNPs, 16,679 with the ancestral allele known. The false negative rate and false positive variant site identification rates were measured at 14% and 1.72% respectively by comparison with Complete Genomics calls on an overlapping subset of samples. The genotype accuracy was 97.4% compared with HapMap3 chip genotypes and 96.6% compared with Complete Genomics sequences. Using known literature variants, we assigned each sample to a haplogroup and these samples covered most of the major lineages except F, K, L, and M. A phylogenetic tree was constructed based on all the sites with known ancestral states using the RAxML-VI-HPC: Maximum Likelihood-based Phylogenetic Analysis. The tree was consistent with the established structure. It confirmed Hg E (Bantu), O (China) and R1b (Europe) expansions associated with the Neolithic transitions in different parts of the world, and revealed that the expansion in Europe was the most extreme. One novel finding was a striking expansion of lineages F to R ~20 thousand years after the out-of-Africa movement, suggesting a previously unknown event of importance to male demography at this time.
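
The false negative and false positive figures above come from comparing the low-coverage calls with Complete Genomics calls on the same samples. The abstract doesn't spell out the exact definitions, so the sketch below assumes the common ones: false negatives as the fraction of reference sites that were missed, and false positives as the fraction of called sites not present in the reference. The positions are invented for illustration.

```python
# Sketch of site-level error rates against a reference callset (assumed definitions).

def site_error_rates(called_sites, reference_sites):
    """Return (false_negative_rate, false_positive_rate) for variant site discovery.

    false negative rate: reference sites missing from the callset / all reference sites
    false positive rate: called sites absent from the reference / all called sites
    """
    called = set(called_sites)
    reference = set(reference_sites)
    if not called or not reference:
        raise ValueError("both callsets must be non-empty")
    false_negative_rate = len(reference - called) / len(reference)
    false_positive_rate = len(called - reference) / len(called)
    return false_negative_rate, false_positive_rate


if __name__ == "__main__":
    # Hypothetical Y-chromosome positions, not taken from either dataset.
    reference = {100, 200, 300, 400, 500}
    called = {100, 200, 300, 400, 999}
    fn_rate, fp_rate = site_error_rates(called, reference)
    print(f"false negative rate: {fn_rate:.2%}, false positive rate: {fp_rate:.2%}")
```

This treats the high-coverage callset as the truth, which is itself an approximation, since it will contain some errors of its own.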


DNA in Forensics 2012, Final Program & Abstracts