Saturday, March 5, 2016

SNP Tsunami Continues Into Third Year

Many of us spend a great deal of time, energy and money attempting to document that a particular ancestor of ours belonged to a particular tribe or ethnic group. We all get very excited when we find a family Bible or a diary of an ancestor that dates back two or three hundred years.

Don’t you wish your ancestors had carried a passport which got stamped at every branching point of their intercontinental migration route as they trudged through prehistory? Actually they did. In some cases our genomes have recorded more than a hundred thousand years of travel.

This travel is documented in the mitochondrial DNA of all of us. A separate and more detailed path is documented in the yDNA of men. Many call this anthropology. In Chapter 6 of my most recent book, NextGen Genealogy: The DNA Connection, I call this extreme genealogy. In either case it is the study of haplogroups – or the ancient clans to which our ancestors belonged.

Women ancestors were somewhat limited in what they could communicate to distant descendants because our mitochondrial DNA (mtDNA) contains only 16,569 locations in which they can record the presence of one of the four chemical bases that make up our DNA. Their paths through prehistory can be traced for our female lines using mtDNA test results. mtDNA was the basis for Bryan Sykes’ pioneering Seven Daughters of Eve.

Our men ancestors had tens of millions of additional locations where such information could be logged. What we look for today is where on our genomes these ySNPs occurred in this transcribed travel record. Once such a permanent change has occurred, it is passed down to all male descendants.

What are ySNPs and how do they differ from the ySTRs we have been testing since 2000?

Short Tandem Repeat (STR)
Pronounced "stir." This is a repeating pattern of genetic code letters at a location on the genome. The value is the number of times that pattern is repeated at that location.
Single Nucleotide Polymorphism (SNP)
Pronounced "snip." A single and permanent change in the DNA bases at a given location.
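
To make the two definitions concrete, here is a minimal sketch in Python. The sequences, the GATA motif, and the function names are made up for illustration; they are not real Y-chromosome data or any testing company's method. The first function reports an STR value by counting back-to-back repeats, and the second reports a SNP wherever a single base differs from a reference.

```python
def count_tandem_repeats(sequence: str, motif: str, start: int) -> int:
    """Count consecutive copies of `motif` beginning at index `start`."""
    if not motif:
        raise ValueError("motif must not be empty")
    count = 0
    pos = start
    while sequence.startswith(motif, pos):
        count += 1
        pos += len(motif)
    return count


def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) where two aligned sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


# Hypothetical toy sequences: the repeat region starts at index 4 in both.
father_line = "CCTA" + "GATA" * 12 + "TTGC"
cousin_line = "CCTA" + "GATA" * 13 + "TTGC"

str_value_father = count_tandem_repeats(father_line, "GATA", 4)
str_value_cousin = count_tandem_repeats(cousin_line, "GATA", 4)

# Hypothetical aligned stretch with one changed base.
reference_bases = "ACGTTGCA"
tested_bases = "ACGTCGCA"
snps = find_snps(reference_bases, tested_bases)
```

In this toy case the two STR values differ by one repeat, while the SNP list holds a single entry for the one position where the base changed.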

Consumer DNA testing to discover family history information began in 2000 with the focus on the Y chromosome (yDNA), which only males possess. Mitochondrial DNA testing for both genders soon followed but is somewhat limited because it has ONLY 16,569 locations, each storing a single piece of information. By 2010 autosomal DNA testing burst onto the scene, and it has since become the most popular test.

By 2013 a new testing cycle for yDNA became available to genealogists. While the previous cycle had focused on testing ySTRs, the new wave examines ySNPs.

Compared with mtDNA, yDNA can record roughly 3,500 times as much data. Therefore, it has the power to record a much more detailed migratory history.

Most yDNA testing to date has been conducted on Short Tandem Repeats (ySTRs). When we talk about 12, 25, 37, 67 and 111 marker tests, we are referring to how many ySTRs were tested. STR testing is analogous to dispatching a census taker to a village which is known to have 12, 25… residences. In our scenario the locations of these residences have been chosen by geneticists because they are accessible and have a somewhat predictable rate of mutation. At each location our census taker records how many repeats are currently in residence.

In NextGen testing the focus shifts to Single Nucleotide Polymorphisms (SNPs). Instead of dispatching probes to specific, predefined locations, NextGen ySNP testing is more analogous to taking satellite images along the entire Y-chromosome. Although the chromosome contains almost sixty million identifiable locations, current technology allows us to get reliable data from only about a fourth of those locations. Still, this is an overwhelming amount of data. The computing power to analyze it has only recently become available.
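
The figures quoted so far can be checked against each other with a little arithmetic. The sketch below uses only numbers stated in this post (16,569 mtDNA locations, the 3,500-fold ratio, and "about a fourth"); the variable names are mine, and the results are rough estimates rather than measured values.

```python
MTDNA_LOCATIONS = 16_569       # mtDNA locations, as stated in the post
YDNA_TO_MTDNA_RATIO = 3_500    # yDNA-to-mtDNA ratio, as stated in the post
READABLE_FRACTION = 1 / 4      # "about a fourth" of locations read reliably

# Implied number of locations on the Y chromosome.
ydna_locations = MTDNA_LOCATIONS * YDNA_TO_MTDNA_RATIO

# Rough number of locations current technology reads reliably.
readable_locations = ydna_locations * READABLE_FRACTION

print(f"Implied yDNA locations: {ydna_locations:,}")
print(f"Reliably readable today: about {readable_locations:,.0f}")
```

The product comes to just under 58 million, which agrees with "almost sixty million," and a fourth of that is roughly 14.5 million locations.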

At present ySNP chasing is still in its infancy. The vast majority of the SNPs we know today have been discovered in the last two years. The statistics in the chart below show the number of SNPs that had been placed on the yTree of the International Society of Genetic Genealogy (ISOGG), whose yTree committee is chaired by Alice Fairhurst:

Cumulative SNPs placed on the ISOGG yTree

Another way to look at this SNP tsunami is to view the new SNPs identified in a two year period (2013-2015) for R1b-L21, the most common male haplogroup in Western Europe today:

Known SNPs in R-L21 haplogroup in mid-2013 (Mike Walsh)

Known SNPs in R-L21 haplogroup in mid-2015 (Mike Walsh)

We are still working to find the exact location and sequence for many of them. In some ways our knowledge today would be like getting a SNP passport with several dozen “check point” stamps on it but in random order. We know that our genomes passed through all those points but are still trying to decipher in what sequence that journey occurred. The charts above for R1b-L21 represent ySNPs that we have been able to arrange in evolutionary order. As more men are tested and we can document where they exited the main SNP trail, we can refine our chronology for all of us.
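
The idea of putting the passport stamps in order can be sketched in a few lines of Python. This is a simplified illustration, not the method used by ISOGG, FTDNA, or any chart author. The tester names and SNP labels are hypothetical, and the sketch assumes clean data: no missed reads and no back-mutations. The reasoning is that a SNP shared by more tested men arose before those men left the trail, so it is older. SNPs shared by exactly the same men land in one block whose internal order cannot yet be resolved.

```python
from collections import defaultdict


def order_trail(target: str, results: dict[str, set[str]]) -> list[list[str]]:
    """Arrange the target man's SNPs into blocks, oldest block first."""
    blocks: dict[frozenset[str], list[str]] = defaultdict(list)
    for snp in results[target]:
        # Every tested man who carries this SNP, including the target.
        carriers = frozenset(man for man, snps in results.items() if snp in snps)
        blocks[carriers].append(snp)
    # More carriers means fewer men had exited the trail: the SNP is older.
    ordered = sorted(blocks.items(), key=lambda item: len(item[0]), reverse=True)
    return [sorted(snps) for _, snps in ordered]


# Hypothetical test results: each man mapped to the SNPs he tested positive for.
results = {
    "tester": {"SNP_A", "SNP_B", "SNP_C", "SNP_D", "SNP_E"},
    "cousin": {"SNP_A", "SNP_B", "SNP_C", "SNP_D"},
    "distant_match": {"SNP_A", "SNP_B"},
    "stranger": {"SNP_A", "SNP_X"},
}

trail = order_trail("tester", results)
```

Here SNP_C and SNP_D stay together in one block because the same two men carry both. Only a new tester who has one but not the other could separate them.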

The chart below for subclade R-S1026 is an expansion of the seven pale pink SNPs clustered at the bottom of the chart above. This subclade was unknown when the previous chart was drawn in 2013.

Courtesy of Alex Williamson

Even with this deluge there are many more thousands of SNPs to come. The NextGen curve is where the ySTR curve was in 2003, when 10,000 tests had been sold by FTDNA. Ironically, that is the number of BIG Y tests that Bennett Greenspan reports FTDNA has sold to date. Full Genomes reports that it has sold 1,500 NextGen tests.

Most of the ySNPs that have been discovered have yet to be specifically placed and more will be discovered as testing numbers increase. The entire recently discovered R-S1026 haplogroup above is not yet integrated into the ISOGG yTree. It is only partially integrated into the FTDNA yTree. The R-S1026 chart contains many blocks or boxes that group newly discovered SNPs. At this point we believe we have the blocks in the correct chronological order of their appearance. However, we have yet to sort the SNPs within boxes into their correct order of appearance. And more remain to be discovered. Other haplogroups are in a similar state of discovery and growth. The SNP tsunami shows no sign of receding anytime soon.


  1. Nomenclature is a real problem. As we get further down the tree it is going to become impossible to deal with all these SNPs; it is bad enough at the top end of L21. The current haphazard system, with different testing companies all doing their own thing, is just inadequate.

  2. Joe,
    Brian Swann uses the term "leaves" to describe that part of the branches of our trees that you describe as getting further down the tree. That's where as a genealogist I get the most excited. I'd like to tie this chart into my paper pedigree.

    Seven of us are below the big square box in the middle of Alex's chart for R-S1026 above. We have matched for years on the more advanced ySTR tests. Some of us are as much as 10 markers apart on 111 ySTR tests. We decided to do the BIG Y together to see what we could learn about our common journey. We marched out of prehistory together. Our paths began to diverge as we reached the time that surnames were beginning to come into common use.

    Three of the seven of us are known to descend from my 6th great-grandfather -- each from a different one of his sons. The three of us share 5 additional SNPs below that big square box as we get closer to the present. We have a fellow traveler who may be an NPE (or perhaps we are the NPEs). We know that his family and ours were associated in the 17th and 18th centuries. The SNPs are telling us more about that association than we could tell by the STRs alone. But there is much more to learn.

  3. "In some cases our genomes have recorded more than a hundred thousand years of travel." Truly they are, and this is interesting when you follow the view to think about yourself and your family.
