Saturday, October 25, 2014

STiRring the SNP Pot

Most of us who have been interested in genetic genealogy for more than 4 years got our start by trying to understand the STRs (Short Tandem Repeats) of yDNA. For those of you who are new to this field, STRs are the repeats counted at specified locations along Chromosome Y to generate the numbers in 12-marker, 25-marker, ... and 111-marker test results.

In the last few years our attention has been drawn to the cMs (centiMorgans) of matching segments among the large number of atDNA (autosomal DNA) kits that have been tested by 23andMe, FTDNA and Ancestry.

Unfortunately, Full Mitochondrial Sequence test results have yet to reach the critical mass necessary to make mtDNA genealogically relevant to many of us. As the number of individuals tested continues to grow, this test will have genealogical relevance for more of us.

In 2014 the first wave of the SNP Tsunami engulfed us as results from Full Y, BIG Y and Chromo2, among others, began to come back in rapidly increasing numbers. The mechanisms for organizing the newly discovered SNPs (pronounced "snips") could not begin to keep up. The FTDNA SNP tree currently lists my most downstream SNP as R1b-L21, even though I tested positive for DF13 (the next level down) in their lab in June 2012. DF13 does not yet show up on FTDNA's SNP Tree even though several hundred customers have tested positive for it. Only SNPs known by November 2013, mostly those on the chip of National Geographic's Geno 2.0 test, have so far been incorporated. As a result, none of the SNPs discovered in the last year are listed in the Y-DNA Haplotree currently posted on FTDNA's website.

Even ScotlandsDNA, the lab that discovered and named my own subclade, S1026, has not figured out much about it:
Your S1026 subtype was recently discovered using Chromo2, so its distribution is not yet understood. You may carry markers that further define your subtype, but do not yet appear on our tree. You will find these in your genetic signature.
The ISOGG yTree is trying to keep up but is woefully behind the pace at which SNPs are being identified, daily, as they wash ashore in the haplogroup discovery projects. This tree, which academics and hobbyists alike rely on to document the descent of "man" from yAdam to the present, had identified and placed a total of 3,610 SNPs from 2006 to about this time last year. Since January 2014 alone, more than 10,000 additional SNPs have been added, and there is no end in sight. Tens of thousands more are in the process of being identified and placed by citizen scientists.

NextGen sequencing has identified them. The harder job is to assign each of them to the correct haplogroup and to arrange them in the correct chronological order. Many more men need to be tested before this process can near completion.

Men wishing to learn more about their deep ancestry, and those who wish to build bridges from their deep ancestry to their ancestral trail into genealogical time, cannot rely on the yTrees of either FTDNA or ISOGG. These are too far behind the trailblazers. Instead, the strategy that seems to be working is to seek out a man who has already taken a NextGen test AND who shows up as a match on a ySTR test. A match with 10 or fewer mismatches on a 111-marker ySTR test is likely to be a fellow member of one's subclade just beyond genealogical time. Matches with 7 or fewer mismatches on a 67-marker ySTR test are also good candidates.
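The rule of thumb above can be sketched as a simple filter over a list of ySTR matches. This is a minimal illustration, not an FTDNA or ISOGG tool: the `StrMatch` record, its field names and the example kits are hypothetical, and it assumes "within 10 markers" means 10 or fewer mismatches (genetic distance) at 111 markers, alongside 7 or fewer at 67 markers.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Assumed thresholds from the rule of thumb in the post:
# at most 10 mismatches at 111 markers, at most 7 at 67 markers.
MAX_MISMATCHES = {111: 10, 67: 7}


@dataclass
class StrMatch:
    """Hypothetical record for one ySTR match."""
    kit_id: str                          # hypothetical identifier for the matching man
    markers_compared: int                # size of the STR comparison, e.g. 67 or 111
    mismatches: int                      # genetic distance reported for that comparison
    nextgen_tested: bool                 # has he taken BIG Y, Full Y, Chromo2, etc.?
    terminal_snp: Optional[str] = None   # most downstream SNP his NextGen test found


def is_candidate(m: StrMatch) -> bool:
    """True if the match is close enough AND has already done a NextGen test."""
    limit = MAX_MISMATCHES.get(m.markers_compared)
    if limit is None:
        return False  # no rule of thumb for other panel sizes in this sketch
    return m.nextgen_tested and m.mismatches <= limit


def snps_to_test(matches: Iterable[StrMatch]) -> List[str]:
    """Collect the downstream SNPs found by qualifying matches, as single-SNP test targets."""
    return sorted({m.terminal_snp for m in matches
                   if is_candidate(m) and m.terminal_snp})


if __name__ == "__main__":
    # Hypothetical kits for illustration only.
    matches = [
        StrMatch("kit-A", 111, 6, True, "Z16891"),
        StrMatch("kit-B", 67, 9, True, "S1026"),   # too many mismatches at 67 markers
        StrMatch("kit-C", 111, 4, False),          # close, but no NextGen test
    ]
    print(snps_to_test(matches))
```

Run as written, this prints `['Z16891']`: only kit-A is both close enough and NextGen tested. The thresholds are kept in a dictionary so other panel sizes could be added if a project settles on its own limits.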

SNP R1b-S1026 was discovered just below L21 and DF13 by ScotlandsDNA's Chromo2 service shortly before the BIG Y results started coming back.

It is at the very bottom, slightly right of center, in the diagram above. At the moment it is represented by 4 pale rose-colored boxes. This is my subclade. We are attempting to extend this group down into genealogical time, and we are getting close.

Recently two men who had not previously SNP tested were tested for the newly discovered Z16891. For those of you who are trying to keep score, Z16891 is the rightmost pale rose SNP on the very bottom row of Mike Walsh's excellent chart above. Each man was tested for this single SNP using the older Sanger technology at FTDNA; ySeq also offers the same test. These men chose to be tested for Z16891 because they had close STR matches with men who had taken part in the discovery of Z16891 through BIG Y. In both cases the men tested positive. These positive tests allowed these two fellow travelers to document their journey down the SNP flow from yAdam to SNP Z16891.

This approach is very inexpensive compared with the first-class ticket of a BIG Y test, but it is also less helpful to the discovery process. These men also did not receive a list of SNPs below Z16891 that could turn out to be terminal SNPs uniquely identifying their specific families.

It is hoped that the testing panels now being developed will give more men another avenue into SNP testing, something between the vast BIG Y and the narrowly focused individual SNP tests. If these come online over the next few months, we will then be looking at ways to test potentially terminal SNPs for individual families that may become the 21st century equivalent of 17th century coats of arms.


  1. In my experience, I had a 64/67 marker match who had deep clade tested for L21. With that information I looked at the L21 spreadsheet and figured we were probably DF21. I was right, and then my 67-marker match and I took turns testing downstream markers.

  2. Yes, that is one strategy. You chose a way to use teamwork to share the cost of exploration.

    Another way is to pool your funds and have one person take a next generation test like BIG Y. This strategy would cover many potential SNPs at once rather than paying for them one at a time. It would also lead to the discovery of newer SNPs that may lead into genealogical time, some perhaps even unique to a specific family.

    The method I described in the post above could, I suppose, be called the catapult approach, in which one identifies a close STR match who has done a NextGen test and then tests for a SNP in the most downstream cluster he has discovered.

    A single solution may not be best for all cases. It will be interesting to see how the targeted panels currently under development work out on both a cost and a benefit basis.