Many of us spend a great deal of time, energy and money
attempting to document that a particular ancestor of ours belonged to a
particular tribe or ethnic group. We all get very excited when we find a family
Bible or a diary of an ancestor that dates back two or three hundred years.
Don’t you wish your ancestors had carried a passport which
got stamped at every branching point of their intercontinental migration route
as they trudged through prehistory? Actually they did. In some cases our
genomes have recorded more than a hundred thousand years of travel.
This travel is documented in the mitochondrial DNA of all of
us. A separate and more detailed path is documented in the yDNA of men. Many
call this anthropology. In Chapter 6 of my most recent book, NextGen Genealogy: The DNA Connection,
I call this extreme genealogy. In
either case it is the study of haplogroups – or the ancient clans to which our
ancestors belonged.
Our female ancestors were somewhat limited in what they could
communicate to distant descendants because mitochondrial DNA (mtDNA) contains only 16,569
locations, each of which records one of the four chemical
bases that make up our DNA. Their paths
through prehistory can be traced along our female lines using mtDNA test results.
mtDNA was the basis for Bryan Sykes’ pioneering The Seven Daughters of Eve.
Our male ancestors had tens of
millions of additional locations on the Y chromosome where such
information could be logged. What we look for today is where in this
travel record permanent single-base changes (ySNPs) occurred. Once such a permanent
change has occurred, it is passed down to all male descendants.
What are ySNPs
and how do they differ from the ySTRs
we have been testing since 2000?
Short Tandem Repeat (STR)
Pronounced "stir." This is a repeating pattern of genetic code letters at a location on the genome. The value is the number of times that pattern is repeated at that location.
Single Nucleotide Polymorphism (SNP)
Pronounced "snip." A single and permanent change in the DNA bases at a given location.
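To make the two definitions concrete, here is a minimal Python sketch. The sequences, the "GATA" motif, and the function names are invented for illustration and do not come from any real marker or testing company. The first function counts how many times a motif repeats back to back (the STR value); the second lists the single positions at which two equal-length sequences differ (SNPs).

```python
def count_tandem_repeats(sequence, motif, start=0):
    """Count consecutive copies of `motif` beginning at index `start`."""
    count = 0
    pos = start
    while sequence.startswith(motif, pos):
        count += 1
        pos += len(motif)
    return count


def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for each single-base difference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Toy data (hypothetical): a motif repeated five times between flanking bases.
toy_sequence = "AC" + "GATA" * 5 + "TT"
str_value = count_tandem_repeats(toy_sequence, "GATA", start=2)

# Toy data (hypothetical): one base differs between the two sequences.
snps = find_snps("ACGTACGT", "ACGTATGT")
```

In this toy case the STR value is 5 and the only SNP is a C-to-T change at position 5 (counting from zero).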
Consumer DNA testing to discover family
history information began in 2000 with the focus on the Y chromosome (yDNA), which
only males possess. Mitochondrial DNA testing for both sexes soon followed
but is somewhat limited because mtDNA has only 16,569 locations, each storing a single
base. By 2010 autosomal DNA testing had burst onto the scene, and it has
become the most popular test.
By 2013 a new testing cycle for yDNA became available to genealogists. While the previous cycle had focused on
testing ySTRs, the new wave examines ySNPs.
Compared with mtDNA, yDNA can record roughly 3,500 times as much
data. Therefore, it has the power to record a much more detailed
migratory history.
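The ratio can be checked with a line or two of arithmetic. In this sketch the mtDNA figure is the 16,569 locations cited above; the Y-chromosome figure of 58 million is an assumption standing in for "almost sixty million" locations, so the result is approximate.

```python
MTDNA_LOCATIONS = 16_569          # figure cited in the article
Y_LOCATIONS_ASSUMED = 58_000_000  # assumption: stand-in for "almost sixty million"

# How many times more locations the Y chromosome has than mtDNA.
ratio = Y_LOCATIONS_ASSUMED / MTDNA_LOCATIONS
print(f"yDNA has roughly {ratio:,.0f} times as many locations as mtDNA")
```

Under that assumption the ratio comes out close to 3,500.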
Most yDNA testing to date has been
conducted on Short Tandem Repeats (ySTRs). When we talk about 12, 25, 37, 67
and 111 marker tests, we are referring to how many ySTRs were tested. STR
testing is analogous to dispatching a census taker to a village which is known
to have 12, 25… residences. In our scenario the locations of these residences have
been defined by geneticists as being accessible and having a rate of mutation
that is somewhat predictable. At each location our census taker records how
many repeats are currently in residence.
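The census analogy maps naturally onto a small data structure: each tested man is a table of marker names and repeat counts, and two men are compared marker by marker. The sketch below is illustrative only. The marker names are real ySTR markers, but the repeat counts are made up, and simply counting the markers that differ is a simplification; testing companies may weight differences by the size of the change.

```python
def differing_markers(haplotype_a, haplotype_b):
    """Return the markers tested in both haplotypes whose repeat counts differ."""
    shared = haplotype_a.keys() & haplotype_b.keys()
    return sorted(m for m in shared if haplotype_a[m] != haplotype_b[m])


# Hypothetical repeat counts for two testers (values invented for illustration).
tester_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
tester_2 = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 11}

mismatches = differing_markers(tester_1, tester_2)
print(f"{len(mismatches)} marker(s) apart: {mismatches}")
```

Here the two hypothetical testers are one marker apart, at DYS390.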
In NextGen testing the focus shifts to
Single Nucleotide Polymorphisms (SNPs). Instead of
dispatching probes to specific, predefined locations, NextGen ySNP testing is
more analogous to taking satellite images along the entire Y chromosome. Although
the chromosome contains almost sixty million identifiable locations, current
technology allows us to get reliable data from only about a fourth of those
locations. Still, this is an overwhelming amount of data. The computing power to
analyze it has only recently become available.
At present ySNP chasing is still in its infancy. The vast
majority of the SNPs we know today have been discovered in the last two years. The
statistics in the chart below represent the number that had been placed on the
yTree by the International Society of Genetic Genealogy (ISOGG) yTree committee, chaired by
Alice Fairhurst:
Cumulative SNPs placed on the ISOGG yTree
Another way to look at this SNP tsunami is to view the new
SNPs identified in a two-year period (2013-2015) for R1b-L21, the most common
male haplogroup in Western Europe today:
Known SNPs in R-L21 haplogroup in mid-2013 (Mike Walsh)
Known SNPs in R-L21 haplogroup in mid-2015 (Mike Walsh)
We are still working to find the exact location and sequence
for many of them. In some ways our knowledge today would be like getting a SNP
passport with several dozen “check point” stamps on it but in random order. We
know that our genomes passed through all those points but are still trying to
decipher in what sequence that journey occurred. The charts above for R1b-L21
represent ySNPs that we have been able to arrange in evolutionary order. As more
men are tested and we can document where they exited the main SNP trail, we can
refine our chronology for all of us.
The chart below for subclade R-S1026 is an expansion of the seven pale pink SNPs clustered at the bottom of the chart above. This subclade was
unknown when the previous chart was drawn in 2013.
Courtesy of Alex Williamson -- www.ytree.net
Even with this deluge, many thousands more SNPs are still to come. The
NextGen curve is where the ySTR curve was in 2003, when 10,000 tests had been sold by FTDNA. Coincidentally, that is the number of BIG Y tests Bennett Greenspan reports FTDNA has sold to date. Full Genomes reports that it has sold 1,500 NextGen tests.
Most of the ySNPs that have been discovered have yet to be specifically placed, and more will
be discovered as testing numbers increase. The entire recently discovered R-S1026 haplogroup above is not yet integrated into the ISOGG yTree. It is only partially integrated into the FTDNA yTree. The R-S1026 chart contains many blocks or boxes that group newly discovered SNPs. At this point we believe we have the blocks in the correct chronological order of their appearance. However, we have yet to sort the SNPs within each box into their correct order of appearance. And more remain to be discovered. Other haplogroups are in a similar state of discovery and growth. The SNP tsunami shows no sign of receding anytime soon.
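The way blocks get ordered can be sketched in a few lines of Python. This is a simplified illustration, not the method used by ISOGG, FTDNA, or Alex Williamson. It assumes each SNP arose only once and is carried by every male-line descendant, so a SNP shared by more tested men is older. SNPs carried by exactly the same set of men fall into one block and cannot yet be ordered relative to each other. The tester and SNP names are hypothetical.

```python
from collections import defaultdict


def snp_trail(results, man):
    """Group one man's SNPs into blocks, ordered from oldest to youngest.

    `results` maps each tester to the set of SNPs he carries.
    Assumes each SNP arose once, so more carriers means an older SNP.
    """
    # Record which testers carry each SNP.
    carriers = defaultdict(set)
    for tester, snps in results.items():
        for snp in snps:
            carriers[snp].add(tester)

    # SNPs with identical carrier sets cannot be separated: one block.
    blocks = defaultdict(list)
    for snp in results[man]:
        blocks[frozenset(carriers[snp])].append(snp)

    # Blocks shared by more testers come first (older).
    ordered = sorted(blocks.items(), key=lambda item: -len(item[0]))
    return [sorted(snps) for _, snps in ordered]


# Hypothetical test results for four men.
results = {
    "man1": {"SNP_A", "SNP_B", "SNP_C", "SNP_D"},
    "man2": {"SNP_A", "SNP_B", "SNP_C"},
    "man3": {"SNP_A", "SNP_B"},
    "man4": {"SNP_A", "SNP_B", "SNP_E"},
}

trail = snp_trail(results, "man1")
```

For man1 the trail is the block SNP_A and SNP_B, then SNP_C, then SNP_D. SNP_A and SNP_B stay together in one block until someone tests who carries one but not the other, which is why each new tester can refine the chronology.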
Nomenclature is a real problem. As we get further down the tree it is going to become impossible to deal with all these SNPs; it is bad enough at the top end of L21. The current random system, with different testing companies all doing their own thing, is just inadequate.
Joe,
Brian Swann uses the term "leaves" to describe that part of the branches of our trees that you describe as getting further down the tree. That's where as a genealogist I get the most excited. I'd like to tie this chart into my paper pedigree.
Seven of us are below the big square box in the middle of Alex's chart for R-S1026 above. We have matched for years on the more advanced ySTR tests. Some of us are as much as 10 markers apart on 111 ySTR tests. We decided to do the BIG Y together to see what we could learn about our common journey. We marched out of prehistory together. Our paths began to diverge as we reached the time that surnames were beginning to come into common use.
Three of the seven of us are known to descend from my 6th great-grandfather -- each from a different one of his sons. The three of us share 5 additional SNPs below that big square box as we get closer to the present. We have a fellow traveler who may be an NPE (or perhaps we are the NPEs). We know that his family and ours were associated in the 17th and 18th centuries. The SNPs are telling us more about that association than we could tell by the STRs alone. But there is much more to learn.
"In some cases our genomes have recorded more than a hundred thousand years of travel." Truly they are, and this is interesting when you follow the view to think about yourself and your family.