Showing posts with label SNps. Show all posts

Wednesday, December 10, 2014

The Long Journey of your Genome: Part 1


Your genome had already been on a long journey before your parents got together to conceive the unique you. If the current estimates of our best scientists are to be believed, the human portion of that journey could have taken more than 300,000 years. That our genomes survived the extreme climate changes, wars, famines and disease is a miracle equal to those surrounding the creation of our species and our universe.

Most of our genealogies focus on identifying and chronicling the lives of those who were the carriers of our genomes during the last few hundred years of this incredible journey. That is likely only one tenth of one percent of the journey of our species down to who we are today. Documenting even this tiny part of the journey of our genomes can be a very formidable challenge.  
Success in this endeavor is more likely if we follow some principles that became well established in the 20th century. First and foremost among them is the one I described in my Crash Course in Genealogy as:


Rule #1. Start with what you know (yourself) and build back to what you don't know --- step-by-step. Don't skip steps!!
That's still a great rule that 21st century genealogists violate only at great peril. However, now the more intrepid of us can turn this rule on its head and attempt what in my just released NextGen Genealogy: The DNA Connection I call reverse genealogy. Basically this involves starting at the beginning with mtEve or yAdam and tracing our SNP flows down toward the present. We are able to do this because of two kinds of "celibate DNA" or DNA that is not recombined between the contribution of the mother and the contribution of the father when an embryo is conceived. As a result this DNA is passed relatively intact from generation to generation to generation. 

We have been able to trace our "umbilical lines" of descent for the last few years if we have tested all 16,569 locations along our mtDNA. Since all of us inherited mtDNA from our mothers, all of us can trace our "umbilical line" down to the present. Conceptually this was made easier in 2012 when Doron Behar and colleagues published a new approach to reporting mtDNA results that uses mtEve as a starting point and reports the mutations that occur as we fast forward down through the millennia to the present. 

Although 16,569 locations seem to be a large number, they do not provide the nuanced distinctions offered by the more than fifty million locations of our other celibate DNA -- our yDNA. Until recently most males were limited to looking at the number of Short Tandem Repeats (STRs) located at 111 distinct locations along our yDNA. Now it is possible to look for Single Nucleotide Polymorphisms (SNPs) along more than ten million locations with the BIG Y test. Other tests on the market now offer to test even more locations -- perhaps half again as many.

These NextGen tests offer the possibility of tracing our ancestors' paths down from prehistory into genealogical times -- the era for which we can hope to find written records about our ancestors. I will discuss some of my early experiences with BIG Y test results in Part 2 of this discussion and how you can begin to investigate your own results if you have taken the test.

Saturday, October 25, 2014

STiRring the SNP Pot


Most of us who have been interested in genetic genealogy for more than 4 years got our start by trying to understand the STRs (Short Tandem Repeats) of yDNA. For those of you who are new to this field, STRs are what is counted at various specified locations along Chromosome Y to generate the numbers on 12-marker, 25-marker, ... and 111-marker test results.

In the last few years our attention has been drawn to the cMs (centiMorgans) of matching segments of the large numbers of atDNA kits that have been tested by 23andMe, FTDNA and Ancestry.

Unfortunately, Full Mitochondrial Sequence test results have yet to reach the critical mass necessary to make mtDNA genealogically relevant to many of us. As the number of individuals tested continues to grow, this test will have genealogical relevance for more of us.

In 2014 the first wave of the SNP Tsunami engulfed us as results from Full Y, BIG Y and Chromo2, among others, began to come back in greatly increasing numbers. The mechanisms for organizing the newly discovered SNPs (pronounced "snips") could not begin to keep up. The FTDNA SNP tree currently lists my most recent SNP as R1b-L21 even though I had tested positive for DF13 (the next level down) in their lab in June, 2012. DF13 does not yet show up on FTDNA's SNP Tree even though several hundred customers have tested positive for it. Only SNPs known by November, 2013 and mostly those on the chip of National Geographic's Geno 2.0 test have so far been incorporated. As a result none of the SNPs discovered in the last year are listed in the Y-DNA Haplotree currently posted on FTDNA's website.

Even ScotlandsDNA, the lab which discovered and named my own subclade of S1026, has not figured out much about what it is:
"Your S1026 subtype was recently discovered using Chromo2, so its distribution is not yet understood. You may carry markers that further define your subtype, but do not yet appear on our tree. You will find these in your genetic signature."
The ISOGG yTree is trying to keep up but is woefully behind where the SNPs are daily being identified as they wash ashore in the haplogroup discovery projects. This tree, which is relied on by academics and hobbyists alike to document the descent of "man" from yAdam to the present, had identified and placed a total of 3,610 SNPs from 2006 to about this time last year. Since January of 2014 alone more than 10,000 additional SNPs have been added. And there is no end in sight. Tens of thousands more are in the process of being identified and placed by citizen scientists.

NextGen sequencing has identified them. The harder job is to assign each of them to the correct haplogroup and to arrange them in the correct chronological order. Many more men need to be tested before this process can near completion.

Men wishing to learn more about their deep ancestry and those who wish to build bridges from their deep ancestry to their ancestral trail into genealogical time cannot rely on the yTrees of either FTDNA or ISOGG. These are too far behind the trail blazers. Instead the strategy that seems to be working is to seek out a man who has already taken a NextGen test AND who shows up as a match for them on a ySTR test. A man who matches within 10 markers on a 111-marker ySTR test is likely to be a fellow member of one's subclade just beyond genealogical time. Men with 7 or fewer mismatches on a 67-marker ySTR test are also good candidates.
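For readers who like to see a rule of thumb spelled out, here is a minimal sketch in Python of the screening step just described. The function name and the lookup table are my own illustration, built only from the two cut-offs mentioned above; this is not a tool offered by any testing company, and the cut-offs themselves are informal guidance rather than hard limits.

```python
# Rule-of-thumb cut-offs from the paragraph above:
# markers tested -> maximum number of mismatching markers.
MAX_MISMATCHES = {111: 10, 67: 7}


def is_subclade_candidate(markers_tested, mismatches):
    """Return True if a ySTR match is close enough to suggest a shared subclade."""
    if markers_tested not in MAX_MISMATCHES:
        raise ValueError(f"no rule of thumb for a {markers_tested}-marker test")
    if mismatches < 0 or mismatches > markers_tested:
        raise ValueError("mismatches must be between 0 and markers_tested")
    return mismatches <= MAX_MISMATCHES[markers_tested]


# Example: two men who share 105 of 111 markers differ on 6 of them.
print(is_subclade_candidate(111, 111 - 105))  # True
print(is_subclade_candidate(67, 9))           # False
```

A man who passes this screen is only a candidate. A single SNP test is still needed to confirm that he really shares the subclade.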

SNP R1b-S1026 was discovered just below L21 and DF13 by ScotlandsDNA's Chromo2 service shortly before the BIG Y results started coming back.



It is at the very bottom and slightly right of center in the diagram above. At the moment it is represented by 4 pale rose colored boxes. This is my subclade. We are attempting to expand this group down into genealogical times and we are getting close. 

Recently two men who previously had not SNP tested were tested for newly discovered Z16891. For those of you who are trying to keep score, Z16891 is the rightmost pale rose SNP on the very bottom row in Mike Walsh's excellent chart above. These men were single SNP tested using the older Sanger technology at FTDNA. ySeq also offers the same test. These men chose to be tested for Z16891 because they had close STR matches with men who had taken part in the discovery of Z16891 as part of BIG Y. In both cases the men tested positive. These positive tests allowed these two fellow travelers to document their journey down the SNP flow from yAdam to SNP Z16891. 

This process provided a very inexpensive option compared to the first class ticket for the BIG Y. It is also less helpful to the discovery process. These men also did not get a list of SNPs below Z16891 that could turn out to be terminal SNPs that uniquely identify their specific families.

It is hoped that the testing panels now being developed will be another avenue for more men to get involved in SNP testing -- something between the vast BIG Y and the narrowly focused individual SNP tests. If those come on line over the next few months, we then will be looking at ways to test potentially terminal SNPs for individual families that may become the 21st century equivalent of 17th century coats of arms. 



Sunday, August 10, 2014

Unraveling BIG Y Test Results: R-DF97


Yesterday I wrote about a BIG Y discovery that got my Maryland Dowells past SNPs R-L21 and R-DF13. The Virginia Group 1 Dowells in our surname project had long been known to have come forward in time from those two SNPs and were known to have reached R-M222. Thanks to the informative charts that citizen scientist Mike Walsh tirelessly updates at the site of the R L21 and Subclades Project, we have the opportunity to almost keep up with the current SNP tsunami:


The chart above is offered only to give overall perspective. The new subclade R-S1026 discussed in yesterday's post is represented by the four boxes colored pale pink in the middle of the chart. The more robust subclade R-DF49, which includes SNP M222, is the blue/aqua area in the extreme lower left corner of the chart. Our Maryland Dowells and the Virginia Dowells have not shared a common male ancestor in about 3,500 years even though both followed the same SNP trail down from yDNA Adam to SNP DF13. You will need to visit the linked project website to be able to read the details of this chart.

The DF49 corner of the chart is blown up below:



I request the reader's indulgence to ignore the gold and yellow boxes in the upper right corner of this part of the chart. The Virginia Group 1 Dowell who took the BIG Y test was able to trace his SNP migration pattern several hundred years and eight SNPs closer to the present. He now is confirmed to be DF97 and beyond. DF97 is at the bottom of the third column from the left in the above chart.

Although this Dowell has been able to discover the trail of his ySNPs through a significant part of the last few millennia, he still has discoveries to make to connect his paper trail to his SNP trail. As was the case with my Maryland Dowells, The Big Tree of Alex Williamson gives many more recent SNPs to try to arrange in the proper chronological sequence. It is necessary to visit the original website to get a clear view of the SNPs that the yDNA of this Virginia Dowell has accumulated as his paternal clan moved toward the Atlantic coast of Europe. 


The Virginia Group 1 Dowell is the third column from the left in the above chart. SNP DF85 is in the top row and DF97 is in row three. There are still many SNPs to arrange in the proper sequence in recent centuries as attempts are made to tie the SNP path into the documented path and to identify his nearest relatives.


Saturday, August 9, 2014

Unraveling BIG Y Test Results: R-S1026


For a long time I have been stymied in my efforts to trace my SNP trail through the most recent three or four millennia down to genealogical time. Now we are beginning to make some headway due largely to the herculean efforts of the citizen scientists of the R-L21 and subclades project. The BIG Y, Full Y, Chromo2 and other discovery tests are providing multiples of the numbers of SNPs that had been identified prior to the beginning of 2014. 

R-L21 is the most prevalent male haplogroup along the western coast of Europe. In some areas it approaches 80% of the male population. Therefore, knowing that one is part of this mega clan is interesting but not very useful genealogically speaking. I had tested positive for DF13, which is a SNP just below L21. This still is not that useful, as the vast majority of R-L21 men also belong to this subdivision. A dozen subclans of DF13 have been discovered in recent years, but one by one I had tested negative for all of them prior to getting my BIG Y results. Now I know that I belong to the newly identified S1026 subclan. Below are the results of nine of us who have BIG Y results:
This chart lists the SNPs for each of us that have been discovered downstream (toward the present) from S1026. At least six men have been identified by the Chromo2 project at ScotlandsDNA. 

My results are those in the middle column above. The man whose results are my closest match in the SNP chart above (just to the right of mine) is a sixth cousin once removed. He and I share 105 of the 111 short tandem repeats (STRs) over which we previously had been tested. We appear to share five SNPs that so far separate our migration trail from that of any of the other members of this emerging group. He and I share a common ancestor who died in Southern Maryland in 1733. Even more recently I have four additional SNPs and he has seven.

The McDaniel man represented by the SNP trail in the column to my left above is my next nearest relative in this grouping. He and I previously had discovered we shared 35 of 37, 64 of 67 and 102 of 111 STR markers. He has seven identified SNP mutations since his ancestral DNA trail separated from mine and that of my Dowell cousin. The three of us share nineteen additional so far identified SNPs in common before our common trail merges with that of the three men in the columns to our right. Then the six of us share five earlier SNPs before we converge with others with whom we share SNP R-S1026.

It is going to take test results from additional men to sort out the exact sequence in which all these SNPs should be arranged chronologically. For example, we know that the five SNPs recently named (see chart above):
Z16886 Z16887 Z16888 Z16889 Z16890
are grouped together but we don't know in what chronological sequence they occurred. Only as more men are tested, and some are positive and others negative, will this more precise arrangement be possible. The sorting of other SNPs which are lumped together above will follow a similar process. As a result the SNPs will appear to be out of sequence as their correct ages and thus their actual locations along the migration path of our paternal DNA begin to appear. This will cause the nice orderly naming progressions to be scrambled.
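The sorting process comes down to comparing who tests positive for what. Here is a minimal sketch of that logic in Python. The SNP names X1, X2 and X3 and the men A, B and C are hypothetical, not real results. The sketch assumes that every man has a result for both SNPs being compared, and it relies on the usual rule that a man who is positive for a downstream SNP is also positive for every SNP upstream of it.

```python
def relation(carriers_a, carriers_b):
    """Compare two SNPs by the sets of men who tested positive for each."""
    if carriers_a == carriers_b:
        return "not yet separable"   # the same men carry both: order unknown
    if carriers_a > carriers_b:
        return "first is older"      # everyone with the second also has the first
    if carriers_a < carriers_b:
        return "first is younger"
    if carriers_a.isdisjoint(carriers_b):
        return "different branches"
    return "conflict: retest"        # partial overlap should not happen


# Hypothetical results: which men tested positive for which SNP.
positives = {
    "X1": {"A", "B", "C"},
    "X2": {"A", "B", "C"},
    "X3": {"A", "B"},
}

print(relation(positives["X1"], positives["X2"]))  # not yet separable
print(relation(positives["X1"], positives["X3"]))  # first is older
```

X1 and X2 stay lumped together until a man turns up who is positive for one and negative for the other. That is why each additional tester can split a block of SNPs.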

Isn't genetic genealogy fun? The more we discover the more we have yet to learn. 

Monday, May 19, 2014

Once Upon A Time: A SNP Fable


This SNP fable is not literally true in all regards. It is a "fictumentary" based on what we know but liberties are taken to fill in blanks where science has yet to provide more definitive answers. I hope that each time I tell it there will be less fabrication and more scientific fact. SNP discoveries are now being made so fast that such an expectation is not impossible.
SNPs are permanent changes at one location along the genomes of our ancestors that have been passed down to us. We can trace the accumulation of these SNPs, much as we could follow the paths of our ancestors backward in time if they had left notches in tree trunks as they made their journey through time.

This journey can be traced back thousands of generations. However, in the interest of time, I will fast forward down to the last four millennia or so. This is the story of the journey of my own paternal line as I am discovering it with my results from the BIG Y test.

As most of you have discovered, not all families who share the same surname are recently related. In my case we discovered a decade ago in early ySTR testing that the Dowells who flourished in Southern Maryland in the late 17th century were not biologically related to those who flourished in Central Virginia in the early 18th century. The surname came into use independently in more than one location. However, these two clans who were to become Dowells had traveled down the SNP highway from the beginning of time until they separated as they approached the Atlantic coast of Europe about four thousand years ago. They were both part of the great R1b migration out of Central and West Asia sometime after the last ice age receded.

For those of you who know a little about SNPs, both of these two groups who became Dowells belonged to R-L21 which is the most prevalent haplogroup along the western coast of Europe. The timeline is still fuzzy but a few hundred years later they were both part of the SNP DF13 that was the major branch below L21. Here they came to a parting of the ways that we are just now beginning to be able to decipher with results from tests of discovery such as BIG Y, Chromo2, Full Genomes, etc. These tests are still not for the casual genetic genealogists or the timid of wallet, but they are where the fast and furious action is.

The trail of my own paternal line is being revealed to have branched off at SNP S1026. So far the Chromo2 project has discovered six individuals whose ancestors have passed this SNP down to them. Seven, including me, have been identified by the BIG Y test. And the number seems to grow weekly. 

So far my tale is more fact than fiction, but buckle your seat belts. The chart above is thought to describe the genetic journey of seven of us over the last 3,500 years or so. However, we don't know yet in what sequence each of us passed through these various SNP junctions. We will learn more about that as more members of this clan have test results.


The Fable

However, as of now it appears that each of us has approximately thirty SNPs spread out over a little more than three thousand years. That averages out to about one SNP junction every one hundred years. It appears at the moment that ancestors of the man whose path is second from the left never left France for the Isles. They stopped just short on the Brittany coast across from Cornwall. The ancestors of the rest of us appear to have made the plunge at some point in the last three thousand years or so. The ancestors of the man on the left seem to have made it to Ireland.
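The averaging in the paragraph above is simple division. A minimal sketch in Python, using the same round figures (about thirty SNPs over about three thousand years), makes it easy to redo as the counts change. The function name is my own, and the result is only an average. It says nothing about when any particular SNP actually occurred.

```python
def years_per_snp(span_years, snp_count):
    """Average number of years between SNPs along one paternal line."""
    if snp_count <= 0:
        raise ValueError("snp_count must be positive")
    return span_years / snp_count


# Round figures from the paragraph above.
print(years_per_snp(3000, 30))  # 100.0
```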

The ancestral lines of the five of us on the right seem to have stayed together for another five hundred years or so. We all share 5 SNPs not shared by the two on the left. 

Have you heard the one about the three brothers? It looks as if something like that happened almost 2,500 years ago. One headed for Scotland. Well, you have heard that one before.

My own ancestral line [the middle one in the chart above] and that of another fellow traveler continued together for about seventeen hundred years or so. The two of us already had STR matches but no common paper trail for the last three hundred years. According to my fable version of our common family history, our closest common male ancestor might have been as far back as eight hundred years ago. TiP at FTDNA predicts our connection is a little closer:  
  
Generations  Percentage
8 11.54%
12 60.70%
16 88.49%
20 97.48%
24 99.55%

Oh well.


I look forward to the opportunity to learn more about the journey of my own accumulation of SNPs. If you can correct what I have written or add to it, I would love to hear from you. That is how I learn.

Friday, May 9, 2014

What's So Big About The BIG Y Test?


Some of you may be in a similar predicament to the one I have been in for half a century. I know who my 6th great-grandfather is on my surname line; but I don't know anything about where he came from. My ancestor, Philip Dowell, showed up as an established tobacco planter in Southern Maryland in the 1690s. The rest of his life until his death in 1733 is pretty well documented.

My first DNA test back in 2004 was supposed to help. It didn't. Well actually it did. It told me I was not related to most of the Dowells who were in Colonial Virginia. I wanted it to tell me who I was related to on the other side of the pond. On that question I have not really progressed much since 2004 or 1966 for that matter.

Over the years I have been able to make connections with many of Philip's living descendants -- including straight line male Dowell descendants of three of his four sons who lived to produce offspring. By testing our yDNA and triangulating the results we have been able to reconstruct what Philip's 111 yDNA STR marker test report would have said if he had been tested by FTDNA. However, we really are not much closer to tracing his origins prior to 1690. Over the last decade we have made contact with a few other non-Dowells who are within spitting distance of Philip's yDNA signature at 67 and 111 STR markers.

I was an easy recruit to take the BIG Y test when it was rolled out in November. My sixth cousin once removed, George Dowell, and a more distant ySTR match, Herb McDaniel, also decided to test.


yDNA Haplogroups

Many of you may know that R1b is the most common male haplogroup along the Atlantic coastline of Western Europe. One of its branches, L21, is very heavily represented in the British Isles. Many of those who have taken yDNA tests in the last decade have either been confirmed or at least projected to belong to the L21 group. 

For those of you who are not into STRs and SNPs, L21 is a SNP along the human migration path that represents permanent branching. To the best of our knowledge at the moment, the first male to have his "G" mutate to a "C" at location L21 on his yDNA did so about four millennia ago. All of his straight line male descendants have inherited this C. 

Now with the new SNP data flowing in from BIG Y and other expanded tests of SNPs along the yDNA, we are able to shrink the four millennia down somewhat. My goal is to find SNPs that have occurred in the last 3 to 5 centuries. This may help us connect our SNP paths with our STR data and with our traditional genealogical trees.


SNP Trail from R1b-L21 to S1026 and even more recently.

In the last month we have been closing that gap, but we still have a long way to go. My own most recent SNP is getting closer. Indications are that DF13 (also known by other designations shown in the box above) first appeared about 3,500 years ago. Now we have SNP S1026 to narrow the gap even further. After that come 5 SNPs in the center box in the chart above. They need a lot of analysis to place them in the proper sequence with each other and in the right historical era. Five of the seven of us who so far have tested positive for S1026 share those five SNPs which are just in the process of being named as I write this post. Then it looks like there are 18 additional SNPs even closer to the present that I share with Herb McDaniel. It will be interesting to see how the results of my cousin George and others help us fill in even more of our time gap.

I am not advocating that any of you rush and order the BIG Y or the Full Genomes Y Sequencing tests. These are still vehicles for discovery of SNP trails rather than for finding cousins. STR tests are still better for the latter. If you want to SNP test, your dollars will be better spent if you build on what is being discovered by others. Generally your best strategy will be to find a near or even a distant STR match who has taken one of these mega tests. First confirm the most recent SNP you appear to share by taking a single SNP test. Then consult with your haplogroup or surname project coordinators for suggestions for additional SNPs to test individually. Happy SNP chasing!

Wednesday, April 9, 2014

BIG Y: My First Genealogically Relevant Find.


The first thing I learned from the BIG Y test is that the Virginia Group 1 Dowells in our surname DNA project can finally be moved out of the logjam at SNP M222. What? You didn't know that they were jammed up there? Read on.

One of the first things we learned in 2004 in our surname project was that my Maryland Dowells were not recently related to the Virginia Group 1 Dowells. Previously we had assumed that we were closely related. We had our own variation of the multiple brothers myth. I'm sure you have heard a similar tale about one or more of the lines you have researched. 

It goes something like this. Two (or four) brothers came across the Atlantic. When they disembarked one went north and one went west. Whichever branch you descend from never heard from the other branch again. Of course there is enough truth in some such stories that they need to be investigated. However, most of them have remained impossible to verify. One of the Dowell versions I heard decades ago was that four brothers came over from Wales. I still don't know exactly where my Dowells came from before they revealed themselves in Maryland.

Prior to 2004 the working hypothesis among Dowell surname researchers was a variation of the migration myth that claimed upon arrival in Hampton Roads one Dowell turned right and sailed up the Chesapeake Bay and the other continued up the James River. Waterways were the interstate highways of the time so this story had a ring of truth. Based on this story many of us assumed that if either group would be able to extend its paper trail just one or two generations further back, we would find our common male Dowell ancestor.

Then came yDNA testing. It soon became apparent that the two groups of Dowells shared the surname only by historical coincidence. Biologically, we were no more related than we would be if we each had different surnames. Our closest shared male ancestor lived at least three thousand years ago -- long before surnames were adopted. These two groups remain the two biggest clusters in our project.

SNPs (pronounced "snips") are permanent changes in a person's DNA that are passed down to all descendants. yDNA SNPs are permanent changes that are passed down by fathers to all their sons. As we have learned more about yDNA SNPs, we have been able to sketch in more and more of our ancient ancestral lines. The BIG Y test has offered many of us an unprecedented chance to explore our SNP history in much more detail than had previously been available. This is not a test for novices. Even most of us who have considerable experience with genetic genealogy are overwhelmed by the results that are coming back.

Both groups of Dowells descend from a large haplogroup (ancient clan). Membership in this clan is distinguished by a mutation located at a position called R-L21. The heat map below is from my results from the Geno 2.0 test at National Geographic which focuses on deep ancestry. The more intense the yellow and finally the red become, the larger the percentage of the population that carries this SNP. You will note that men who carry it are very prevalent along the Atlantic Coast of Europe and have particularly heavy concentrations in the British Isles.



The chart below shows what we thought we knew about where the two groups of Dowells had traveled down the SNP highway of history before BIG Y. The top of the chart has been truncated for simplicity. It begins as our ancestors migrated out of Central and Western Asia. You will note that L21 is represented by a green box in the upper middle of the chart below. We are very fortunate that a group of dedicated and knowledgeable citizen scientists also belong to the group and have done an immense amount of work to sort all this out. You may click on the chart to open a larger version in your browser. 
   

Before BIG Y we knew that the SNP flow of the Maryland Dowells had continued down to DF13 -- the green box just below L21 above. Then we could find no more recent SNPs. On the other hand the Virginia Group 1 Dowells could be traced through more recent SNP mutations down the left side of the chart to SNP M222. 

This chart was recently expanded to better represent newly discovered SNPs but still does not incorporate the bounty of BIG Y. Note that M222 is now shown among the blue boxes in the center right of the chart below:


The lower right part of this chart (area enclosed by the red rectangle) is blown up below for easier viewing:


Can you trace the path of SNPs from M222 in the fifth row of the family tree down to DF97 in the lower right corner of this last chart? It is sort of a connect-the-dots exercise for genetic genealogists. The Virginia Group 1 Dowells followed that genetic trail. That is what I have learned so far from BIG Y. 

How do I know that? The one Virginia Dowell who participated in BIG Y tested positive for SNPs DF85 and DF97. That means he also would be positive for the intervening SNPs along the connecting line from M222 down to DF97.
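The reasoning here is a walk up the tree: a positive result for one SNP implies a positive result for everything upstream of it. Below is a minimal sketch of that walk in Python. The toy tree lists only SNPs named in these posts and deliberately leaves out the intervening SNPs on the real charts, so it is a simplification and not a copy of any published tree. The dictionary and function names are my own.

```python
# Toy tree: each SNP points to the next *named* SNP upstream of it.
# The intervening SNPs between M222 and DF85, and between DF85 and DF97,
# are deliberately omitted from this illustration.
UPSTREAM = {
    "DF97": "DF85",
    "DF85": "M222",
    "M222": "DF49",
    "DF49": "DF13",
    "DF13": "L21",
    "L21": None,  # top of this truncated tree
}


def implied_positives(snp):
    """List the SNPs a man must carry if he tests positive for `snp`."""
    path = []
    current = snp
    while current is not None:
        path.append(current)
        current = UPSTREAM[current]
    return path


print(implied_positives("DF97"))
# ['DF97', 'DF85', 'M222', 'DF49', 'DF13', 'L21']
```

This is why one well-chosen downstream test can document a long stretch of the trail without testing every SNP along the way.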

I hope we will be able to learn more from the massive amount of raw data that came back from this one test, but this is quite an advancement of our knowledge of the migration of the paternal ancestors of the Virginia Group 1 Dowells. Now we have to put it all into historical context -- a daunting task.

Wednesday, November 13, 2013

ISOGG Group Gears Up For SNP Tsunami


The International Society of Genetic Genealogy (ISOGG) is a totally voluntary organization that does not charge dues. However, since 2006 it has been responsible for maintaining the Y-DNA Haplogroup Tree 2013 for researchers and testing labs around the world. The number of SNPs being discovered has been exploding since the end of 2010 and this is just the beginning. The recent wave of newly discovered SNPs has resulted from the Walk the Y, GENO 2.0 and 1,000 Genomes projects as well as the normal discovery processes of investigation by academics and citizen scientists.

End of year    Cumulative # of SNPs in tree
2006           436
2008           790
2010           935
2012           2,067
Sept, 2013     3,610
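To see how quickly the pace has picked up, the cumulative counts in the table can be turned into the number of SNPs added in each period. This is a minimal sketch in Python. The figures are copied from the table, and the function name is my own. Note that the last period is shorter than the others, since it ends in September 2013.

```python
# Cumulative SNP counts from the table above.
counts = [
    ("2006", 436),
    ("2008", 790),
    ("2010", 935),
    ("2012", 2067),
    ("Sept 2013", 3610),
]


def additions(rows):
    """Return (period label, SNPs added) for each pair of consecutive rows."""
    return [
        (f"{start} to {end}", later - earlier)
        for (start, earlier), (end, later) in zip(rows, rows[1:])
    ]


for period, added in additions(counts):
    print(f"{period}: +{added}")
# 2006 to 2008: +354
# 2008 to 2010: +145
# 2010 to 2012: +1132
# 2012 to Sept 2013: +1543
```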


The tsunami has yet to come. Geno 2.0 has not yet published all its SNPs. Treasure troves of additional SNPs from FullGenomes and FTDNA’s Big Y tests loom just over the horizon. These have the potential to identify and place thousands of heretofore unknown SNPs. Many of these will be leaves toward the ends of branches on the Y-DNA Haplogroup Tree. They will be recent enough to connect with the trees documented by genealogists.

In anticipation of this bounty and the chaos that may accompany it, those members of the ISOGG group who maintain this tree and who were able to gather in Houston on Saturday planned for this event.

Alice Fairhurst (center) leads the discussion. Members of her group in attendance (clockwise from Alice) are Richard Kenyon, Marja Pirttivaara, Michael Herbert, Sue Berry, Dr. D. (in red), Tim Janzen, Astrid Krahn and Thomas Krahn. (Photo courtesy of Katherine Borges)
It is clear that our processes need to be reorganized and streamlined if we are going to be able to continue to serve the genetic genealogy community and researchers in related disciplines on a timely basis.