Saturday, October 25, 2014

STiRring the SNP Pot

Most of us who have been interested in genetic genealogy for more than four years got our start by trying to understand the STRs (Short Tandem Repeats) of yDNA. For those of you who are new to this field, STRs are short stretches of DNA repeated back to back. The number of repeats is counted at various specified locations along Chromosome Y to generate the numbers in 12-marker, 25-marker, ... and 111-marker test results.
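For readers who like to see the mechanics, here is a minimal Python sketch of what "counting" a tandem repeat means. It assumes we already have a stretch of sequence for one marker and know its repeat motif. The motif GATA, the flanking bases and the function name longest_tandem_run are all made up for illustration. Real labs locate each marker by its surrounding sequence and follow their own counting conventions, and some markers have irregular or multi-copy structures, so treat this as a toy rather than as how any testing company actually scores a kit.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the longest uninterrupted run of `motif`, counted in repeat units.

    Hypothetical helper for illustration only; not any lab's scoring method.
    """
    if not motif:
        raise ValueError("motif must be a non-empty string")
    step = len(motif)
    best = 0
    # Try every starting position and count back-to-back copies of the motif.
    for start in range(len(sequence)):
        count = 0
        pos = start
        while sequence[pos:pos + step] == motif:
            count += 1
            pos += step
        best = max(best, count)
    return best


# Toy read: 13 copies of GATA between made-up flanking bases.
read = "TTAC" + "GATA" * 13 + "CCGT"
print(longest_tandem_run(read, "GATA"))  # 13
```

In this toy, the "marker value" reported for the read would be 13.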

In the last few years our attention has been drawn to the cMs (centiMorgans) of matching segments among the large number of atDNA kits tested by 23andMe, FTDNA and Ancestry.

Unfortunately, Full Mitochondrial Sequence test results have yet to reach the critical mass necessary to make mtDNA genealogically relevant to many of us. As the number of individuals tested continues to grow, this test will have genealogical relevance for more of us.

In 2014 the first wave of the SNP Tsunami engulfed us as results from Full Y, BIG Y and Chromo2, among others, began to come back in rapidly increasing numbers. The mechanisms for organizing the newly discovered SNPs (pronounced "snips") could not begin to keep up. The FTDNA SNP tree currently lists my most recent SNP as R1b-L21, even though I tested positive for DF13 (the next level down) in their lab in June 2012. DF13 does not yet show up on FTDNA's SNP tree, even though several hundred customers have tested positive for it. Only SNPs known by November 2013, mostly those on the chip of National Geographic's Geno 2.0 test, have so far been incorporated. As a result, none of the SNPs discovered in the last year are listed in the Y-DNA Haplotree currently posted on FTDNA's website.

Even ScotlandsDNA, the lab which discovered and named my own subclade of S1026, has not figured out much about what it is:
Your S1026 subtype was recently discovered using Chromo2, so its distribution is not yet understood. You may carry markers that further define your subtype, but do not yet appear on our tree. You will find these in your genetic signature.
The ISOGG yTree is trying to keep up but is woefully behind the front line, where SNPs are being identified daily as they wash ashore in the haplogroup discovery projects. This tree, which academics and hobbyists alike rely on to document the descent of "man" from yAdam to the present, had identified and placed a total of 3,610 SNPs from 2006 to about this time last year. Since January 2014 alone, more than 10,000 additional SNPs have been added. And there is no end in sight. Tens of thousands more are in the process of being identified and placed by citizen scientists.

NextGen sequencing has identified them. The harder job is to assign each of them to the correct haplogroup and to arrange them in the correct chronological order. Many more men need to be tested before this process can near completion.

Men who wish to learn more about their deep ancestry, and those who wish to build bridges from their deep ancestry to their ancestral trail into genealogical time, cannot rely on the yTrees of either FTDNA or ISOGG. These are too far behind the trailblazers. Instead, the strategy that seems to be working is to seek out a man who has already taken a NextGen test AND who shows up as a match on a ySTR test. A match with 10 or fewer mismatches on a 111-marker ySTR test is likely to be a fellow member of one's subclade from just beyond genealogical time. Matches with 7 or fewer mismatches on a 67-marker ySTR test are also good candidates.
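The rule of thumb above can be expressed as a short filter. This is a sketch only: it assumes both haplotypes list the same markers in the same order, and it scores mismatches with a simple step-wise sum of repeat-count differences. Testing companies may count some markers differently (multi-copy markers, for example), and the haplotypes below are invented. The names genetic_distance, is_nextgen_candidate and CANDIDATE_THRESHOLDS are hypothetical.

```python
from typing import Sequence

# Thresholds quoted in the post: at most 10 mismatches on a 111-marker test,
# at most 7 on a 67-marker test.
CANDIDATE_THRESHOLDS = {111: 10, 67: 7}


def genetic_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Simple step-wise distance: sum of absolute repeat-count differences per marker."""
    if len(a) != len(b):
        raise ValueError("both haplotypes must report the same markers in the same order")
    return sum(abs(x - y) for x, y in zip(a, b))


def is_nextgen_candidate(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if the STR match is close enough to be worth checking for a shared downstream SNP."""
    panel = len(a)
    if panel not in CANDIDATE_THRESHOLDS:
        raise ValueError(f"no threshold defined for a {panel}-marker panel")
    return genetic_distance(a, b) <= CANDIDATE_THRESHOLDS[panel]


# Made-up 67-marker haplotypes that differ at three markers.
mine = [13] * 67
match = list(mine)
match[0], match[5], match[10] = 14, 12, 15
print(genetic_distance(mine, match))      # 4
print(is_nextgen_candidate(mine, match))  # True
```

A match that passes this filter is only a lead; the SNP test is what confirms the shared branch.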

SNP R1b-S1026 was discovered just below L21 and DF13 by ScotlandsDNA's Chromo2 service shortly before the BIG Y results started coming back.

It is at the very bottom and slightly right of center in the diagram above. At the moment it is represented by 4 pale rose colored boxes. This is my subclade. We are attempting to expand this group down into genealogical times and we are getting close. 

Recently two men who had not previously SNP tested were tested for the newly discovered Z16891. For those of you who are trying to keep score, Z16891 is the rightmost pale rose SNP on the very bottom row in Mike Walsh's excellent chart above. These men took single-SNP tests using the older Sanger technology at FTDNA; ySeq also offers the same test. They chose to be tested for Z16891 because they had close STR matches with men who had taken part in the discovery of Z16891 through BIG Y. Both men tested positive. These positive results allowed these two fellow travelers to document their journey down the SNP flow from yAdam to Z16891.

This approach was a very inexpensive option compared with the first-class ticket of a BIG Y. However, it is less helpful to the discovery process. These men also did not get a list of the SNPs below Z16891, any of which could turn out to be terminal SNPs that uniquely identify their specific families.
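Why does one positive result say so much? Because SNPs are inherited in a nested chain, a man who is positive for a downstream SNP should also be positive for every SNP above it. The minimal Python sketch below shows that inference with a hypothetical parent map built only from SNPs named in this post. The real tree has many intermediate branches that are left out, and Z16891 is hung directly under S1026 just for illustration. The sketch also ignores back-mutations and testing errors, which is one reason confirmation tests still matter.

```python
# Hypothetical, heavily simplified parent map built only from SNPs named in this post.
# Real trees have many intermediate branches (omitted here), and Z16891 is placed
# directly under S1026 purely for illustration.
PARENT = {
    "L21": "R1b",
    "DF13": "L21",
    "S1026": "DF13",
    "Z16891": "S1026",
}


def path_from_top(snp: str) -> list[str]:
    """Walk parent links upward from `snp`, then return the chain top-down."""
    chain = [snp]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain[::-1]


def implied_positives(confirmed_snp: str) -> set[str]:
    """A positive result on a downstream SNP implies a positive on every SNP above it."""
    return set(path_from_top(confirmed_snp))


print(" > ".join(path_from_top("Z16891")))  # R1b > L21 > DF13 > S1026 > Z16891
```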

It is hoped that the testing panels now being developed will give more men another avenue into SNP testing -- something between the vast BIG Y and the narrowly focused individual SNP tests. If those come online over the next few months, we will then be looking at ways to test potentially terminal SNPs for individual families, which may become the 21st-century equivalent of 17th-century coats of arms.

Friday, October 24, 2014

NextGen Genealogy: The DNA Connection to Printer?

According to the schedule established by the publisher last summer, NextGen Genealogy: The DNA Connection should have gone to the printer yesterday. Did it? Only time will tell. As some of you may have heard me say, I was born in Missouri, the "Show Me" state. Therefore, I only believe things after I have seen them. Since I am a slow learner, I often don't believe them until two weeks after I have seen them.

The official publication date for the book is November 30th, 2014. That hurdle should easily be met. The project was on schedule when I submitted the corrected page proofs and the index on September 25th. Since then it has been totally out of my hands.

If any of you are interested in getting copies, they are available through the publisher, Amazon and other outlets. 

This post is taking far longer than I had intended. As I went through the process of writing and creating links, I found many distractions. My original intent was to warn potential buyers about two possibly misleading bits of information in the advance publicity.

CeCe is NOT a co-author 

CeCe Moore was originally contracted to be a writing partner for this book. The book would have been better if she had been able to participate. However, her commitments to other projects did not allow her to meet our publication deadline. You occasionally will see her name associated with the book, and I don't want you to buy the book under false pretenses. Once a book project is underway, getting all the databases corrected to reflect mid-project changes is like trying to get an aircraft carrier to change course. The publisher originally had both my name and CeCe's on the cover, and Amazon picked up that version. The publisher then amended the cover, and Amazon updated the cover icon. However, Amazon continued to list her as a co-author and to include a short bio of her on the book page.

I had been led to believe that only the publisher could submit changes to Amazon. This morning, after I started this post, I went through my author page at Author Central at Amazon. From there I was able to find a link to request a change in the product description. As soon as I submitted my request, my phone rang. After a couple of minutes of beautiful music, a human came on the line. After I explained the situation, she said the change would be made within three business days. It was made within three minutes.

You will find other venues that still list CeCe. My publisher's Fall catalog, which is now live on its website, is one example. She is actually good for sales, but I don't want buyers to be disappointed.

Electronic version

Unless you are buying the book for a public library, you probably would not be interested in the digital version of this book. My publisher is great at marketing books to libraries and college bookstores, but it has not mastered the 21st-century consumer market. To them, "digital version" does not mean Kindle. It means reading the book on the company's server. Many public and academic libraries buy rights so that their patrons can read this publisher's books online. In your case, let the buyer beware.

Purchase options

The book is available for preorder. If you choose to order it through the publisher, you can use the following discount flyer. Shipping is probably extra.

If you have Amazon Prime, which includes free shipping, I'm not sure whether the above discount would make your total price any lower.

Book length

The book is at least 30% longer than the advance advertising would suggest. The 136-page figure was a placeholder that the publisher inserted before the first draft of the manuscript had been submitted. The index of the finished book begins on page 167. I hope you enjoy and learn from it. Let me learn from your feedback.

Friday, October 10, 2014

NextGen Sequencing and yDNA: Part 2

This post is a continuation of my post two days ago.

Within the last week several events have occurred to flesh out our small project. This is exciting but it also will take a while to absorb this influx of new data and make sense out of it all. However, relationships are emerging among project members -- some of whom had previously appeared to be living alone on almost deserted ySNP islands.

The results of an additional BIG Y kit have come back. They connected two men with at least several recent generations of documented French descent. Although they still may not share a common ancestor in genealogical times, their match appears to be within the last millennium. For one of these men, whose father was adopted, this is encouragement that he is on the right track in pursuing some ySTR matches who are also of French ancestry.

One member has received Sanger confirmation through ySeq that his S1026 result from NextGen sequencing was correct. Although this analogy is very crude, NextGen sequencing is the equivalent of taking images from a space satellite. Sanger technology, on the other hand, is like driving to a specific location on earth and recording an image.

NextGen sequencing is much faster for scanning large areas, particularly those that may be almost inaccessible or whose coordinates were previously inexact or even unknown. It is great for discovery.

On the other hand, Sanger technology can be targeted precisely at one specific location (SNP or STR) and is considered much more reliable. The downside is that it is much more expensive to drive around on the surface of our genomes, recording a series of images that could be stitched together into a coherent map. It is much faster and more cost-effective to start with satellite images.

Two men who had close ySTR matches with men who already had BIG Y results have tested a single downstream SNP with Sanger technology through FTDNA and confirmed that they belong in this project. These men were able to target a specific SNP identified by the BIG Y results of someone with whom their STR results had previously suggested a distant relationship. Thus, at the cost of a single SNP test, these two men were able to confirm that their SNP trail takes them down into historical times and perhaps to the beginning of the genealogical era.

This week, after a wait of three and a half months, one project member received his Chromo2 results from ScotlandsDNA. An early examination of the results confirmed that he does belong to R1b-S1026. It was in fact this test that identified and named the SNP at location 19201991 as S1026. That is where the "S" in the naming protocol came from.

All of these results coming back within the same week have energized our tiny project, which now has only a baker's dozen of confirmed members. However, it will take us a while to puzzle over what it all means and what our next steps should be to continue tracing our diverging trails down into genealogical time and, hopefully, connect with the documented genealogies of specific families.

But now I must tear myself away from all this and fly to Houston today for Family Tree DNA's 10th Annual Conference on Genetic Genealogy. Don't you just hate it when your opportunities to learn more about genetic genealogy compete for your time to actually do genetic genealogy? I know, I know. I should just be grateful for my opportunities. And I am. 

Wednesday, October 8, 2014

NextGen Sequencing and yDNA

Genetic genealogy got its start in 2000, and yDNA dominated the first decade. mtDNA entered the scene late in that decade but has two difficulties to overcome. The first is that it is a fairly blunt instrument, with only 16,569 locations to differentiate among all of us. It is good for deep ancestry but has yet to demonstrate that it can differentiate among related individuals. Second, to date there have not been hundreds of thousands of people who have tested their complete mitochondrial DNA -- the only level at which mtDNA seems to have much genealogical value.

By 2010 23andMe and FTDNA led the way into exploring the largest areas of our DNA -- the autosomes. These two pioneers were joined in this marketplace in 2012 by AncestryDNA. Now more than a million atDNA test kits have been sold by these three companies and the pace is accelerating. 

Autosomal DNA is great for defining close relationships -- at least when those relationships have existed within the last several generations. Therefore it can be very useful to genealogists. However, since it is recombined in each intergenerational transfer, it soon loses its power of discernment as we investigate backward in time. This is the hottest growth area in DNA testing for genealogy and likely will continue to be so for some time. Women are on an equal footing with men when it comes to testing autosomes.
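The loss of discernment can be put in rough numbers. On average each parent passes on half of his or her autosomal DNA, so the expected share traced to any single ancestor halves with every generation. The sketch below uses a hypothetical function name and computes only that expected value. Actual inheritance varies because of recombination, and a given ancestor several generations back may leave no detectable segment at all.

```python
from fractions import Fraction


def expected_autosomal_share(generations_back: int) -> Fraction:
    """Expected fraction of your autosomal DNA that traces to ONE ancestor n generations back.

    Each parent passes on about half of their autosomes, so the expected share halves
    with every generation: 1/2 for a parent, 1/4 for a grandparent, and so on.
    This is an average only; real inheritance varies from person to person.
    """
    if generations_back < 0:
        raise ValueError("generations_back must be non-negative")
    return Fraction(1, 2 ** generations_back)


for n in range(1, 9):
    share = expected_autosomal_share(n)
    print(f"{n} generations back: {share} (~{float(share):.2%})")
```

By eight generations back the expected share from one ancestor is only 1/256, which is why autosomal matching fades so quickly compared with yDNA.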

In 2014 yDNA is making a comeback. It offers by far the longest segments of unrecombined DNA in our genomes. Therefore, it offers the best tool for looking into our deep ancestry. Although it may seem politically incorrect to say so, the fewer than seventeen thousand locations on our mtDNA cannot begin to be as informative as the more than fifty million locations on our yDNA. Unfortunately, only men can be tested. NextGen sequencing technology is now making it possible to read SNPs at several million locations on our yDNA. This far exceeds the hundred or so ySTRs that were being sequenced by earlier technology just a couple of years ago.

As a result of NextGen technology, tests like BIG Y, Full Y and Chromo2 have burst onto the scene. Although the prices of such tests are already coming down somewhat, they are still pricey compared to atDNA tests. Meanwhile, the amount of data they generate will take us a while to fully organize and analyze.

Traditional genealogy emphasized starting with the present and building carefully and methodically back into the past inhabited by our ancestors. These new tests have allowed us to reverse our focus and work from prehistory down toward genealogical times. In a few cases they have already allowed us to intersect with our traditional documentary research. This trend will greatly accelerate as we get more skillful at interpreting the information written in our yDNA.

Even in earlier and simpler times we could begin to sketch the flow of our ySNPs from yAdam down toward the present. Five years ago I was offered an overview of how my SNPs, and thus my paternal ancestors, had migrated down to the last several thousand years. Below is how deCODEme illustrated my paternal descent down to haplogroup R1b -- the largest in Europe:


The SNP tsunami that flows from these powerful new tests is allowing us to fill in gaps in charts like the one above. More importantly they are allowing us to build down toward the present. I will extend this SNP flow down to the last millennium in my next post.