Monday, July 6, 2015

A Brief History of ySNPs: Part 1

This is the first of a series of posts leading into how you can interpret NextGen ySNP results such as BIG Y and similar tests of discovery. These posts will not necessarily be written in chronological order. Sometimes when traveling through prehistory one loses an orientation to the currently accepted time continuum. I may retroactively renumber the parts of this series for those of you who prefer to have your information provided in a linear format.

We now estimate human ySNPs have been around about 338,000 years. This series is going to focus on only the last 4,000 years of one particular branch of the human ySNP tree. That is as far as I can now get my genetic periscope to come into at least a fuzzy focus. For reasons that are intuitive to some of you and chauvinistic to others, I am focusing on the journey of my own ySNPs. 


Many of you already know that a vast majority of our DNA consists of atDNA -- from within our autosomes. Our atDNA is incredibly helpful as we attempt to sort out family relationships within the last few generations. However, half of the atDNA information recorded in the cells of family members is lost with each passing generation. We might say that atDNA information has a half-life of one generation. After five generations have passed, we can only interpret intelligently about three percent of the original information. This is why documenting the atDNA of the members of the oldest generation of your family should always be a high testing priority. Once they are gone, half of the family history information recorded in their autosomes is buried with them.
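
For those who like to check the arithmetic, here is a minimal sketch of that half-life idea in Python. It assumes an idealized model in which exactly half of the remaining atDNA information is lost at every generation; real inheritance is lumpier than that, and the function name is just my own label for illustration.

```python
def fraction_remaining(generations: int) -> float:
    """Idealized share of an ancestor's atDNA information still
    present after the given number of generations (half lost each time)."""
    return 0.5 ** generations


# Walk down five generations and show the shrinking share.
for g in range(1, 6):
    print(f"after {g} generation(s): {fraction_remaining(g):.1%}")
```

Under this simple model the share left after five generations is 1/32, which is the "about three percent" mentioned above.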


The information contained in our xDNA is similarly unreliable after a few generations and its quirky inheritance pattern adds additional twists to those who try to read it.

Celibate DNA

Those of us interested in using DNA to trace deep ancestry are left with celibate DNA. Celibate DNA is DNA that is passed down from generation to generation with the direct involvement of the parent of a single gender. It is not remixed with each inheritance event. Generation after generation it is copied with very few errors.


Mitochondrial DNA is a celibate DNA. It provides a window back into deep ancestry of our maternal lines. All of us carry a copy given to us by our mothers documenting the descent of our umbilical line back to mtEve. However, the story it can provide is limited to what can be recorded by using only one of 4 characters (i.e., A, C, G or T) at each of 16,569 locations. In other words it is a very short book. 


The Y-chromosome is one of the smallest of the 46 human chromosomes. Only men have one. However, with almost sixty million locations to record an A, C, G or T, its capacity to record its journey through time is vastly greater. It is a book that has about 3,600 pages for each page in the mitochondrial portion of our internal cellular libraries. As a result it can teach us much both about our human journey through prehistory and our families in the genealogical era.
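
The "3,600 pages" comparison can be checked with a few lines of Python. This is only a rough sketch: the Y-chromosome figure is the rounded "almost sixty million" used above, not a precise count, and the variable names are my own.

```python
MT_LOCATIONS = 16_569      # locations in the mitochondrial genome
Y_LOCATIONS = 60_000_000   # rounded "almost sixty million" (an approximation)

# How many mitochondrial "books" would fit inside the Y-chromosome "book"?
pages_ratio = Y_LOCATIONS / MT_LOCATIONS
print(f"Y-chromosome locations per mtDNA location: about {pages_ratio:,.0f}")
```

With the rounded input the ratio comes out a little over 3,600, which is where the page comparison comes from.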


Thursday, July 2, 2015

Coming out of Blogger Hibernation

This morning I taught the last class in my short course on Genetic Genealogy at the Osher Institute for Lifelong Learning at Vanderbilt University. This was the first time I had taught the course based on my recent book: NextGen Genealogy: The DNA Connection. I had plenty of material. The task was to organize it into coherent class sessions. So much had to be left out. Some of the student handouts are still online.

Hope the result was useful to the students. I know the experience was a learning experience for me as I was forced to look at genetic genealogy topics from a more comprehensive perspective. So much has happened since I finished the manuscript for my book a year ago.

Now I can turn my attention back to many small research projects. Some of them will result in blog posts in this space in the near future. 

I will be able to go watch the US Men's Soccer team play a friendly against Guatemala tomorrow here in Nashville with my son and two grandsons -- Noah who is almost 8 and Simon who just turned 5. This is the kind of activity that prompted Denise and me to give up our ocean view in California three years ago and relocate to Tennessee.

Look for me to have a much more active presence online later this month.

What are the rest of you doing with your summer?

Tuesday, May 19, 2015

Is English DNA different from German DNA?

A year ago I blogged about the paucity of identifiable German DNA in the current round of ethnicity estimates: Where Did All The German-Americans Come From? In that post I described the ethnicity reports of my wife who descends from 5 great-grandparents who have roots in what is today Northern Germany and Northwest Poland. The other 3 were from Ireland, Scotland and England. 

This past weekend I was struck by the similarity between the ethnicity estimates Debbie Kennett self-reported and those predicted for my wife. The only problem is that all of Debbie's known ancestral lines appear to have been in the British Isles for at least the last few centuries and most of them have been in England.

As I read Debbie's post, Comparing admixture results from AncestryDNA, 23andMe and Family Tree DNA, I speculated as to whether it would be obvious, to those shown the test reports for Debbie and those for my wife Denise, which woman had ancestry almost completely from the Isles and which had ancestry predominately from Germany/Prussia. I'll let you be the judge. 

Ethnicity Estimates Debbie Denise
Europe  100% 100%
Europe West 47% 15%
Great Britain  21% 33%
Ireland  20% 32%
Iberian Peninsula 8%
Europe East 12%
Scandinavia 8%
Trace 4%
myOrigins (FTDNA)
European 100.00% 98.00%
British Isles 57.00% 78.00%
Western & Central Europe 22.00%
Southern Europe 12.00%
Scandinavia  5.00% 3.00%
Finland & Northern Siberia 3.00% 6.00%
Eastern Europe 1.00% 11.00%
Asia Minor 2.00%
European 100.00% 99.80%
British & Irish 56.10% 42.60%
French & German 13.40% 5.90%
Scandinavian 4.20% 8.80%
Northern European 24.00% 36.80%
Eastern European 3.20%
Iberian 0.40%
Southern European 0.20% 0.30%
European 1.90% 2.20%
Yakut 0.10%
North African 0.10%

Monday, May 11, 2015

mtDNA doesn't change fast enough to be genealogically useful -- or does it?

I just love the scientific method of family history research. I can make an educated guess as to what a family relationship may be. Then I can formulate a hypothesis that is a reasonable explanation for that guess. Next I can set this hypothesis up as a figurative piñata. I then take a figurative stick and try to beat that hypothesis into submission using every genealogical tool I can lay my hands on. If I fail, I invite my family members and other genealogical colleagues to take whacks at my piñata hypothesis. As long as none of us can break my hypothesis, I assume it is correct and move on to formulate additional hypotheses to test. Of course sometimes there are subsequent developments that rattle my old undisproven hypotheses. That's when I have the opportunity to learn something new. Learning something new is one of my favorite things.

According to the conventional wisdom among genetic genealogists, mitochondrial DNA (mtDNA) changes so slowly that it is not often genealogically useful in sorting out relationships. This was easy for me to believe when I went for years without getting an exact match.

Then my views began to change. In two cases it appeared that mtDNA provided important guidance, even though in neither instance is it likely to lead to exact matches with ancestral relatives. These cases have been previously reported in this blog. One related to my paternal grandmother's mtDNA and the other to my maternal grandmother's mtDNA.

Those of us who know Judy Russell (AKA The Legal Genealogist) find her to be unusual -- if not unique. Therefore, I didn't pay the proper amount of attention when she posted about her mtDNA mismatches with a first cousin. Then I discovered mtDNA mismatches with a cousin that caused me to write Judy for confirmation. Here is her reply which I repost with her permission:
Yep, and it's a very close relationship: my own first cousin. This particular cousin is the younger of two daughters of my mother's youngest sister. Full sibling situation, no chance of any NPE of any kind, all the right autosomal indications.
And this maternal first cousin of mine and I show as a genetic distance of two.
 It turns out that I have a heteroplasmy in HVR1, and Paula has one in the coding region. FTDNA reports that as a genetic distance of two. Interestingly, her mother doesn't have either of those heteroplasmies, so -- as Paula puts it -- we're both mutants!
Wrote this up on the blog: Getting the drift.
In my case Sherry, a 3rd cousin, and I discovered that we were listed by FTDNA as mismatches of THREE. Are we both mutants? We are still trying to work that one out.

Sherry and I have each identified a first cousin to test to narrow our gap. In so doing we hope to be able to follow our mtDNA two generations back to our respective maternal grandmothers. This would be along the red lines shown connecting the individuals in the above diagram. After we get these test results back we will reassess and consider our next step(s).

In our case we are fortunate that Sherry and I are exact matches in the coding region -- the largest part of our mitochondria. This means we can test additional cousins for a relatively reasonable cost. We only need to order HVR1 & HVR2 tests to help resolve our mutant markers.

The HVR1 location at which we mismatch is 16519, where Sherry's value is T and mine is C. The HVR2 location is 152, where I have a C and Sherry a T. I have taken the BIG Y test. Until recently the BAM files that were part of the output for the BIG Y had data on mtDNA. In the analysis of my BAM file conducted by yFull, I found the report below of the findings at the two locations in question:

Search in BAM file
ChrM position:
152 (+strand)
Position data:
Weight for C:
Probability of error:
0.0 (0<->1)
Sample allele:
RSRS allele:
rCRS allele:

Search in BAM file
ChrM position:
16519 (+strand)
Position data:
5T 16C
Weight for T:
Weight for C:
Probability of error:
0.320675177201 (0<->1)
Sample allele:
RSRS allele:
rCRS allele:

Of particular interest is that there were 21 reads at location 16519. The results were 5 "T"s and 16 "C"s. All 9 reads at location 152 found "C".  
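
To put those read counts in proportion, here is a small Python sketch that turns a pile of reads at one location into allele fractions. The read strings are simply rebuilt from the counts reported above, and the function name is my own. It does not attempt to reproduce yFull's "Probability of error" figure, since I do not know how that number is calculated.

```python
from collections import Counter


def allele_fractions(reads: str) -> dict[str, float]:
    """Share of reads supporting each base at a single location."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


# Rebuilt from the counts in the report: 5 T and 16 C at 16519, 9 C at 152.
reads_16519 = "T" * 5 + "C" * 16
reads_152 = "C" * 9

for label, reads in (("16519", reads_16519), ("152", reads_152)):
    shares = allele_fractions(reads)
    summary = ", ".join(f"{base}: {share:.0%}" for base, share in sorted(shares.items()))
    print(f"position {label}: {len(reads)} reads -> {summary}")
```

Roughly a quarter of the reads at 16519 show T and three quarters show C, while every read at 152 shows C. A mixed result like the one at 16519 is the kind of pattern one would expect to see with a heteroplasmy, although read counts alone do not prove it.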

To be continued as reports on cousins Jim and Dell come back from the lab in about a month. As they do, will they cause me to re-examine my hypothesis about my one exact full mitochondrial match (see above)? Will I find that the actual mtDNA haplotype of my maternal 2nd great-grandmother really was not the same as mine? Stay tuned.

Monday, April 13, 2015

What’s your next writing assignment for Dr. D.?

This post is addressed primarily to those of you who have read my recent book NextGen Genealogy: The DNA Connection. If you bought the book from Ancestry, I would very much appreciate it if you would leave comments – however brief – on the Ancestry site. This will help potential readers decide if the book would be useful for them.

I agree with most of the comments that have been made there so far including the one about the book being overpriced. That is the result of the publishing process I used to produce the book. It strengthened the book by imposing a tried and proven structure to the process; but it gave the publisher control of the pricing. I’ll have to decide if self-publishing is a route I want to explore in future writing endeavors.

The only comments with which I disagree are those saying that I should not have used so many family examples or that I should have disguised my personal association with these vignettes. Other readers seemed to believe these examples illustrated and strengthened the book. I agree with this latter group.

What comes next?
Over the next few months I will be considering what if anything I want to write in the near future. Originally, I had envisioned writing a trilogy: one book on genealogy research; one book on incorporating DNA results into family research; and one book on ethical issues surrounding DNA testing in both the family history arena and the medical arena. The first book became Crash Course in Genealogy (2011). The second became NextGen Genealogy: The DNA Connection (2015). At the moment I’m feeling less confident that I can add much to the overall ethical debate although this field is going to continue to heat up as more medical practitioners incorporate DNA testing into patient care. Maybe there is more I can contribute if I concentrate on extending what I have started with genealogy research.

As many of you have observed, books about DNA testing are partly obsolete before they hit the street. The field is evolving that quickly. Although my recent book has a 2015 copyright date, my ability to include recent developments began to contract many months earlier. Much of the content was being frozen in ink a year ago. The field of genetic genealogy is evolving from its core in many different directions and much of this process is occurring rapidly. Among the sciences only astronomy can rival the growth rate of genetics. For both fields the explosion of informatics has allowed the processing of the huge data sets needed to support this progress. This speed of change calls into question whether books can help readers keep up with the disparate knowledge that now radiates out on tangents in all directions from a basic core of knowledge that all of us need in common. Can an author keep up with enough of these to write a useful book?

Whether you bought the book from Ancestry, the publisher, another vendor or checked it out from your library, I’d appreciate your thoughts on a more focused topic. What was not covered in NextGen that you wish had been covered? What would you like to see covered in more detail?
I encourage you to write me with your suggestions. You may email me at infodoc [at] or comment at the end of this blog post. I will carefully consider your comments as I decide on my writing plans for the future.

One and done
During the television coverage of the recent US men’s college basketball spring rite known as “March Madness”, we frequently heard the phrase “one and done.” For the uninitiated, that expression refers to the phenomenon of would-be superstars leaving college after only one year to seek their fortunes as professional basketball players.

Am I seeing a parallel pattern among authors of books on how to do genetic genealogy? It seems that after one book is published authors choose to take their careers in other directions:

Smolenyak and Turner (2004);
Fitzgerald (2005);
Pomery (2007);
Kennett (2011);
Hill (2012);
Aulicino (2014);
Dowell (2015).

So far second editions and sequels have not been in vogue in genetic genealogy. Is there a message here for Dr. D.? Please let me know what you think.

Friday, April 3, 2015

Does Ancestry think we are NOT OK?

I find Ancestry's DNA Circles intriguing and really enjoy seeing a green leaf match on my DNA results page. However, I do object to Ancestry's patronizing attitude toward its customer base. "Trust me" is not an endearing phrase when uttered by a used car salesman. That kind of response is no more endearing when it comes from a lab you have paid to analyze your DNA. Yet "trust me" is exactly what we are asked to do when Ancestry announces we have a DNA match.

We are told that Ancestry has "Confidence Extremely High" that they have identified a 1st to 2nd cousin (see above) but we have to take that on faith. We cannot see the total number of cMs that match with this alleged relative or the length or location of our longest matching segment. Ancestry does not think customers who paid for the test need this information. It would be nice to see if others match us on exactly that same segment but Ancestry does not want to confuse us with that information. After all the company seems to say, wouldn't customer supplied pedigree charts be a better way to document matches with our relatives than would be precise chromosome locations? :-(

Why does it have to be either/or? Lots of us in the customer base would like to have both! It would simultaneously give us more value and give Ancestry more credibility.

For the last 15 years direct-to-consumer (DTC) genetic testing for genealogical information has gradually been emerging from the chilling paternalistic concerns of the medical establishment about letting civilians have direct access to our own genetic data. This information is taken from within the cells in our own bodies. We have been making progress on a number of fronts -- including US Supreme Court decisions that corporations cannot patent natural genes. Now in the last three years Ancestry is trying to exert a paternalism of its own over access to our personal genetic information.

Many of you remember the 1969 bestseller I'm OK You're OK.

It was based on the Transactional Analysis model of Eric Berne. You may remember that we were trained to analyze communication transactions using diagrams similar to the one below. It seems clear to me that Ancestry sees transactions with customers as them (the Parent) giving us (the Child) the information they think we need and are able to handle, without us asking too many hard questions about it that might overload tech support.

If this analysis of our communication pattern with AncestryDNA is correct, Ancestry sees it as Ancestry is OK but we the customers are NOT OK. This seems to be in direct contrast to Anne Wojcicki's stated goal for 23andMe to empower us by providing us with access to our own genetic information and thereby change the face of health care. 

As rumors are beginning to fly that Ancestry is exploring the possibility of providing health related information from our DNA, I wonder what the business model would be for such an endeavor.

Genetic Genealogy Reaches New Milestone

In June genetic genealogy will pass a new milestone in its growth and development in the US. Two simultaneous conferences will occur -- one on each coast. It is notable that high quality speakers can be provided at both. I anticipate that both will attract large audiences.

The first will be DNA Day co-sponsored by the Southern California Genealogical Society and the International Society of Genetic Genealogy. This Thursday DNA Day is now in its third year as a lead-in to the annual three day Jamboree. The 2013 and 2014 events were excellent.
For those of you who cannot attend either of these events live, the Burbank event offers "24 hours of exceptional DNA education without leaving home." The link in the previous sentence leads to a detailed list of presentations. For those who will be in Burbank, learning opportunities continue on Friday morning. At least one of the Friday workshops is already Sold Out, but there should be plenty of room at the consultation tables.

I'm giving two presentations on Thursday, will be at a consultation table on Friday morning and on an "Ask the Experts Panel" late Saturday afternoon. I hope to see many of you there.

On Saturday, June 6th, those of you on the East Coast will have the opportunity to attend a slightly different event. It is being billed by its organizers as:
The biggest, most extraordinary and most inclusive family reunion in history!

About the Global Family Reunion

What: A day-long festival of music, food, comedy, speeches and contests celebrating the fact that humans are one big family.
When: June 6, 2015
Where: The New York Hall of Science, on the grounds of the World’s Fair.
The Agenda: It will be a Family Reunion meets a World’s Fair meets a music festival meets a TED conference. There will be talks by celebrities, scientists and comedians! Music! Food! Exhibits! Contest! Games for all ages!
The Cause: Alzheimer’s. All proceeds go directly to charity.
Who’s Invited: You! All seven billion members of the human family. Those with a proven connection will get a bracelet and be part of the biggest family photo in history. See for more details.
Entertainers and speakers: Henry Louis Gates Jr., comedian Nick Kroll, Lisa Loeb, Sister Sledge, Daniel Radcliffe (live or via video), filmmaker Morgan Spurlock, author A.J. Jacobs, Dr. Oz and comedian Michael Ian Black, and many, many, MANY MORE!
Activities: Scavenger hunts, crafts, family trivia quiz by Ken Jennings, genealogy, storytelling, historical interpreters and over 450 exhibits from the New York Hall of Science, one of the top 10 science museums in the United States.
The Host: Conceived by bestselling author A.J. Jacobs
This event is cosponsored by FTDNA and Bennett Greenspan is a speaker.

That these events can be held on the same weekend and both have every indication they will be successful is a major milestone for genetic genealogy in the US. We need to take a moment and congratulate ourselves and then get back to work. We need to sell a lot of test kits at both events to build up our databases.

Thursday, January 29, 2015

When a Surrogate is not YOUR Surrogate

For more than a decade we have been using surrogates to help us explore our family histories. We use surrogates to discover DNA information passed down from our ancestors that was not passed down to us. The first major application of this technique was when women solicited close male relatives -- fathers, brothers, nephews or cousins -- to take yDNA tests to establish the DNA signatures of their paternal surname lines.

More recently we have become more creative in the use of surrogates. Almost four years ago I first blogged about using a female first cousin as a surrogate to help me discover information about my paternal grandmother's mtDNA. By so doing I discovered the ethnicity of the female ancestral line of my sixth great-grandmother.

Since then I have used surrogates in other ways.

  • I have tested a male first cousin -- once removed to discover the haplogroup of my maternal grandfather.
  • I have tested a male second cousin to discover the haplogroup of one of my maternal great-grandfathers.
  • I have tested a male third cousin -- once removed to verify the paper trail of an eighth great-grandfather.  

In each of these instances I was making assumptions that later turned out to be correct. Each of these surrogates was actually related to me in the way I thought they were. By assuming these relations were correct, I was skating on thin ice. I was also violating Dr. D's Rule #1:

Rule 1. Start with what you know (yourself) and build back to what you don’t know—step-by-step. Don’t skip steps!!

That means what you really know from your own experience. It does not mean things you have heard about as they passed down through the family second or third hand. 
Crash Course in Genealogy (2011), pp. 15-16. 

Continued violation of this rule will jump up and bite you sooner or later as I discovered in the last month. 

Back in 2007 I had helped a female extended family member select a male first cousin -- once removed to test as a surrogate to try to establish where her ancestors might have lived before immigrating to the US in the late 19th century. That original 37 marker yDNA test was followed by a Deep Clade-R test in 2009, and three single SNP tests -- one each in 2010, 2011 and 2012 as we attempted to narrow down the genetic migration trail. Last month during the FTDNA Holiday Sale, a decision was made to bite the bullet and order a BIG Y test for this surrogate to further clarify his haplogroup. As sort of an afterthought a Family Finder test was also ordered. 

The whole house of cards suddenly collapsed. The surrogate did not match his supposed first cousin -- once removed. He also did not match her sister or brother who also were supposed to be first cousins -- once removed of the surrogate. Seven tests, and the seventh one undercut the usefulness of the other six.

Lesson to be learned: the FIRST test you should invest in when using a surrogate should be an atDNA test like Family Finder to verify that your supposed close relative is really YOUR close biological relative.

Tuesday, January 27, 2015

Dr D & Bernice Bennett to talk Genetic Genealogy


Please join Dr D and Bernice Bennett on Thursday night, January 29th, at 9:00 PM Eastern time as we talk about genetic genealogy on blog talk radio over the internet. Bernice is the host of the show "Research at the National Archives and Beyond". Here is what she has to say about her show:

Welcome to Research at the National Archives and Beyond! This show will provide individuals interested in genealogy and history an opportunity to listen, learn and take action. You can join me every Thursday at 9 pm Eastern, 8 pm Central, 7pm Mountain and 6 pm Pacific where I will have a wonderful line up of experts who will share resources, stories and answer your burning genealogy questions. All of my guests share a deep passion and knowledge of genealogy and history. My goal is to reach individuals who are thinking about tracing their family roots; beginners who have already started and others who believe that continuous learning is the key to finding answers. "Remember, your ancestors left footprints".
This week's show focuses on genetic genealogy:
What do you know about DNA?  Have you had your DNA tested and still have questions about your results?

Join David Dowell for a discussion about DNA and his new book NextGen Genealogy: The DNA Connection.


David Dowell was an academic librarian for 35 years. He has 2 degrees in history and 2 in library science. He has researched family histories since the 1960s. He is an ethicist, lecturer and author whose two most recent books are Crash Course in Genealogy (2011) and NextGen Genealogy: The DNA Connection (2014). He formerly taught “Genealogy Research” and “Ethics in the Information Age” at Cuesta College and chaired the Genealogy Committee and the Committee on Professional Ethics of the American Library Association. He blogs on genealogical topics as “Dr. D Digs Up Ancestors” at He coordinates two surname and one haplogroup DNA research projects. Dr. Dowell has taught library science courses face-to-face and online for 15 years and made presentations to local, regional and national library groups. He has taught genealogy research classes in both California and Tennessee and made presentations on genetic genealogy to community groups and local genealogy societies in California, Illinois and Tennessee. He is currently lecturing on genealogy research for the Osher Lifelong Learning Institute at Vanderbilt University.
Chat and call-in questions and comments will be accepted from the audience. The show will be available for streaming for those unable to listen to it live.


Did AncestryDNA quietly become more expensive?

Many of us die-hard genetic genealogists who are seriously addicted to family history research may not have noticed, but AncestryDNA seems to have become more expensive for the casual DNA test taker. In a notice last updated on January 12th in the Help section of its site, Ancestry differentiated between what is available to those who order an autosomal DNA (atDNA) test and what is available to those who order an atDNA test AND a database subscription.

For those of us who regularly research family history, we subscribe to for the billions of records in historical databases. If we throw in an atDNA test, we get the full matching information at no extra charge except for the modest, one-time cost of the test (currently $99). However, the current pricing structure as described above makes one wonder if the price of the test is considered to be a loss leader to sell subscriptions to databases. If so, it is understandable why Ancestry often offers the test at flash sale prices of $89, $69 and even $49.

Many of you will remember that Ancestry announced last summer that it was no longer testing yDNA and mtDNA. They really had not been active in this marketplace for some time when this announcement was made. 

What then do persons get if they do not also subscribe to the databases? Ancestry says:

An AncestryDNA test without an Ancestry subscription includes:

  §     One of the most technologically advanced autosomal DNA tests available, that looks at over 700,000 markers across your entire genome.
  §     You’ll have access to your personal online DNA results, on at all times.
  §     Your DNA results include your full genetic ethnicity breakdown. So for instance, you can quickly discover if you’re part Scandinavian, North African, European Jewish—AncestryDNA reports on 26 different regions from around the world.
  §     Receive updates to your ethnicity over time as we roll out new findings.
  §     Your DNA results also include a dynamic list of DNA member matches to help you find potential new relatives. This is continually updated and includes everything from immediate and close family to 4th-5th cousins.
  §     Manage multiple AncestryDNA tests in one account.
  §     Keep your DNA results stored securely with your family history research on, all in one place.
The first four items above have to do with ethnicity testing. This would seem to be the main benefit for someone who tests but does not want to subscribe to database access. This is a prime motivation for many people to test. It swells the size of the database of tested individuals and provides more matches for all of us. 

For those of us who consider ourselves to be serious genetic genealogists, the ethnicity results are the softest part of the "science" of DNA testing for family history. The accuracy of the DNA testing is not in question. However, our knowledge of the GPS locations of specific populations 500 to 1,000 or more years ago is far from settled science.

Hard core genetic genealogists are after the matching relatives. We are also interested in the details of how and where these matches occur. We have been frustrated by Ancestry's unwillingness to provide such details since it got into atDNA testing almost three years ago. It does not look like relief is on the way.

It is unclear to Dr. D what exactly customers are being offered in the 5th bulleted item above:
 §     Your DNA results also include a dynamic list of DNA member matches to help you find potential new relatives. This is continually updated and includes everything from immediate and close family to 4th-5th cousins.
It appears that the list of close matches will be updated and continue to be available even if one does not opt to subscribe to Ancestry's database. What is not clear is whether such individuals will be able to see the pedigree charts of those matches. Without the pedigree charts, such matches are essentially useless genealogically speaking.

I hope someone from Ancestry will be able to clear this up for us.