Saturday, February 9, 2019

"SNPS still hurt my head"



My cousin wrote "SNPS still hurt my head. Have not spent enough time with them." This is not my high school dropout cousin emailing. This is my PhD research chemist cousin who travels the globe to attend scientific conferences to share findings with fellow cancer researchers. If ySNPs still hurt your head too, you are in good company. 


What the heck are ySTRs?

This is a cousin who long ago tested his ySTRs to 111 markers. He and I match on 110 of 111 ySTR markers. My 5th great-grandfather, Peter Dowell, Sr. (1714-1802), is his 6th great-grandfather. Well, maybe this is an opportunity to let him spend some more time wrapping his mind around ySNPs. Maybe, in so doing, I can begin to relieve some of his head pain. Maybe.

All of you who have used census records can relate to how data on ySTRs (short tandem repeats on the Y chromosome) are collected and compared. The locations visited for 12, 25, 37, 67 or 111 ySTR marker tests are not magical. They were picked originally because geneticists knew how to repeatedly and reliably locate them. Also, they were thought to mutate just fast enough to be useful for genealogists. That means they are relatively stable for several generations so that men who share enough of them probably share a common patrilineal ancestor in genealogical times. At the same time they mutate frequently enough to differentiate between distinctly different lines of ancestors.

To use a simple example, let us assume that you know there is a village that has either 12, 25, 37, 67 or 111 distinct residences. You send a census taker to visit each residence and determine how many ySTRs reside at each location. In the lab this is what geneticists are doing when they collect data for ySTR comparisons. They sample the number of ySTRs "residing" on the Y chromosomes of men to give us an idea of whether or not those men are closely related. How many generations ago could a common patrilineal ancestor of those men have lived if the two men might be expected to have accumulated the number of variations that are observed in the current test results?
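To make the census analogy concrete, here is a minimal Python sketch of comparing two men's ySTR results marker by marker. The repeat counts below are made up for illustration, and real genetic-distance scoring treats some markers specially, so take this as a simplified picture rather than how any lab actually scores a match.

```python
def compare_ystr(profile_a, profile_b):
    """Return (matching, compared) for the markers tested in both profiles."""
    shared = sorted(set(profile_a) & set(profile_b))
    matching = sum(1 for marker in shared if profile_a[marker] == profile_b[marker])
    return matching, len(shared)


# Hypothetical repeat counts at four "residences" (markers).
man_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
man_2 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10}

matching, compared = compare_ystr(man_1, man_2)
print(f"{matching} of {compared} markers match")
```

The fewer the differences over a large number of markers, the more recently the two men are likely to share a patrilineal ancestor.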


RECONSTRUCTING THE FOUNDER’S 111-MARKER DNA SIGNATURE
[from David R. Dowell, NextGen Genealogy: The DNA Connection, (2015), pp. 33-4.] 
Surname projects can apply these principles to reconstruct more complex descendant trees of family “founders.” In this case the “founder” was Philip Dowell, who appeared in the records of southern Maryland in the 1690s. By that time he was an established tobacco planter. His marriage in 1702 and death in 1733 are well documented, as are many other events in his adult life. However, no birth or christening records have yet been discovered. Although he is reported to have had four sons who passed his Y-chromosome DNA forward, living male Dowell descendants of only three of those sons have been identified and tested.
Five of those descendants have tested to 111 Y-STR markers. One was a descendant of Philip’s first son, one was a descendant of Philip’s second son, and three were descendants of Philip’s third son. The results of these tests were that all five agreed on 101 of the 111 markers. In addition, four of the five (including descendants of at least two of Philip’s sons) shared the same value on all 111 markers. In other words, none of the two mutations of the descendant of the eldest son, the six mutations of the second son, or the single mutations of two descendants of the third son occurred on the same marker. Philip can be assumed to have passed down the marker values that at least four of his five living descendants share because they were passed down through multiple lines that have no common ancestor more recent than him.
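The reasoning in the excerpt above can be sketched in a few lines of Python: for each marker, take the value that most of the tested descendants share. This is only an illustration under stated assumptions. The marker names (M1 to M3) and values are invented, and the simple vote does not check that the agreeing men descend from different sons, which is what makes the real argument convincing.

```python
from collections import Counter


def reconstruct_founder(profiles):
    """For each marker, return (most common value, number of men who carry it)."""
    founder = {}
    for marker in profiles[0]:
        counts = Counter(profile[marker] for profile in profiles)
        founder[marker] = counts.most_common(1)[0]
    return founder


# Hypothetical results for five descendants; each mutation falls on a different marker.
descendants = [
    {"M1": 12, "M2": 24, "M3": 14},  # mutation on M1
    {"M1": 13, "M2": 25, "M3": 14},  # mutation on M2
    {"M1": 13, "M2": 24, "M3": 14},
    {"M1": 13, "M2": 24, "M3": 15},  # mutation on M3
    {"M1": 13, "M2": 24, "M3": 14},
]

for marker, (value, votes) in reconstruct_founder(descendants).items():
    print(f"{marker}: founder value {value} ({votes} of {len(descendants)} agree)")
```

Because no two mutations land on the same marker in this made-up data, every marker still has four of five men in agreement, just as in the Dowell example.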

Triangulating 111 ySTR markers back to the founder of the Maryland Dowells                       


Since this chart was prepared another Dowell has tested who is a 111/111 match for the reconstructed yDNA signature for the Founder. However, we are still in the process of determining this man's exact line of descent.


What the heck are ySNPs?

While ySTR analysis has helped the Maryland Dowells understand much about their inter-relationships in North America, it has yet to connect them with specific ancestors across the pond. Could ySNP (Single Nucleotide Polymorphisms located along the y-chromosome) analysis help?

As noted above, ySTR analysis is all about determining whether or not men share the same number of repeats at predetermined locations along their yDNA. In other words, do they match? ySNP analysis can also operate at that level. If a man is fortunate enough to have a supposed relative who has already been SNP tested, a test of individual ySNPs can be a cost-effective way to validate a match and also to confirm a ySNP for that man.

However, for the most part, ySNP analysis is a voyage of discovery along the approximately fifty-eight million locations of one's y-chromosome. We can now get reliable readings on almost one fourth of those locations. When testing for SNPs we do not target a certain number of predefined and well-known locations. We travel down our chromosome and look for branching points. These are points where closely related genomes permanently separate -- somewhat akin to taking an exit off the Interstate. Most of the group continue on unchanged on the main genetic highway but one branches off and passes this branching point "mutation" on to all his descendants. So far Family Tree DNA (FTDNA) has identified more than 408,000 such branching points (ySNPs) in the yDNA samples the company has tested. Additional SNPs are discovered with almost every BIG Y test conducted.

These SNPs trace the paths our genomes have traveled through prehistory and down to the present. Now that more and more men have tested to this level, it is beginning to be possible to see branching points in genealogical times, when we had surnames and some paper records exist. Thanks to the work of Alex Williamson with his Big Tree of the major haplogroup found along the Atlantic Coast of Europe, we can begin to compare SNP branching with the STR and paper trail documentation we have seen in previous decades. Comparing Alex's SNP tree with the one seen above, three branches emerge that help refine the genetic trails of Philip Dowell's three sons that had been put together from paper documents and STR data.

For the purist, this charting process reversed the chronological order of the sons. In this chart the descendant of the eldest son is on the right and the descendants of the third son are on the left. However, the main takeaway for today is that branching SNPs have been discovered that separate the lines of descent within the last three centuries. The eagle-eyed readers will note that a fellow traveler of a different surname has joined the genetic migration. His family was associated with the Dowells by both location and business transactions in both Maryland and North Carolina. The SNP branching suggests the genomic link is with the second son, although other evidence is more ambiguous.

Is your head hurting even more now?



 

Monday, February 4, 2019

What's your longest Single Shared Segment?



What is the longest Single Shared Segment (SSS or should we call it S-cube) you match with a genetic cousin? By this I mean the DNA test report says you share only this one segment with the match. There are no genetic crumbs of additional, smaller matches that the lab can identify that you also have in common with that individual. Usually if you share a 30 cM or 40 cM or 50 cM segment with another relative, the two of you will also share additional smaller segments. However, that does not always seem to be the case. It is often difficult to find the actual familial relationship that connects such potential "cousins", with whom I share only one long matching segment, to my tree.

I have been analyzing a new match who MyHeritage predicts to be my third to fifth cousin. We are reported to share 53.6 cM. This would place us well within the expected range for such cousins. However, this match occurs totally within a single segment. In addition, my daughter, son and two of my three grandsons also share almost all this triangulated match. My new match is represented by the rose colored bar on my chromosome 12 below. The additional bars just below her in the triangulation box represent my immediate family members.

  
MyHeritage defines triangulated segments: 
Triangulated segments are shared DNA segments that you (or a person whose DNA kit you manage) and all of the selected DNA Matches share with each other, and therefore likely all inherited from a common ancestor.
So far I have not been successful in connecting this new "cousin" to my family tree although I have done better with some shorter segments in this region of this chromosome as will be seen in the following example.

 
Valley Forge Segment


There are sub-sets of the above group that are identifiable among my matches. One of them in this part of my genome I call my Valley Forge Segment. It is about 31 cM in length. I gave it this name because I inherited it from one or both of my 4th great-grandparents, John and Hannah (Pearson) Hoar, who were both adolescents in that area of Pennsylvania when General Washington encamped there during the American Revolution. It is normally difficult to trace atDNA with certainty this far back. However, in this case I have been fortunate.


The couple who head the above chart are my 4th great-grandparents. The rose colored boxes represent their descendants who have been shown to triangulate on the same 31 cM of atDNA on my chromosome 12. The gold colored boxes represent other family members who have triangulated on major chunks of this 31 cM. My two grandsons who inherited this DNA are in the lower right of the chart. Since they are now 11 and 8, it is likely this 31 cM segment will survive intact in them and/or their descendants until at least the Tricentennial in 2076. Quite a family heritage. The above digression demonstrates that part of the original triangulated fifty plus cM match with my new cousin has been intact for well over two hundred years.


Back to the longest Single Shared Segment

I am beginning to try to connect some of the smaller triangulated matches that overlap my original Single Shared Segment (red bar below) with which I began this post.

The red bar in the top line of this chart represents my new cousin with whom I have a 53 cM overlap. The gold bar on the second line represents the 31 cM "Valley Forge" segment shown above. The yellow, green and light blue bars represent other recently discovered cousins who MyHeritage reports as triangulating with the red bar. The yellow bar also has a small triangulation with the "Valley Forge" cluster. The darker blue bar represents a known paternal first cousin of mine whose account I manage. Her larger segment in this chart triangulates with both the red bar and the "Valley Forge" bar but not the other bars.
What seems to be emerging are at least two distinct subsets of triangulated groups, both of which triangulate independently with my 53 cM Single Shared Segment. I have already identified how some of those within the 31 cM Valley Forge Segment relate to me. At this time I have few clues as to how those in the other subset, to the left of the Valley Forge Segment (gold bar above), actually relate to me. I am most curious as to how I relate to the match behind my overarching 53 cM Single Shared Segment, who would appear to unify these two subsets.


What's next?

I have just made contact with my 53 cM match and so far we have discovered no obvious connection but we will continue to look. It appears obvious that she is on my dad's side of the family but we have not narrowed it down much further. She is probably down the part of my paternal line somewhere near the relatives who make up my Valley Forge Segment cluster. But then?

As I was writing this post I realized I have several even longer Single Shared Segments (SSS). One is almost 75 cM in length. Is there something unusual about how these segments appear to survive as they pass down through the generations leaving no other genomic dust in their wake? Other individuals with whom I share chunks of 30 cM or more tend to share a number of additional segments -- some as many as 20 or 30. This seems to make them easier to fit into my known tree. Of course as the cM total climbs, so does the likelihood that the person is a closer relative.

Have you had experience, either good or frustrating, with these SSS matches? Are you willing to share what you have learned? Dr. D. would love to hear from you.

I have decided to identify others with whom I have SSSs and separate them out for special attention. You can get in on the fun. It's easy if you have test results at MyHeritage but I'm sure there are workarounds at other sites.

When you are logged in to a MyHeritage account, select the DNA tab at the top of the screen. Then select the Chromosome Browser option from the drop-down menu. Your first 20 matches will be displayed and it will be easy to see how many segments you share with each and the length of your longest matching segment. By the second or third page of matches you should start seeing your longer Single Shared Segments.
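If you prefer to keep your match list in a spreadsheet or a small script, the same screening can be sketched in Python. Everything here is an assumption for illustration: the field names are hypothetical and are not the format of any MyHeritage export, the matches are invented, and the 30 cM cutoff is simply borrowed from the segment sizes discussed in this post.

```python
def single_segment_matches(matches, min_cm=30.0):
    """Keep matches that share exactly one segment of at least min_cm, longest first."""
    hits = [m for m in matches if m["segments"] == 1 and m["longest_cm"] >= min_cm]
    return sorted(hits, key=lambda m: m["longest_cm"], reverse=True)


# Hypothetical match list, typed in by hand from a chromosome browser.
matches = [
    {"name": "Match A", "total_cm": 53.6, "segments": 1, "longest_cm": 53.6},
    {"name": "Match B", "total_cm": 88.0, "segments": 5, "longest_cm": 31.0},
    {"name": "Match C", "total_cm": 74.9, "segments": 1, "longest_cm": 74.9},
    {"name": "Match D", "total_cm": 12.3, "segments": 1, "longest_cm": 12.3},
]

for m in single_segment_matches(matches):
    print(m["name"], m["longest_cm"])
```

Match B drops out because it shares several segments, and Match D because its single segment is too short to be interesting here.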

Have fun and don't forget to write about your discoveries!
   

 

Friday, January 11, 2019

Is GEDmatch o.k.?



Dr. D got the following query earlier this week.

One of our Roots group members asked me today if GedMatch is an o.k. place to transfer his data.  He had gotten an email suggesting to him that he transfer, and he didn't know if it is safe.  I think mine has been transferred there, but I never get any emails from them telling me if I have matches.

I quickly responded that the simple answer was “Yes.” However, I realized that a more nuanced answer was required:
There is no universal answer that is “RIGHT” for all of us in all situations for all time.
 
Although GEDmatch has been well known to serious genetic genealogists for several years, the site exploded into the consciousness of a wide media audience following Barbara Rae-Venter’s skillful use of this database. She was able to significantly “shrink the haystack” and help police focus on the needle who had eluded them for decades. In April 2018, her efforts led to the arrest of Joseph James DeAngelo, Jr. He is alleged to be the “Golden State Killer” who is suspected of at least 12 murders, 45 rapes and more than 120 residential burglaries. He has yet to come to trial.



In the months that have followed, Rae-Venter has had similar successes with other cold cases and CeCe Moore has solved more than a dozen. GEDmatch has been instrumental in resolving almost all these cases that long had been considered unsolvable.



Why GEDmatch? It is not the largest of the databases of genomic information created for genealogists. Several of the DNA testing companies claim to have records of more individuals. However, it is different in several fundamental ways:

1.   Most of us who use GEDmatch were initially drawn there because we could match our DNA with known or unknown cousins who had tested at commercial labs other than the ones at which we had tested.

2.   GEDmatch is not a commercial for-profit enterprise. It does not advertise its services. Actually as its logo suggests, it might be more accurate to say GEDmatch provides tools for us to use ourselves rather than that it provides services to us.

3.   GEDmatch does not have a paid staff to provide all the services some other sites offer in terms of individualized customer service. For example it does not send notices when new matches show up. Users of the service must initiate searches to keep abreast of new matches. This makes the site useful to the genetic genealogists who take the initiative to use the myriad of tools provided.

4.   For most of its existence GEDmatch has been operated entirely by two “retired” men who are avid genealogists. Curtis Rogers and John Olson like to use their skills to help others unlock mysteries about their families.

5.   They originally charged no fees but eventually added small monthly fees for those who wanted to use advanced tools. This allowed Rogers and Olson to pay for the server time these features required.

6.   The basic level of services at GEDmatch is provided free.

7.   Although using GEDmatch does not require high level technology skills, an absolute novice may have difficulty gaining the traction needed to make the best use of features.

8.   GEDmatch does no testing on raw DNA samples. Instead it accepts testing data from commercial testing companies such as 23andMe, Ancestry, FTDNA, MyHeritage, and Living DNA.





3 reasons we test our DNA:

  1.  To discover information that may impact our health and/or that of our offspring.
  2.  To discover information about our ethnic origins. Market research has shown that this is the reason most millennials test. That is why television ads focus on what is probably the least settled of what our DNA can tell us. Although these first two processes may indirectly provide information about close family members, we do not need to directly compare our results with those of others to get useful information if these are our objectives.
  3. To discover information about connections with others. This is why adoptees and others of unknown parentage test. It is also the primary reason most genetic genealogists test. To be successful this activity must be a CONTACT sport. That is the objective!  

It is this last group that GEDmatch is best suited to assist.



A tool for helping solve cold cases:


What has changed with the solutions of the cold cases? Many law enforcement officers have become aware of the power of genetic genealogy. Some of them have attended a seminar conducted by Rae-Venter on making familial matches. This is the same process through which adoptees have been searching for biological connections.



A recent article in Science reported that statistical simulations indicate more than half of Americans of European descent can probably be identified given the current 1.2 million names in this database.

Using genomic data of 1.28 million individuals tested with consumer genomics, we investigated the power of this technique. We project that about 60% of the searches for individuals of European descent will result in a third-cousin or closer match, which theoretically allows their identification using demographic identifiers.
Given the current growth rate of GEDmatch, it is projected that 90% of those of European descent may be subject to identification within a couple of years.

Police have long had access to a hodgepodge of CODIS-related data mostly derived from DNA testing of convicted or accused violent felons. Like our current prison populations, these databases are disproportionately composed of minorities and men of lower socioeconomic groups. The current rash of cold case arrests has been successful because investigators were able to tap a very different demographic.


Community security vs. individual privacy 

Two fundamental rights are now in conflict. At the moment it appears that most members of the public are willing to tip the balance in favor of community security if we are talking about investigating violent crimes. In a US survey, 91% were in favor of allowing police to search genealogical websites that match DNA to relatives in order to identify perpetrators of violent crimes (for example, rape, murder, arson, or kidnapping). Among respondents, 12% had ordered a DNA test and 37% had researched family online.

A parallel international survey of genealogists (41% from the US) reported by Maurice Gleeson asked, "Are you reasonably comfortable with law enforcement agencies using your DNA data on Gedmatch to help identify serial rapists and serial killers?" 85.1% responded "yes" and 8.6% said "no". When the undecided are filtered out, that works out to about 91% in favor (85.1 / (85.1 + 8.6) ≈ 90.8%), so the results of the two surveys are very similar.


Should you feel comfortable uploading your data to GEDmatch? Reasonable people can disagree. However, more than 90% seem to form a fairly solid consensus in favor.

What do you think?