Talk:MtDNA Position Conversions

From SNPedia

The 23andMe raw data download was apparently revised on 29th September 2011, and the count of mtDNA SNPs went down. These changes do not appear to be reflected in this chart. I have a version 2 chip. I previously had 2133 mtDNA SNPs in my raw data file. The count has now gone down to 2019. Dashdna 15:25, 23 October 2011 (UTC)

I will have to update this chart which is so old that it doesn't even have v3 SNPs yet. Jlick 06:33, 24 October 2011 (UTC)

It is now updated to the current selection of SNPs and v3 column was added. Jlick 09:11, 26 October 2011 (UTC)

Yoruba (YRI) - obscure and user unfriendly

This system is not mentioned or used by the vast majority of researchers and certainly not used by the National Center for Biotechnology Information. If 23andMe's intent was to keep customers from having useful, standard, and comparable data, they certainly chose wisely. John Lloyd Scharf 15:51, 24 October 2011 (UTC)

NCBI used the Yoruba standard prior to build 37 when they changed to Cambridge. Jlick 18:30, 24 October 2011 (UTC)

To my knowledge, they never used Yoruba. Build 37 is rather recent and not relevant to the revised Cambridge Reference Sequence. John Lloyd Scharf 05:01, 25 October 2011 (UTC)

Sorry, but you are wrong about them never using Yoruba. For an example look at which is the SNP for position 3666 in rCRS and 3667 in YRI. And sure enough, the dbSNP page shows the build 37.3 position as 3666 while build 36.3 lists position 3667. The reference sequence listed for build 36.3 is NC_001807.4 which includes the comment "this represents the mtDNA from an African (Yoruba) individual." On the other hand, for build 37.3 the reference sequence is NC_012920.1 which is the "revision of the Cambridge reference sequence for human mitochondrial DNA." Build 37 came out around early 2010 and most testing services still use build 36 as their reference. Until they update to build 37 universally, it's actually correct for them to use build 36/Yoruba positioning; otherwise it will be confusing to have some results use build 36 positions while others use build 37. There are also plenty of resources for translating the positions such as 23andMe's raw data browser, my mthap program, and Ann Turner's mtDNA spreadsheet, as well as this wiki page. Jlick 06:36, 25 October 2011 (UTC)
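For anyone following along, here is a minimal sketch of how a position translation between two reference sequences can be derived from a pairwise alignment. It is only an illustration, not how the raw data browser, mthap, or the spreadsheet mentioned above actually work. The function name `build_position_map` and the toy sequences are hypothetical; a real conversion would need an actual alignment of NC_001807.4 against NC_012920.1.

```python
def build_position_map(aligned_a, aligned_b, gap="-"):
    """Map 1-based positions in sequence A to 1-based positions in sequence B.

    Both inputs are rows of the same pairwise alignment (equal length,
    gaps marked with `gap`). Positions in A that face a gap in B map to None.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    pos_a = 0
    pos_b = 0
    for base_a, base_b in zip(aligned_a, aligned_b):
        if base_a != gap:
            pos_a += 1
        if base_b != gap:
            pos_b += 1
        if base_a != gap:
            # A base in A with no counterpart in B has no B position.
            mapping[pos_a] = pos_b if base_b != gap else None
    return mapping


# Toy alignment, NOT real mtDNA: the first sequence carries one extra base,
# so every later position is numbered one higher than in the second.
toy_yri = "ACGTTACGA"
toy_rcrs = "ACG-TACGA"
yri_to_rcrs = build_position_map(toy_yri, toy_rcrs)
```

In the toy data, positions after the extra base differ by one between the two numberings, which is the same kind of shift as the 3667 versus 3666 example above.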

You still have not produced any evidence of its use by researchers or NCBI. The only reference to the YRI that I have seen by the NCBI is to the Yoruba in Nigeria - an ethnic group. They tend to be L3 haplogroup, which is not even what is called Mitochondrial Eve. Nice try, but no cigar. It says, "This record has been curated by NCBI staff. The

           reference sequence was derived from AF347015.
           On Dec 27, 2001 this sequence version replaced gi:13959823.
           Please be aware that this represents the mtDNA from an African
           (Yoruba) individual.  The modified version of the original
           Cambridge Reference Sequence (GenBank Acc: J01415, Anderson et al
           (1981)) is NCBI reference standard NC_012920.
           COMPLETENESS: full length.

That means the original AF347015 was based on the CRS of 1981 rather than the most recent rCRS or REVISED Cambridge Reference Sequence. Have you had a full genome sequence done? Mine is:
Coding Region 750G, 1438G, 4769G, 6776C, 7148C, 8860G, 15326G, 15519C
Once you have it done, you can look up those polymorphisms in any research work.

John Lloyd Scharf 01:35, 26 October 2011 (UTC)

I'm not sure why this is so difficult to understand. Look at any mtDNA SNP in dbSNP on NCBI's web site and for build 36 everything uses the reference sequence NC_001807.4 aka AF347015 aka Yoruban Individual (YRI). 23andMe and most other testing companies use build 36 as their reference genome. This is documented in the header of every 23andMe raw data file. Therefore, mtDNA SNPs for build 36 data files must be relative to NC_001807.4. It's right there on NCBI's own website, so I'm not sure what other evidence you'd need about what reference NCBI used in build 36?

Nobody is disputing that this was an unusual choice of reference by dbSNP and that most researchers have long used rCRS as their standard reference. The fact remains that build 36 did use Yoruba as the reference sequence, and that anyone basing their technology on build 36 (which includes most microarrays) has to use that reference as well. It would be nice for everyone to catch up with build 37 which uses rCRS, but the fact is that science is very conservative about making abrupt changes, so it'll probably take a while for this to happen.

I'm not sure how you get the interpretation that "AF347015 was based on the CRS of 1981" from that comment. It is just a referral to the sequence that later replaced NC_001807.4 in build 37. The reason the 1981 reference is used is because the original J01415.1 sequence is from 1981 and was later revised in 1999 and updated in sequence J01415.2. The primary reference is always to the original work, and any revisions are noted as a secondary work in the references of the updated sequence.

In any case, as with all full sequences, the sequence is done as an independent work, not based on any particular reference. For example, your sequence JN020360.1 is the long string of letters at the end of and the "HVR1/HVR2/CR" numbers are a shorthand of comparing it to rCRS. Your sequence is not "based on rCRS" but your "HVR1/HVR2/CR" markers are.

You are right that AF347015 is in L3e2b1a, but nobody said that it had anything to do with Mitochondrial Eve, and if anything, rCRS is much farther from Eve than is Yoruba. It would make a lot more sense to use a reference sequence in L0 or L1 to be closest to Mitochondrial Eve, but it's way too late to change that now. Jlick 04:47, 26 October 2011 (UTC)

Document It

  • You still do not get it. You are confused. The change is not related to the builds but to a change in the CRS. The YRI is a Nigerian ethnic group designation for Sub-Saharan African the same as "CHD - 100 samples of Chinese in Metropolitan Denver, Colorado." It says this sample revision is based on GenBank Acc: J01415, Anderson et al from 1981. You seem to not understand English as it is written. It says at the bottom of the SNP you reference, "HapMap-YRI Sub-Saharan African 226," so this is one of 226 for which they have data. The L3e2b1a is based on differences from the Cambridge Reference Sequence. The mitochondrial DNA is a ring of DNA base pairs.

  • The point at which you start on that ring in the numbering system is arbitrary and in the midst of the high variability region. It could just as well have started where 16000 is now instead of where 000001 is now. You would have had one variable region that was large rather than two at opposite ends of the numbering system. The CRS was in place in 1981 using a woman's mtDNA from tissue that can be grown in a lab and, therefore, as a continuing standard. The reason they revised the CRS was because the original sequencing of that lab tissue was rerun and they found errors. That did not become the OFFICIAL reference sequence of the NCBI until 2009. That is the difference between the builds. If there was actually a separate human mitochondrial reference sequence in use, they would have it closer to the L0 or at least not divide the HVR, like the African (Uganda) Sequence D38112. So, it is not just the differences from the rCRS, but the place at which the numbering system has its set point. Yes, that Yoruba sequence was mistakenly referred to as or used as the rCRS or revised Cambridge Reference Sequence, but it is not and never was the rCRS.

  • Before you respond again, feel free to cite at least one peer-reviewed paper on this YRI sequence being used as the base reference, like there is for the CRS and rCRS, such as
  • Also, cite its use in a comparable Phylogenetic Tree, such as

If YRI was used as you claim, it would have been mentioned at least once in that article. See also Behar et al., 2008b, Macaulay et al., 2005, and Mishmar et al., 2003. Find SOMEWHERE this YRI is documented as a reference sequence rather than the CRS or the rCRS. There are several that were proposed AFTER rCRS that were not used in peer-reviewed research such as:

Swedish Sequence X93334 This sequence has over 30 variant nucleotides from the rCRS.
African (Yoruba) Sequence AF347015, formerly NC_001807.4. This sequence has over 40 variant nucleotides from the rCRS.
Japanese Sequence AB055387 This sequence has over 50 variant nucleotides from the rCRS.
African (Uganda) Sequence D38112 This sequence has over 90 variant nucleotides from the rCRS.

We could call mine the "VRS" or Virginia Reference Sequence because I have two unrelated full genome sequence matches to my mtDNA and the only connection seems to be the US State of Virginia in our genetic genealogy since the mid-1700s. We have no surnames in common documented.

  • Again, properly document your claims.

John Lloyd Scharf 14:30, 26 October 2011 (UTC)
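The point above about the numbering origin being arbitrary on a circular molecule can be sketched as a simple renumbering. This is an illustration only: the function name `rebase_position` is hypothetical, and the length of 16569 is an assumption taken from the rCRS numbering, not from anything stated in this thread.

```python
def rebase_position(pos, new_origin, length):
    """Renumber a 1-based position on a circular molecule.

    `new_origin` is the position (in the old numbering) that should
    become position 1 in the new numbering.
    """
    if not (1 <= pos <= length and 1 <= new_origin <= length):
        raise ValueError("positions must lie in 1..length")
    # Modular arithmetic wraps positions before the new origin
    # around to the end of the new numbering.
    return (pos - new_origin) % length + 1


# Assumption: 16569 is the length of the rCRS numbering system.
MTDNA_LENGTH = 16569

# Start the numbering where 16000 is now, as suggested above.
new_first = rebase_position(16000, 16000, MTDNA_LENGTH)
old_first = rebase_position(1, 16000, MTDNA_LENGTH)
new_last = rebase_position(15999, 16000, MTDNA_LENGTH)
```

With the origin moved to the current 16000, old positions 16000 through 16569 and old positions 1 onward become one contiguous run at the start of the new numbering, which is the "one variable region rather than two" described above.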

I refer again to which includes in the "Integrated Maps" section the following line:

"reference 36.3 MT 3667 NC_001807.4 3667 + G + view blast"

This indicates that the reference sequence used for build 36.3 is NC_001807.4. I don't know what other information you desire and this will be my last response to you as this conversation seems not to be productive for either of us. Jlick 18:11, 26 October 2011 (UTC)

I was very clear in what was expected as a response and you failed to do that. You reiterated information which you do not know how to interpret and did not read what I wrote. It certainly is not productive for you because of your inability to interpret what it says. At no point does that document that it ever was used as the reference sequence. Nor can you find it used as such in research. EOL. John Lloyd Scharf 00:09, 28 October 2011 (UTC)