By standard convention, the locations of DNA variations (SNPs) are based on their chromosomal position. This number changes every time a new reference human genome assembly is released; the current "build" is GRCh38, and the previous one, which is still used by many sources for a variety of reasons, is GRCh37 (also known as hg19).
Genes are read (transcribed) in either the forward or reverse direction, as numbered along the chromosome. If a new build comes along that flips a large segment of a chromosome, the gene direction will change. As a result, at different times, as well as in different publications or different databases, the same SNP can be defined as being on the forward (plus) strand or on the reverse (minus) strand. In terms of the nucleotides for that SNP, the pairing of A with T, and C with G, in the DNA double helix means that an A on the plus strand is by definition a T on the minus strand (and vice versa), and a C on the plus strand is a G on the minus strand (and vice versa). A SNP is defined by its flanking sequences, so barring processing errors at dbSNP, the major and minor alleles of a SNP should not change between builds. However, the position number along the chromosome will change, and whether the SNP is on the plus or minus strand may change.
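The base-pairing rule can be written down in a few lines of Python. This is only an illustrative sketch: the names `COMPLEMENT` and `complement_alleles` are made up for this example and are not part of SNPedia, dbSNP or Promethease.

```python
# Watson-Crick pairing: each base on one strand faces its complement on the other.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_alleles(alleles: str) -> str:
    """Convert an allele list such as 'A/G' to the opposite strand.

    Hypothetical helper for illustration; the 'A/G' slash notation is
    simply the way allele pairs are written in this example.
    """
    return "/".join(COMPLEMENT[base] for base in alleles.upper().split("/"))


print(complement_alleles("A/G"))  # T/C
```

So a SNP written as A/G on one strand is the same SNP as T/C (that is, C or T) on the other strand. Only the notation differs, not the underlying DNA.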
SNPedia currently deals with these complexities using two fields, Orientation and StabilizedOrientation, which are usually seen on each SNP (rs#) page at the top of the sidebar on the right of the page. Orientation indicates the orientation reported in the most current build, which is reported below the genotypes in the Reference field. The chromosome number and nucleotide position in that reference build are shown next. StabilizedOrientation is the orientation that is relevant to the genotypes that have been defined in SNPedia for each SNP, allowing Promethease to minimize confusion regardless of how reference builds may change over time or how companies may report genotypes in a person's raw data. [Both fields are carefully controlled and you should not edit them; if you believe either is incorrect for a given entry, report it to us and we will investigate.]
Companies follow their own protocols, which are often different. 23andMe currently reports all genotype data based on the plus strand of GRCh37, whether or not dbSNP defined the SNP as being on the plus strand (in that build or any other build); they explain this for their customers here and here.
This often leads to confusion, since a 23andMe customer may see a genotype in their raw data that does not match the genotypes defined by dbSNP, in SNPedia, or in their Promethease report. This will most often happen when the StabilizedOrientation is minus. In these cases, the alleles need to be "flipped" to their complements to match: A becomes T, T becomes A, C becomes G, and G becomes C.
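This can still lead to some ambiguous flips, most notably for SNPs whose two alleles are complements of each other (A/T or C/G): the same letters appear on both strands, so the letters alone do not show whether a genotype has already been flipped.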
In a Promethease report, this is done automatically as Promethease correctly flips genotypes from each company's orientation into the orientation used by SNPedia.
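For readers comparing raw data by hand, the flip can be sketched roughly as follows. This is a minimal illustration and not Promethease's actual code: the function names are hypothetical, the input is assumed to be a plus-strand genotype string such as "GG", and any symbol other than A, C, G or T (for example a no-call marker) is assumed to pass through unchanged.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_genotype(genotype: str) -> str:
    """Complement every allele in a genotype string, e.g. 'GG' -> 'CC'.

    Symbols that are not A/C/G/T are left as they are (an assumption
    made for this sketch).
    """
    return "".join(COMPLEMENT.get(allele, allele) for allele in genotype.upper())


def to_snpedia_orientation(plus_genotype: str, stabilized_orientation: str) -> str:
    """Express a plus-strand genotype in the orientation SNPedia uses.

    `stabilized_orientation` is the value of the StabilizedOrientation
    field, assumed here to be the string 'plus' or 'minus'.
    """
    if stabilized_orientation == "minus":
        return flip_genotype(plus_genotype)
    return plus_genotype


def is_strand_ambiguous(allele_a: str, allele_b: str) -> bool:
    """True for A/T and C/G SNPs, whose alleles look the same on both strands."""
    return COMPLEMENT.get(allele_a.upper()) == allele_b.upper()


# The two genotypes from the email excerpt further down, assuming
# StabilizedOrientation is minus for both SNPs.
print(to_snpedia_orientation("GG", "minus"))  # CC
print(to_snpedia_orientation("TT", "minus"))  # AA
print(is_strand_ambiguous("A", "T"))          # True
print(is_strand_ambiguous("A", "G"))          # False
```

The order of the two alleles within a genotype carries no meaning, so "TC" and "CT" describe the same genotype. For an A/T or C/G SNP the check returns True, which means the orientation has to be taken from the StabilizedOrientation field, since the letters cannot settle it.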
discussion at reddit
Excerpt from a real email exchange
I noticed that there are errors in my report. SNP rs1056836 was flagged by Genetic Genie as abnormal (GG) came back as normal (CC) on your report. I confirmed that the GG is correct in my raw data on 23andMe. Also, rs651852 is TT in my raw data and AA on my Promethease report. This is just in searching 4 SNPs that came back abnormally. Do you have any insight on this?
- This is due to orientation issues. SNPedia and Promethease are more correct than any other source that you have seen.
- I'm really new to all of this, and I'm not quite understanding. Are you saying that the alleles reported don't matter in these cases and that the interpretation of "normal" by Promethease is more correct than Genetic Genie's interpretation as "abnormal"?
- The alleles matter, but you need to take orientation into account. Promethease goes to great lengths to do this; other sites don't.
What letters to write after the rs number
We trust dbSNP because they are the ones who assigned the rs#.
you can see that they call rs651852 as RefSNP Alleles: A/G (REV)
other sources such as 23andMe report it as C or T
This is a known issue with the notation we all use
SNPedia can't fix it, but we can try to make the positions prone to confusion obvious. This is why SNPedia and Promethease both show 'Orientation minus' quite prominently.
Perhaps it isn't yet obvious enough, but the general problem is that this often isn't made visible by other sources, and when it is available there is no consistency about terminology.
How to interpret the consequences of the genotype you have (regardless of how it is written)
for rs1056836 there are 3 possible genotypes. In SNPedia's orientation they can be written as
- (C;C) this is found in 33% of Caucasians
- (C;G) this is found in 43% of Caucasians
- (G;G) this is found in 23% of Caucasians
What is normal? Well, the most common genotype is (C;G). The C allele is a bit more common than the G, so technically (G;G) could be considered rare, but it's found in 23% of people. Flagging it as something rare and bad seems unjustified. SNPedia has more than 50 papers which talk about this SNP, and there is no clear consensus of any significant effect.
As for rs651852, your genotype is found in ~15% of Caucasians. There are hardly any papers about this though, and no reliable conclusions can be drawn.