User:Zimmer
- https://www.statnews.com/feature/game-of-genomes/season-one/
- https://www.statnews.com/feature/game-of-genomes/season-two/
https://zimmerome.gersteinlab.org/2016/05/06/part01_gerstein/
When processing
http://archive.gersteinlab.org/proj/zimmerome/part05/data/variants/Z.variantCall.SNPs.vcf
this Promethease report is produced.
Most of it is relatively typical, but Promethease tries to emphasize the atypical, so the top result is
rs79556279(G;T)
with the text
Behçet's disease HLA-B*51 the primary risk in Behçet's disease and rs79556279=T has the strongest association of any SNP in the HLA-B region.
That sounds alarming, but in reality Promethease was indicating an increased risk of Behçet's disease, not direct causation. I've tweaked the text in SNPedia so future reports will make that clearer.
Regardless of wording, what triggers this claim? The file at
http://archive.gersteinlab.org/proj/zimmerome/part05/data/variants/Z.variantCall.SNPs.vcf
contains this line
6 31329846 . G T 1169.77 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=2.521;ClippingRankSum=-0.027;DP=54;FS=3.854;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.615;POSITIVE_TRAIN_SITE;QD=21.66;ReadPosRankSum=0.686;VQSLOD=1.22;culprit=QD GT:AD:DP:GQ:PL 0/1:25,28:53:99:1198,0,1517
as well as
##fileformat=VCFv4.1
and
##reference=file:///gpfs/scratch/fas/gerstein/common/personalGenome/zimmerome/newAlign/hs37d5.fa
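As a rough illustration of how the data line above turns into the genotype (G;T), here is a minimal Python sketch. It is not how Promethease itself works. The helper name `parse_vcf_line` is hypothetical, the INFO column is shortened for readability, and the code assumes a single-sample line with no missing calls.

```python
# One data line from the file; the INFO column is abbreviated here.
VCF_LINE = (
    "6 31329846 . G T 1169.77 PASS "
    "AC=1;AF=0.500;AN=2;DP=54;QD=21.66 "
    "GT:AD:DP:GQ:PL 0/1:25,28:53:99:1198,0,1517"
)

def parse_vcf_line(line):
    """Split one single-sample VCF data line into a small dict (hypothetical helper)."""
    # Real VCF files are tab-delimited; split() handles tabs and spaces alike.
    chrom, pos, _id, ref, alt, _qual, flt, _info, fmt, sample = line.split()
    alleles = [ref] + alt.split(",")
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    # GT holds allele indexes separated by '/' (unphased) or '|' (phased).
    gt_indexes = fields["GT"].replace("|", "/").split("/")
    genotype = ";".join(sorted(alleles[int(i)] for i in gt_indexes))
    # AD holds the read counts supporting the reference and alternate alleles.
    ref_reads, alt_reads = (int(n) for n in fields["AD"].split(","))
    return {
        "chrom": chrom,
        "pos": int(pos),
        "filter": flt,
        "genotype": f"({genotype})",
        "alt_fraction": alt_reads / (ref_reads + alt_reads),
    }

call = parse_vcf_line(VCF_LINE)
print(call["chrom"], call["pos"], call["genotype"], round(call["alt_fraction"], 2))
# prints: 6 31329846 (G;T) 0.53
```

The 0/1 genotype, with 25 reads supporting G and 28 supporting T, is what one would expect from a heterozygous call.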
So hs37d5 is not one of the typical reference genome names I recognize, but presumably it's similar to the more familiar GRCh37.p13, and if so, then
https://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=79556279
tells us that rs79556279 is at
chr6:31329846
and
http://www.snpedia.com/index.php/Rs79556279
points out that
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4066492/
https://www.ncbi.nlm.nih.gov/pubmed/24876276
contains this sentence.
Conditioning on HLA-B*51 and rs79556279, the strongest associated SNPs, 4.5 kb upstream of HLA-B loci showed no other significant SNPs. Moreover HLA-B*51 and rs79556279 are in strong linkage disequilibrium. Two other regions, one tetrameric to HLA-C and the second in a region that includes HLA-A, were associated with BD, but the former was lost on conditioning for rs79556279.
https://www.snpedia.com/index.php/HLA-B
says
HLA-B51 is associated with Behçet's disease and other inflammatory joint and skin diseases.
Two associated tag SNPs for HLA-B51 are rs79556279 and rs116799036. [PMID 24821759]
https://www.ncbi.nlm.nih.gov/pubmed/24821759
says
- Among these was the most strongly BD-associated SNP in the study, rs79556279 [p_additive = 2.2 × 10^-50, OR 2.7 (95% confidence interval, CI, 2.3, 3.1)], which was located 4.9 kb 5′ of HLA-B. After controlling for the effect of rs79556279, we found that no other SNP in the HLA-B/MICA region was significantly associated with BD (Fig. 1A). Association testing of MHC-region SNPs conditioned on the effect of HLA-B*51 similarly identified no significant residual association in the HLA-B/MICA region (Fig. 1B), and, moreover, rs79556279 was in strong LD with HLA-B*51 [expectation–maximization r^2 (r^2_EM) = 0.92; expectation-maximization pairwise linkage disequilibrium (D′_EM) = 0.96], indicating that the effect of HLA-B*51 underlies the observed effect of rs79556279.
The condition is more common among Turks, Sephardic Jews, and people of Arab and Armenian ancestry.
He's also a carrier for rs28940579(C;T), which he wrote about as
- For example, I have rare SNPs in a gene called MEFV. At one location in that gene, the vast majority of people have a base called thymine. But one of my copies of the MEFV gene has a cytosine at that spot. This variant gives me the rare distinction of being a carrier for a disease called familial Mediterranean fever, which causes runaway inflammation. (You need two copies to actually get the disease.)
and both of these seem quite consistent with his apparent Middle Eastern Jewish heritage.
He's also online at
https://www.openhumans.org/CarlZimmer/
and we can see the report for that at
and it's nearly identical. This time the top line is
##fileformat=VCFv4.2
so it's a newer VCF format. This one supports 'gVCF' which allows you to encode the normal (aka '0|0') calls via 'END=' tags. Promethease can use that to produce a much richer report, but sadly this file doesn't actually have any END= tags so the report is only for positions that vary from the reference.
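One way to check whether a file carries any such reference blocks is to scan the INFO column for an END= entry. The sketch below is only an illustration: the function name `count_reference_blocks` is hypothetical, and the second data line in the example is made up to show what a gVCF block might look like.

```python
def count_reference_blocks(lines):
    """Count data lines whose INFO column carries an END= tag (hypothetical helper)."""
    blocks = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue  # not a complete data line
        info = columns[7]
        if any(entry.startswith("END=") for entry in info.split(";")):
            blocks += 1
    return blocks

# Illustration only: one variant call and one made-up reference block.
example = [
    "##fileformat=VCFv4.2\n",
    "6\t31329846\t.\tG\tT\t1169.77\tPASS\tAC=1;AF=0.500\tGT\t0/1\n",
    "6\t31329847\t.\tA\t<NON_REF>\t.\t.\tEND=31329900\tGT\t0/0\n",
]
print(count_reference_blocks(example))  # prints: 1
```

Passing an open file handle instead of the example list works the same way; a count of 0 would match what is described here for this file.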
The header also contains
##reference=file:///seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta
which reinforces the idea that while it may be the same sequencing, it's a very different assembly.
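The two ##reference headers can be pulled out and compared with a few lines of Python. This is a sketch with a hypothetical helper name, `reference_name`. It only compares file names, so it shows that the labels differ, not how much the underlying sequences differ.

```python
import posixpath
from urllib.parse import urlparse

def reference_name(header_lines):
    """Return the file name from the first ##reference= header, or None (hypothetical helper)."""
    for line in header_lines:
        if line.startswith("##reference="):
            uri = line.strip().split("=", 1)[1]
            return posixpath.basename(urlparse(uri).path)
    return None

gerstein_header = [
    "##fileformat=VCFv4.1",
    "##reference=file:///gpfs/scratch/fas/gerstein/common/personalGenome/zimmerome/newAlign/hs37d5.fa",
]
openhumans_header = [
    "##fileformat=VCFv4.2",
    "##reference=file:///seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta",
]

print(reference_name(gerstein_header))    # prints: hs37d5.fa
print(reference_name(openhumans_header))  # prints: Homo_sapiens_assembly19.fasta
```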
and the report from that data is at
This report is nearly identical, with the same top hits. There are a few extra genotypes in this openhumans file (17,222 vs 17,739) but the differences seem minor.
Promethease is able to make a report for both files simultaneously, and now shows a 'conflicts' checkbox which allows us to highlight only the positions which disagree between the two reports. Here is that report about both
Only 6 positions with any information in SNPedia disagree between the two files
in the gene OXTR
- rs139832701(G;G) or rs139832701(G;T)
not in a gene
- rs11132733(C;T) or rs11132733(C;C)
- rs3119939(C;C) or rs3119939(C;T)
in HLA-A
- rs41549214(G;T) or rs41549214(T;T)
- rs12721717(C;GAA) or rs12721717(C;G)
- rs1655895(C;C) or rs1655895(C;T)
in MICB
- rs41293864(T;T) or rs41293864(C;T)
in GABRB3
- rs20317(G;G) or rs20317(C;G)
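The idea behind a conflicts view like the one above can be sketched in Python as a comparison of two rsid-to-genotype tables. This is not Promethease's actual implementation. The names `normalize` and `find_conflicts` are hypothetical, and assigning the first genotype of each pair to one file and the second to the other is an assumption made only for the example.

```python
def normalize(genotype):
    """Sort alleles so that (C;T) and (T;C) compare equal."""
    return tuple(sorted(genotype.strip("()").split(";")))

def find_conflicts(calls_a, calls_b):
    """Return {rsid: (genotype_a, genotype_b)} for shared rsids that disagree."""
    conflicts = {}
    for rsid in sorted(calls_a.keys() & calls_b.keys()):
        if normalize(calls_a[rsid]) != normalize(calls_b[rsid]):
            conflicts[rsid] = (calls_a[rsid], calls_b[rsid])
    return conflicts

# Assumption: which file holds which genotype is chosen arbitrarily here.
file_a = {"rs20317": "(G;G)", "rs41293864": "(T;T)", "rs79556279": "(G;T)"}
file_b = {"rs20317": "(C;G)", "rs41293864": "(C;T)", "rs79556279": "(T;G)"}

for rsid, (geno_a, geno_b) in find_conflicts(file_a, file_b).items():
    print(f"{rsid}: {geno_a} vs {geno_b}")
# prints:
# rs20317: (G;G) vs (C;G)
# rs41293864: (T;T) vs (C;T)
```

Sorting the alleles before comparing keeps a call written as (T;G) in one file from being flagged against (G;T) in the other.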
A number of other variations influence metabolism.
- He's a CYP2C19 Ultra-Fast Metabolizer
- rs2228093(T;T) suggests an alcohol-induced hypersensitivity