User talk:Cariaso

From SNPedia

AMPD1 deficiency

Re: Rs17602729 it certainly is a mess. It changed orientation between b36 and b37. My bot won't change orientation and genotypes automatically because I've seen some cases where what's in dbSNP is clearly wrong and I don't want to stomp on manual edits, but this is one case where it would have been correct to change it. All the ss's for the dbSNP entry are consistent with A/G plus or C/T minus. There's one ss230701024 which has ambiguous orientation but looking at the flanking sequences it's clear it's talking about rev/T A/G. In addition, everything in 1K genomes is A/G plus. OMIM only talks about C/T variants relative to the gene, so presumably minus orientation. My guess is that it's only C/T minus and the other crap is all confusion leftover from the orientation change and/or the ambiguous orientation on ss230701024. Besides the C>T would be a stop gain so presumably much more harmful than a non-synonymous substitution, especially practically at the start of the gene. I've updated the orientation, genos, and hapmap to match b37 and am confident that those are right. However, there's still a possibility that things will be called wrong in raw data due to the orientation change, so not sure what you want to do there. My first suggestion is to be conservative and call C or G normal, A or T pathogenic as either of those could potentially mean stop gain. Also OMIM says it is homozygous in cases of deficiency, so AA or TT should probably be the only high-magnitude genotypes. If you really want to commit, call minus C;C normal, C;T carrier and T;T affected and just delete the rest. I'm pretty sure that's the actual situation. --Jlick (talk) 19:30, 20 October 2012 (UTC)
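
A minimal sketch of the strand reasoning above, for anyone checking raw data by hand. It assumes genotypes are written as two alleles separated by a semicolon and sorted alphabetically; the function names and the `orientation` argument are hypothetical, and the normal/carrier/affected labels are simply the minus-strand calls suggested in the post above, not a confirmed classification.

```python
# Watson-Crick complements, used to move a genotype to the opposite strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Minus-strand calls as proposed in the post above (C;C normal, C;T carrier, T;T affected).
MINUS_CALLS = {"C;C": "normal", "C;T": "carrier", "T;T": "affected"}


def flip_genotype(genotype):
    """Return the genotype as read from the opposite strand, alleles sorted."""
    alleles = [COMPLEMENT[a] for a in genotype.split(";")]
    return ";".join(sorted(alleles))


def call_rs17602729(genotype, orientation):
    """Normalise a genotype to the minus strand, then look up the proposed call."""
    if orientation == "plus":
        genotype = flip_genotype(genotype)
    else:
        genotype = ";".join(sorted(genotype.split(";")))
    return MINUS_CALLS.get(genotype, "unknown")
```

With this, a plus-strand A;G and a minus-strand C;T both come out as the same carrier call, which is the point of normalising before looking anything up.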

Thanks for the advice on editing :)

Concerning my first edit yesterday, I thought it was me being a total noob on wiki syntax :)

I was aware of the missing functionality, but it's still rare enough for unknown contributors to add new genotypes that I figured it wasn't affecting too many people. Your troubles got me to finally dig into the problem and locate it precisely. The bug is in an upstream library, but I've now managed to make the developers aware of it: https://bugzilla.wikimedia.org/show_bug.cgi?id=47150

And btw, SNPTips still breaks SNP.

Indeed it does. Unfortunately, my displeasure doesn't seem to be enough to motivate them to ship. Perhaps hearing others complain could help? https://twitter.com/cariaso/status/291584931841335298

I wonder if you plan to do a Promethease report of the recently released Neanderthal genome, much as you did for the Denisovan specimen. It would be very much appreciated, at least by me :)

I'd expect to spend a few hundred dollars on Amazon in order to turn their BAM into a VCF. If anyone else wants to do that with these or similar steps, I'd be happy to generate and host the promethease summary.
--- cariaso 21:07, 12 April 2013 (UTC)

Spam page for deletion

Index.php Jlick (talk) 06:34, 3 June 2013 (UTC)

From 2011. Jlick blanked the page, but it still shows up in the list of articles. Thanks, Gene210 (talk) 20:36, 19 April 2013 (UTC)

SNPTips or something like it for Chrome?

I've written to the people who created SNPTips with my question and haven't heard back yet, but I'm wondering whether you happen to have inside info on 1) whether they'll be creating a version for Google Chrome and 2) whether there is some equivalent for Chrome one could use in the meantime? I searched around and haven't found anything yet, but I'm new to Chrome (though I love it! thus my question) and might be searching in the wrong place.

Thanks --User:Epsilon4 2013-05-15 12:40 (UTC)

SNPediaBot Bug

It's adding Rsnum Orientation fields to entries which already have Orientation:


    **WARNING** rs369586696: Rsnum has multiple Orientation fields plus.
    **WARNING** rs386134245: Rsnum has multiple Orientation fields minus.
    **WARNING** rs386134246: Rsnum has multiple Orientation fields minus.
    **WARNING** rs386834159: Rsnum has multiple Orientation fields minus.

-- Jlick (talk) 18:33, 27 August 2013 (UTC)

Noted. SNPediaBot is now stopped from any Orientation edits, until I can ensure I have a fresh page. Thanks. --- cariaso 19:14, 27 August 2013 (UTC)
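
One way a bot could guard against this kind of duplicate is to count how often a field already occurs in the freshly fetched page text before adding it. The sketch below is only an illustration: the helper names are hypothetical, the sample page text is made up, and it says nothing about how SNPediaBot is actually written.

```python
import re


def count_field(wikitext, field):
    """Count occurrences of '|field=' in the page text (whitespace tolerant)."""
    pattern = re.compile(r"\|\s*" + re.escape(field) + r"\s*=")
    return len(pattern.findall(wikitext))


def safe_to_add(wikitext, field):
    """Only add a field when the current page text does not already contain it."""
    return count_field(wikitext, field) == 0


# Illustrative page text; the field layout here is an assumption for the example.
page = "{{Rsnum\n|rsid=369586696\n|Orientation=plus\n}}"
if not safe_to_add(page, "Orientation"):
    print("skip: Orientation already present")
```

The check only helps if it runs against a fresh copy of the page, which matches the concern about stale pages in the reply above.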

gene_s? What is it, and why is it breaking things?

Rs2853493 is supposed to be a mitochondrial SNP (according to dbSNP) in the MT-ND4 gene. "gene" was set correctly to "MT-ND4", but "gene_s" is set to "METTL21D" (which isn't mitochondrial), which is wrong (I think) and shows up in the box as the Gene for this SNP. The chromosome shows up correctly as "MT", which doesn't match "METTL21D". Going to the MT-ND4 SNPedia page shows incorrectly that there are zero SNPs for this mitochondrial gene (and for most other mitochondrial genes). I think something is broken, but I can't fix it without knowing what gene_s is supposed to do, and why it is set to METTL21D. CarlKenner (talk) 08:43, 15 October 2013 (UTC)

gene_s allows multiple values, which was not possible with Gene. It looks as though JLickBot may be incorrectly assigning onto the mitochondria, and that these incorrect values are hiding the correct ones. --- cariaso 16:12, 15 October 2013 (UTC)
Gene_s was added because some genes overlap each other, either on opposite strands or on the same strand. Also sometimes genes with multiple transcriptions are given separate names. Which is all to say that a SNP may be found within where multiple genes are mapped. Previously JlickBot made an effort to try to make a decision as to which gene in these cases was the most significant for a particular SNP, but that was sometimes a tough call to make. So to help this Gene_s was added which allows multiple genes to be listed for each SNP. The wiki will prefer Gene_s to Gene if both are present. Now the problem with MT is that SNPs there often map to multiple locations in the genome as there are mitochondrial pseudo-genes scattered in the autosomal genome. When getting the data for a SNP there are often multiple choices of gene given and it isn't always clear which is best. A while back I had gone through to semi-manually set all the MT SNPs to the correct Gene, and JlickBot will not change a Gene automatically. Unfortunately when Gene_s was added it went ahead and set these incorrectly in many cases. I've found the source for the incorrect METTL21D assignment and have disabled that source of gene data. I'm now going through to semi-manually fix the incorrect gene fields. Whenever you find a mistake on the part of a bot please go ahead and fix it yourself but also let the bot owner know so that the bot can be fixed as well (as you've done on my talk page). --- Jlick (talk) 23:05, 15 October 2013 (UTC)
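
The precedence rule described above ("the wiki will prefer Gene_s to Gene if both are present") can be sketched as follows. This is only an illustration of the rule, not the wiki's template logic; the function name is hypothetical, and treating Gene_s as a comma-separated list is an assumption.

```python
def displayed_genes(fields):
    """Return the genes shown for a SNP: Gene_s wins over Gene when it is set."""
    gene_s = fields.get("Gene_s", "")
    genes = [g.strip() for g in gene_s.split(",") if g.strip()]
    if genes:
        return genes
    gene = fields.get("Gene", "").strip()
    return [gene] if gene else []


# The Rs2853493 situation from this thread: a wrong Gene_s hides the correct Gene.
print(displayed_genes({"Gene": "MT-ND4", "Gene_s": "METTL21D"}))
```

This is why fixing only the Gene field is not enough: as long as Gene_s holds a wrong value, the correct Gene never gets shown.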

mitochondrial haplogroups

Also, I notice mitochondrial haplogroups don't seem to be working or implemented. I can add some, but I'm not sure where gs numbers come from. Does SNPedia make them up, or are they official numbers like rs numbers? Are there set aside gs numbers that we should be using for mitochondrial haplogroups? CarlKenner (talk) 08:49, 15 October 2013 (UTC)

Promethease is able to support them, but we don't have genosets to cover most of them. We make the gs#s up; they do not come from any upstream authority. To make life easier there is a link at the bottom of Genoset which will assign the next unused number. Years ago, someone was interested in doing some Haplogroups, and started from a block at gs1000. This is harmless, and works fine, but you should assume that gs#s will eventually merge/split/be retired and that there may be some relevant genosets which are not part of that block, are not contiguous, etc. --- cariaso 16:15, 15 October 2013 (UTC)
Cool. For now I'm using the ones that were already defined, using the numbers that user specified in the Mitochondrial Haplogroup table. I did Haplogroup U, tested it in Promethease, and it is now working. (I edited the raw file to remove all the non-mitochondrial DNA to make Promethease run in an acceptable length of time and to test the mitochondrial stuff in isolation.) I think I'll do N and W next, since I can test those, and then the relevant subclades of those. But perhaps it would be smarter to make a bot that can edit SNPedia automatically by reading the PhyloTree website. CarlKenner (talk) 16:36, 15 October 2013 (UTC)

Template:Hgsnp was very broken. Nothing was showing up for ancestral haplogroup, derived haplogroup, ancestral allele, or derived allele. It's fixed now (finally). All haplogroup SNP pages should now show their data. Use Refresh on them if they don't. Stupid Semantic-MediaWiki. CarlKenner (talk) 17:53, 21 October 2013 (UTC)

agreed and appreciated. If you do get active with your bot, please review (and perhaps update) Bulk for emerging best practices. What language are you likely to work in? --- cariaso 18:22, 21 October 2013 (UTC)
Currently it is a Visual Studio 2010 project in C# using DotNetWikiBot. The code so far looks like this:
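    // Assumes "using DotNetWikiBot;" at the top of the file.
    // Log in to the bot endpoint and list every page in the MT chromosome category.
    Site site = new Site("http://bots.snpedia.com", "CarlKennerBot", "<password>");
    PageList pl = new PageList(site);
    pl.FillAllFromCategory("SNPs on chromosome MT");

    //Page p = new Page(site, "rs8896");
    //memo1.Text = p.text;
    // Append each page's text to the memo box, separated by a divider line.
    foreach (Page p in pl) {
        memo1.Text += System.Environment.NewLine + "=========================" + System.Environment.NewLine + p.text;
    }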

and it is working fine (there are only 338 pages in that category). I haven't tried writing anything to the wiki yet. But there are methods for reading and writing template parameters, which should come in handy. The harder part will be reading data from pubmed/google scholar, phylotree, 23andme, and a couple of tables on SNPedia, to get the raw information that I need. But there are probably APIs for pubmed.
Anyway, what's your vision for the mitochondrial SNPs? Do you want all the mitochondrial SNPs in a normal 23andme v3 data file (about 2,459 rather than 338) to have pages on SNPedia (with their positions, alternative names, genes, whether it's synonymous or not, etc.)? Or just those that define haplogroups in phylotree or are mentioned in pubmed? Or just those with rs numbers? CarlKenner (talk) 19:16, 21 October 2013 (UTC)
[1] "Anything for which we can find something worthy of recording." ... "It would be possible to load all ~10M SNPs from dbSNP, but then the only thing we could say about 99.99% of them would be 'this is a SNP' and perhaps which microarrays it occurs on. Few people would care." So you need to be able to tell us something of value. That isn't a particularly high bar. I think anything which is part of the phylotree meets that. However, I don't feel everything from the 23andMe chip automatically qualifies. Get it working to your satisfaction for the phylotree ones. It looks to me like that might be ~11k SNPs? --- cariaso 19:29, 21 October 2013 (UTC)
my quick and dirty parsing of the phylotree suggests there are 3926 relevant snps, with these being the most commonly used
 45 G709A
 57 T16362C
 62 C150T
 68 T16189C
 82 T16311C
 87 T195C
 91 T146C
163 T152C

and that 2277 are mentioned only once, 733 are mentioned twice, ...

2277     1
 733     2
 324     3
 183     4
 111     5
  67     6
  46     7
  41     8
  33     9
  22    10
   9    11
   9    12
  10    13
   7    14
   1    45
   1    57
   1    62
   1    68
   1    82
   1    87
   1    91
   1   163
   1  3216

your testing should ensure that you can handle T152C in a way that seems sane.
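
For reference, tallies like the two above can be produced with a pair of counters: one counting how often each mutation label appears in the tree, and one counting how many labels share each usage count. The sketch below is an assumption about the approach, not the script actually used; the function names are hypothetical, the sample list is toy data, and only simple substitution labels such as T152C are parsed.

```python
import re
from collections import Counter


def parse_substitution(label):
    """Split a label like 'T152C' into (ancestral, position, derived), else None."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3)


def usage_counts(labels):
    """How many times each mutation label occurs across the tree."""
    return Counter(labels)


def count_of_counts(usage):
    """How many labels occur once, twice, three times, and so on."""
    return Counter(usage.values())


# Toy input standing in for labels scraped from the tree.
labels = ["T152C", "T152C", "T152C", "G709A", "G709A", "C150T"]
usage = usage_counts(labels)
print(usage.most_common(1))
print(sorted(count_of_counts(usage).items()))
```

A label that recurs on many branches, as T152C does, is the case worth testing: any code that assumes one haplogroup per mutation will mishandle it.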


Currently there are multiple entries in SNPedia for the exact same SNP. For example, position 73 on the Mitochondrial DNA, has a page at rs3087742 and a different page at i3001587. i3001587 was what the old version of 23andme used to call it, before it was added to dbSNP and given an rsnum and 23andme updated their file format. And of course 23andme did the sensible thing and used the letters from the plus strand, while dbSNP did their own crazy thing and used the letters from the minus strand... sort of (only the RefSNPs value uses the minus strand, everything else on the dbSNP page talks about the plus strand). I really think there should only be one page for each SNP, and since we are using dbSNP as the official reference (despite its low quality and stupid decisions), I think it should use the current rs number as the official name, and whatever crazy strand dbSNP is using. If there are 23andme names or obsolete/merged rsnums for that SNP, I think they should be automatic redirects to the official name.

Too many old files are out there, still with the old name. Add redirects from each genotype to the correct genotype of the flipped other name.

But I don't know how Promethease handles redirects and strand flipping. Will it work if I make i3001587 a redirect to rs3087742 (and transfer the information)? Do I need to make i3001587(G;G) redirect to rs3087742(C;C), etc?
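
A sketch of the per-genotype redirects suggested above, assuming the old and new names are on opposite strands. The function names are hypothetical, the A/G allele pair is an illustrative assumption for the old name, and whether Promethease follows such redirects is exactly the open question here, so treat this as a way to generate candidate pages rather than a confirmed workflow.

```python
from itertools import combinations_with_replacement

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def genotype_redirects(old_name, new_name, alleles, flip):
    """Map each genotype page of old_name to the matching genotype page of new_name."""
    redirects = {}
    for a, b in combinations_with_replacement(sorted(alleles), 2):
        if flip:
            # Complement both alleles, then restore alphabetical order.
            target = sorted(COMPLEMENT[x] for x in (a, b))
        else:
            target = [a, b]
        redirects[f"{old_name}({a};{b})"] = f"{new_name}({target[0]};{target[1]})"
    return redirects


def redirect_wikitext(target):
    """MediaWiki redirect syntax for the body of the old page."""
    return f"#REDIRECT [[{target}]]"


for old, new in genotype_redirects("i3001587", "rs3087742", ["A", "G"], flip=True).items():
    print(old, "->", redirect_wikitext(new))
# includes: i3001587(G;G) -> #REDIRECT [[rs3087742(C;C)]]
```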


Do I need a special kind of wiki operation to transfer the edit history of the old page or something? And what do we do about deCODEme SNP names that now have rsnums and articles? And if an SNP doesn't have a 23andme name or an rsnumber, but does have potentially interesting information in pubmed or phylotree, is there a naming system, or do we just wait for someone else to give it a name before we add it to SNPedia?

You can add some information under whatever name makes the most sense, but accept that it is of very limited value until a standard name is established. We can help to get the standard name started. Note that the submitter of dbSNP rs113993960 is SNPEDIA.

23andMe and the FDA

23andMe has been banned from giving health information (and probably trait information) by the FDA. People who ordered after the 22nd of November get only ancestry and raw data. I've been directing 23andMe users to Promethease for the health information, which means we need to get our 23andMe SNPs in top shape ready for all the 23andMe users and able to give them the information that 23andMe are no longer allowed to give them. I want at least the 23andMe Health and Trait SNPs to be of similar quality to 23andMe and to have both the positive and negative SNPs with relevant text. And I want to increase the magnitude of the "boring" health and trait SNPs on the grounds that people are coming here specifically for them. I'm thinking something like Magnitude 1.9 for the 5 star 23andMe health SNPs currently considered "boring" and normal. I'm going to get started.

BTW, sorry I haven't finished my bot to update all the mitochondrial SNPs, because I became very busy with other tasks. I'm still working on it, and it's about half done and looking good so far. I'm going to put that off until after we've improved the 23andMe health and trait SNPs. CarlKenner (talk) 15:00, 6 December 2013 (UTC)

What does this actually mean?

What does "Each A allele at rs#### is associated with 4.6x higher risk..." mean? I assume that means there is a 21.16x risk if there are two A's? Or doesn't it work like that? CarlKenner (talk) 12:41, 15 December 2013 (UTC)
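
The arithmetic in the question can be written out as below. This assumes a multiplicative per-allele model, in which each copy of the allele multiplies the risk by the same factor; that reading is an assumption, and the thread itself does not settle whether a given study's figure should be combined this way. The function name is hypothetical.

```python
def relative_risk(per_allele, copies):
    """Risk relative to zero copies, if each copy multiplies risk by per_allele."""
    return per_allele ** copies


for copies in (0, 1, 2):
    print(copies, round(relative_risk(4.6, copies), 2))
```

Under that assumption two A alleles give 4.6 x 4.6, about 21.16, as guessed in the question. An additive reading, or a study that reports the homozygote separately, would give a different figure.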

Thank you for everything you do here!

Including catching my error on the HTR2A SNP. As it was my first edit here in SNPedia, I was carefully following the format on another page, and apparently I followed a bit too closely, erroneously copying the gene name as well as the format. Sorry about that.

I cannot begin to tell you how helpful your work here, on Promethease, and elsewhere has been to me and many others like me. Thank you!

--Up-a-Tree (talk) 04:24, 14 March 2014 (UTC)


Hello, Mike. Regarding Gs260, I've traced the literature back from [PMID 22065085] to [PMID 17952075] and [PMID 17952075]. In both papers cited by the later publication, heterozygosity at the loci rs916977 [A/G] and rs1667394 [A/G] is likely to be associated with brown eye colour. In addition, this makes biological sense, since the trait is dominant, so any heterozygosity will cancel out the blue colour. What do you think about this? Adeuss (talk) 13:45, 24 September 2014 (UTC)

Thanks for digging. People like you make SNPedia possible. So it sounds like gs259 is correct, and gs260 just shouldn't exist? I *think*:
  • gs259 seems to be a pretty useless predictor. We've gotten way too many people who are homozygous at both and still have eye color which is far from blue. But it does accurately capture what the scientific literature says.
  • gs259 doesn't really say that you should have blue eye color. It says that you're a heterozygote for the haplotype. It just seems as if that is a pretty useless statement, or should be changed to say that "therefore some non-blue eye color is expected"
However this is an area I've not dug deep into. I welcome any or all edits, and trust that we can iterate towards clear consensus. Instead of me asking you what you think, I'll invite you to make any edits that seem right to you, and then I can review them and we can discuss.
--- cariaso 14:05, 24 September 2014 (UTC)

I agree to remove Gs260. However, I didn't find any template for page deletion. I've tried "delete|reason" and it doesn't work. It is not possible to just delete the page, right? I start to feel like a troll. ;D Adeuss (talk) 14:57, 24 September 2014 (UTC)

Please forget my previous post. It was just a typo in the template. Adeuss (talk) 15:02, 24 September 2014 (UTC)

Deletion is only possible for me and a handful of other users. In this case I prefer to gradually phase the page out, because plenty of people have this genoset in previously generated Promethease reports. By removing the criteria page, it will stop appearing in new reports. I've also softened the main text. In a few months I'll fully kill it, but there is no reason to rush.

For Deletion

  • User:OrstoGB for spam
  • Ksr2 invalid casing (correct casing created; didn't think redirect was appropriate)
  • SERPIN1 not a SERPIN gene family member for Homo sapiens

--Jlick (talk) 16:17, 21 February 2016 (UTC)

Page Edits

Some pages are auto-filled by bots when created, and some aren't. I guess I should only create pages that are automatically filled, or should I not? --Scgreen (talk) 05:54, 4 March 2016 (UTC)

If you have something to say, your edit is welcome. If the only thing you're doing is clicking on a redlink and saying 'save', there is no value. If we wanted that, a bot would have done so. Technically the Gene pages have some value even when blank, but they should reflect the fact that a human was actually interested in that gene, not just that they saw it didn't exist. --- cariaso 06:12, 4 March 2016 (UTC)
To amplify a bit: Templates should mostly be left blank when creating pages as bots are used to generate them, which is usually more accurate. Some fields will not auto-populate if an incorrect value is manually added which can perpetuate incorrect information. (Bots assume that something set manually must have been done that way for a reason. A warning is generated in the logs when this happens, but there's not always time to investigate every warning from the bots.) In general, SNPs should only be created if there is some research that can be cited for it, and this should be linked to when creating the page. Genes can be created if there are any SNPs in the gene on SNPedia or if it is the subject of research, but it's helpful to say a little something about the gene function and/or its associations to traits, disease, drug efficacy, etc. -- Jlick (talk) 07:00, 4 March 2016 (UTC)
...and genotypes should be created only if there is a known association, which should be cited. Otherwise, e.g. Promethease reports will be filled with blank spaces. -- Jlick (talk) 07:10, 4 March 2016 (UTC)
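
The bot behaviour described above (fill blank fields, leave manual values alone, log a warning on disagreement) might look roughly like this. It is a sketch of the stated policy only; the function name, field names, and warning text are hypothetical and are not taken from any of the bots mentioned here.

```python
def fill_fields(page_fields, bot_values):
    """Fill blank fields from bot data; keep manual values and warn on conflicts."""
    updated = dict(page_fields)
    warnings = []
    for name, bot_value in bot_values.items():
        current = updated.get(name, "").strip()
        if not current:
            # Blank or missing: safe for the bot to populate.
            updated[name] = bot_value
        elif current != bot_value:
            # Manually set and different: assume it was done for a reason.
            warnings.append(f"{name}: keeping manual value {current!r}, bot has {bot_value!r}")
    return updated, warnings


# Made-up values for illustration: one blank field, one manual value that conflicts.
fields, log = fill_fields({"Gene": "", "Chromosome": "1"}, {"Gene": "AMPD1", "Chromosome": "2"})
print(fields)
print(log)
```

This shows why a wrong value typed in at page creation can persist: the bot sees a non-blank field, keeps it, and the only trace is a log line that someone has to find time to read.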

Obsolete ClinVar Data

Please check Rs767222404. I've corrected the genos added by your bot which seem to be based on an obsolete ClinVar entry. The ClinVar data should be updated as well, but I'm not sure I'll do that correctly, so will leave that to you. -- Jlick (talk) 19:00, 7 April 2016 (UTC)