
User talk:CarlKenner


A magazine reporter working on an upcoming article would like to speak with one or more dedicated SNPedia contributors. We appreciate your involvement and would like to personally invite you to consider speaking to this writer. If you might be willing, or have questions about this, let us know by email (info@snpedia.com) – thanks. Greg (talk) 03:32, 21 August 2014 (UTC)


Great edits! Thanks and welcome to the site. --- cariaso 18:37, 9 October 2013 (UTC)


re: "magnitudes of most specific (most informative) haplogroup should be highest". Perhaps this is your intention, but I'd like to suggest that if the tree was

X->X2->X2a->X2a1

And we knew that

  • everyone in X2 was a descendant of Alice the Awesome
  • X2a1 is more common in some region

There is an argument that X2 should have a higher magnitude than X2a1, even though X2a1 is more specific. In some previous cases I've also considered packing all of the genosets which just distinguish haplogroups into the range 2.0001 ... 2.0002. But generally, I think I agree with the idea that more specific is more interesting. --- cariaso 22:42, 13 October 2013 (UTC)

I was thinking that X2a1 would say something like "Descendants of Alice the Awesome that split off and moved to some region", making the X2 section redundant if you know X2a1. The downside is that if something new and interesting is discovered about Alice the Awesome, we might have to edit dozens of pages. I don't know. But it seemed you were already telling people to disregard all but the most specific, so I figured you'd want it at the top. CarlKenner (talk) 18:10, 14 October 2013 (UTC)
Precisely correct. If X2a1 also mentions Alice, then I think the ordering is most specific = highest magnitude. But assuming boring X2a2, X2a3 and X2a4 pages exist, it would be difficult to keep them all up to date, while keeping a single copy of the information is possible. I can imagine adding features to Promethease that communicate the graph structure, which might help to make this less difficult. For the moment we don't have them.
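
To make the "most specific = highest magnitude" ordering concrete, here is a minimal sketch that spreads magnitudes along one branch of the tree inside the narrow 2.0001 ... 2.0002 band mentioned above. The function name and the even spacing are assumptions made for illustration; this is not how SNPedia or Promethease assigns magnitudes.

```python
def haplogroup_magnitudes(path, low=2.0001, high=2.0002):
    """Assign a magnitude to each haplogroup on a root-to-leaf path.

    `path` is ordered from least specific to most specific,
    e.g. ["X", "X2", "X2a", "X2a1"]. Deeper (more specific)
    haplogroups get a higher magnitude, all within [low, high].
    """
    if not path:
        return {}
    if len(path) == 1:
        return {path[0]: high}
    # Evenly space the values so the ordering follows tree depth.
    step = (high - low) / (len(path) - 1)
    return {name: low + depth * step for depth, name in enumerate(path)}


mags = haplogroup_magnitudes(["X", "X2", "X2a", "X2a1"])
most_interesting = max(mags, key=mags.get)
```

With this scheme the report would list X2a1 first, and information about Alice would still live only on the X2 page.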


By the way, how do we decide if something is "bad"? If A is objectively worse than B which is objectively worse than C, how do we decide if it should be 1: "A is bad, B is neutral, C is good" or 2: "A is neutral, B is good, C is good" or 3: "A is bad, B is bad, C is neutral" or 4: "A is bad, B is bad, C is good" or 5: "A is bad, B is good, C is good"? Is everything worse than average bad (in which case it would never be option 2 or 3), or are things worse than the most common version bad (in which case it depends on the frequencies), or is it more complicated? CarlKenner (talk) 18:10, 14 October 2013 (UTC)

all of the above are possible. For the moment, community consensus is the only answer, and that's mostly just you, me and a few others. I do consider (Magnitude=0, Repute=Bad) as being 'neutral', but I've avoided adding an explicit 'Neutral/Mixed' field, for fear that everything will get lumped in there. I've considered having 2 scores:
  • rs1234(A;A) goodMag=4,badMag=0
  • rs1234(A;T) goodMag=4,badMag=2
so you can tell how good the good news is, and how bad the bad news is. But I've not yet found a case where I thought there was much of a payoff. It remains possible, and can be done in a way that maintains backwards compatibility with a single magnitude score.
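
A rough sketch of how the two-score idea could stay backwards compatible with a single magnitude. The class name, field names and the "larger score wins" rule are assumptions for illustration only; nothing like this exists in SNPedia today.

```python
from dataclasses import dataclass


@dataclass
class GenotypeScore:
    """Hypothetical two-score record for one genotype."""
    good_mag: float
    bad_mag: float

    @property
    def magnitude(self) -> float:
        # Assumed rule: the single legacy magnitude is the larger score.
        return max(self.good_mag, self.bad_mag)

    @property
    def repute(self) -> str:
        # Assumed rule: the larger score decides the legacy Repute.
        # A tie yields an empty string rather than an explicit "Neutral".
        if self.good_mag > self.bad_mag:
            return "Good"
        if self.bad_mag > self.good_mag:
            return "Bad"
        return ""


aa = GenotypeScore(good_mag=4, bad_mag=0)  # rs1234(A;A) from the example
at = GenotypeScore(good_mag=4, bad_mag=2)  # rs1234(A;T) from the example
```

Both example genotypes collapse to the same legacy values (magnitude 4, Good), which shows the information the single score loses.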

And why does APOE4/APOE4 have a magnitude of 6? I thought it would be higher. CarlKenner (talk) 18:10, 14 October 2013 (UTC)

Check the page history; at one point it was a 10. But there is only about 50% ability to predict Alzheimer's for an E4/E4 (src) and we've found plenty of other things that I think are justifiably higher (Rs113993960(-;-) and others at Magnitude). We could go above 10 if we need to. For the moment, 6 is enough to ensure that it's near the top of any report. I don't pretend 6 is the correct answer, only that it is good enough. If folks who have it want to push for higher or lower, I'd welcome their informed opinions.

And how do we handle SNPs on the non-crossing part of the X chromosome where men have a single letter and women have two letters (although in different parts of her body one or the other will be switched off)? Or SNPs on the non-crossing part of the Y chromosome where someone can normally only have one letter? Or SNPs on the mitochondrial DNA which we think of as only having one letter, but actually have multiple (identical?) "chromosomes" inside each mitochondrion and multiple mitochondria inside each cell? Or even the crossing part of the X and Y chromosomes? CarlKenner (talk) 18:10, 14 October 2013 (UTC)

Kinda, sorta. You're not even considering the multitude of ways each file format tries to communicate that, and the ambiguity from the different platforms. Even 23andMe has changed how they encode MT & Y over the years. Generally wiki redirects can handle a lot of this well enough (Rs17001266(D;I) vs Rs17001266(-;C)) while I look for a consistent pattern I'd like to enforce.

And how do we handle Y STR values? For that matter, what are Y STR values actually measuring? CarlKenner (talk) 18:10, 14 October 2013 (UTC)

I've not tried to handle STRs. SNPs have the virtue of a well defined standard name. If there was suddenly a substantial body of STR data in a consistent format for me to work with, I'd consider it. Until then, plenty of other real work to be done.


I have a v1 Genographic project file in .csv format that I just downloaded, and Promethease (exe) won't load it. It looks like this:

GPID,FWD4S7898G
Y Haplogroup,I1
Terminal Y SNP,M253,
Y SNPs
ysnp,M253,+
Y STRs
ystr,DYS439,11
ystr,DYS426,11
ystr,DYS393,13
ystr,DYS392,11
ystr,DYS391,10
ystr,DYS390,22
ystr,DYS389II,16
ystr,DYS389I,12
ystr,DYS388,14
ystr,DYS385b,14
ystr,DYS385a,14
ystr,DYS19,14



As far as I can tell, they only measure one SNP, which they decided to test based on my ystr values. There's not much info there, but it should be possible to import with a lookup table of names like M253 to rs numbers and the letters for + and -. I don't know whether that's worth doing. CarlKenner (talk) 18:10, 14 October 2013 (UTC)
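
A minimal sketch of the import described above, assuming the file always follows the layout shown. The function names are made up, and the lookup table from marker names such as M253 to an rs number and its derived/ancestral letters is hypothetical: it would have to be built from a real source, and none is supplied here. The sketch also assumes "+" means the derived allele.

```python
import csv
import io


def parse_genographic_v1(text):
    """Split a Genographic v1 export into metadata, Y SNP calls and Y STR counts."""
    result = {"meta": {}, "ysnps": {}, "ystrs": {}}
    for row in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in row if c.strip()]
        if len(cells) < 2:
            continue  # blank lines and section headers such as "Y SNPs"
        key = cells[0]
        if key in ("ysnp", "ystr"):
            if len(cells) < 3:
                continue  # malformed data row
            if key == "ysnp":
                result["ysnps"][cells[1]] = cells[2]
            else:
                result["ystrs"][cells[1]] = int(cells[2])
        else:
            result["meta"][key] = cells[1]
    return result


def ysnps_to_genotypes(ysnps, lookup):
    """Translate +/- marker calls into single-letter calls keyed by rs number.

    `lookup` maps marker name -> (rsid, derived_allele, ancestral_allele).
    Markers missing from the table are skipped rather than guessed.
    """
    calls = {}
    for marker, sign in ysnps.items():
        if marker not in lookup or sign not in ("+", "-"):
            continue
        rsid, derived, ancestral = lookup[marker]
        calls[rsid] = derived if sign == "+" else ancestral
    return calls
```

For the file above, `parse_genographic_v1` would yield one Y SNP call (`M253` with `+`) and twelve STR counts; the STR values would simply be carried along, since nothing here interprets them.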

The original file format that they provided had rs#s, and when SNPedia was used to show what could be known, NatGeo changed their format. If that's how they want to play, I'm not going to waste my time chasing them.


I assume promethease can't generate a report from 2 parents by itself. If I generate a 23andme raw file by combining my mother's and father's 23andme SNPs, should I code the possibly heterozygous ones as "C-" (- seems to mean "no call" for 23andme) if it has to have a C but the other letter could be either, or should I just leave them out? And is it possible to specify when two SNPs are on the same strand so it can tell me definitively that I am APOE4/APOE3 for example? CarlKenner (talk) 18:10, 14 October 2013 (UTC)

The way that you run Promethease, it cannot. However I've got many ways to run it, some of which handle this fine. I continue working on improving the ability to support that in a friendly and simple way.
Promethease understands VCF file format which can capture the strand. I've got a fairly minor change to genosets which allows me to handle stranded haplotypes correctly. Again for lack of relevant test data, it has not been urgent.
--- cariaso 19:21, 14 October 2013 (UTC)
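
For the parent-combining question, a small sketch of which child genotypes are possible at one autosomal SNP, given both parents' two-letter calls. The function names are made up, it ignores X, Y and MT, and it deliberately does not decide how an uncertain result should be written to a file; it only separates the definite cases from the ambiguous ones.

```python
def possible_child_genotypes(mother, father):
    """Return every genotype a child could inherit at one autosomal SNP.

    `mother` and `father` are two-letter calls such as "CT".
    A no-call ("-") or a malformed call on either side gives an empty set.
    """
    if len(mother) != 2 or len(father) != 2 or "-" in mother + father:
        return set()
    # One allele comes from each parent; sort so "TC" and "CT" compare equal.
    return {"".join(sorted(m + f)) for m in mother for f in father}


def definite_child_genotype(mother, father):
    """Return the child's genotype only when exactly one outcome is possible."""
    options = possible_child_genotypes(mother, father)
    return next(iter(options)) if len(options) == 1 else None
```

For example, `CC` and `TT` parents always give a `CT` child, while `CT` and `CC` parents leave two possibilities, so `definite_child_genotype` returns `None` there. This tells you nothing about which alleles sit on the same strand; that would need phased data such as the VCF support mentioned above.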

I have responded on my talk page.

Please email cariaso@snpedia.com. I'd like to discuss some things outside the wiki, but don't yet know how else to reach you. --- cariaso 08:26, 7 December 2013 (UTC)

Hi Carl, I see you've been adding a few SNPs manually lately. You don't need to fill in rsnum yourself as a bot will fill that in automatically soon enough. Also if you have a long list of SNPs to add you can either post them on my Talk page or email them to me at james.lick@gmail.com as my bot can create SNP pages from a list of SNPs quite easily. -- Jlick (talk) 07:24, 19 December 2013 (UTC)

Thanks Jlick. So if I want to add an SNP, I can just create the page, and wait for a bot? And does the bot automatically do the "On Chip" categories? Or do I have to do that myself? Which reminds me, we need to add on chip 23andMe v4 to all the SNPs that are on v4. I'm happy to do that, except my computer keeps overheating and turning off because it is super hot here (even at night), so I am almost computerless. CarlKenner (talk) 03:56, 20 December 2013 (UTC)
My bot will add in all the rsnum template plus the hapmap percentages if available. cariaso's bot will do the "on chip" stuff. -- Jlick (talk) 18:31, 20 December 2013 (UTC)
User:SNPediaBot does do on chip, doesn't do v4 yet, but it's a pretty easy change and I'll try to make that happen soon. --- cariaso 08:44, 21 December 2013 (UTC)
Is there anything else that needs doing for v4? CarlKenner (talk) 12:07, 21 December 2013 (UTC)