Genes for Good

From SNPedia

https://genesforgood.sph.umich.edu/

After answering a number of survey questions on Facebook, you can get free genetic testing that provides your raw data. This data is compatible with Promethease. According to Genes for Good, the raw data consists of ~550,000 genotypes assayed by microarray.

Some Genes for Good files contain imputed data. These files yield the largest Promethease report, but because some of their genotypes were imputed (statistically inferred rather than directly measured), those genotypes may not be correct. Individuals from ethnic groups that were not part of the Genes for Good training sets can expect more errors than people from groups that were. It's unclear exactly which groups were used for training, but it's safe to assume Western Europeans are well represented.

Files that don't mention 'imputed' should include only genotypes that were actually observed. They produce a smaller Promethease report, but one with higher confidence.

Some of their files are also labeled 'noY_noMT', indicating that SNPs from the Y chromosome and mitochondrial DNA (both haploid) are not included. Without these SNPs, you cannot see any data related to your haplogroups.
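To check whether a downloaded .txt file really lacks these haploid SNPs, you can count the rows whose chromosome is Y or MT. The Python sketch below assumes the file follows the common 23andMe text layout ('#' comment lines, then tab-separated rsid, chromosome, position and genotype columns); the function names and sample rows are hypothetical and only illustrate the idea.

```python
from collections import Counter

def parse_23andme(lines):
    """Yield (rsid, chromosome, position, genotype) tuples.

    Assumes the common 23andMe layout: lines starting with '#' are
    comments; other lines hold tab-separated rsid, chromosome,
    position and genotype columns.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split("\t")[:4]
        yield rsid, chrom, int(pos), genotype

def haploid_counts(lines):
    """Count genotype rows on the Y chromosome and mitochondrial DNA (MT)."""
    counts = Counter(chrom for _, chrom, _, _ in parse_23andme(lines))
    return {"Y": counts["Y"], "MT": counts["MT"]}

# Made-up sample rows for illustration only.
SAMPLE = """# rsid\tchromosome\tposition\tgenotype
rs1001\t1\t12345\tAG
rs1002\tY\t2345\tG
rs1003\tMT\t345\tC
"""

if __name__ == "__main__":
    print(haploid_counts(SAMPLE.splitlines()))  # {'Y': 1, 'MT': 1}
```

For a real file you could pass an open file handle instead, e.g. `with open(path) as fh: haploid_counts(fh)`. A file labeled 'noY_noMT' would be expected to give zero for both counts.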

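For a small additional fee ($2), you can combine (pool) several files into one report, which might give you the best of both worlds: the broader coverage of an imputed file together with the higher confidence of an observed-only file.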


Which Files Can Be Uploaded to Promethease?

Genes for Good appears to offer several download options. If you download a zipped file containing all your raw data, you need to unzip it first. It usually contains 9 files, as shown here:

[Image: Genes for Good unzipped files to use with Promethease]

Promethease can use files in either the VCF or the 23andMe .txt format. As the image shows, we recommend uploading the files in .gz format, since they are compressed and will upload faster.
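If you prefer to script the unzipping step, the sketch below uses Python's standard zipfile module to extract the archive and list its .gz members. It assumes the download is an ordinary .zip file; the function name is hypothetical.

```python
import zipfile
from pathlib import Path

def extract_gz_members(zip_path, out_dir):
    """Extract a downloaded archive and return paths of its .gz files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out)
        names = archive.namelist()
    # Keep only the compressed members, which upload faster.
    return sorted(out / name for name in names if name.endswith(".gz"))
```

For example, `extract_gz_members("genesforgood_download.zip", "gfg_files")` (both names hypothetical) would return the paths of the extracted .gz files, which are the ones to upload.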


Interpreting a Pooled Report

If you combine your imputed data with your original data into a single Promethease report, you will notice that many genotypes in the report say 'count 2'. These appeared in both files, original and imputed. Genotypes found in only one file don't say that. Since everything in the original file is expected to also appear in the imputed file, a genotype without 'count 2' is one that came only from the imputed file, so this is enough to tell you which genotypes are imputed.
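The reasoning above can be illustrated with a small Python sketch. It is not how Promethease computes its counts internally (that is not documented here); it simply counts, for each rsID, how many of the two files contain it. The function names and rsIDs are hypothetical.

```python
from collections import Counter

def pooled_counts(original_ids, imputed_ids):
    """Map each rsID to the number of files (1 or 2) that contain it."""
    return Counter(set(original_ids)) + Counter(set(imputed_ids))

def split_by_count(original_ids, imputed_ids):
    """Group rsIDs into (count 2, imputed only, original only)."""
    counts = pooled_counts(original_ids, imputed_ids)
    original = set(original_ids)
    both = {r for r, n in counts.items() if n == 2}
    # Count 1 and absent from the original file: an imputed call.
    imputed_only = {r for r, n in counts.items() if n == 1 and r not in original}
    # Expected to be empty, since the imputed file should contain
    # every genotype from the original file.
    original_only = {r for r, n in counts.items() if n == 1 and r in original}
    return both, imputed_only, original_only

if __name__ == "__main__":
    original = ["rs1", "rs2"]
    imputed = ["rs1", "rs2", "rs3"]
    print(split_by_count(original, imputed))
```

A non-empty "original only" group would mean the assumption behind the 'count 2' shortcut does not hold for that pair of files.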

In practice, a few genotypes differ between files, especially if you combine data from different companies. Most of these differences are just different representations of the same information. For example, 23andMe uses II, DD, or DI to indicate indels (insertions and deletions), whereas Genes for Good usually gives the actual genotype, so you'll probably see something like rs1234(G;G), rs1234(;), or rs1234(-;G). You can find these by clicking the checkbox for conflicts.
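A rough way to reproduce this kind of conflict check offline is sketched below in Python. It assumes each file has already been read into a dictionary mapping rsID to genotype string, and it only normalizes trivial notation differences (allele order and the ';' separator). Indel notations such as 23andMe's DI versus an explicit -;G are deliberately left alone, so they still show up as conflicts. The function names and sample calls are hypothetical.

```python
def normalize(genotype):
    """Turn a genotype string into a sorted tuple of alleles.

    'A;G', 'AG' and 'GA' all become ('A', 'G'). Strings containing ';'
    are split on it; otherwise each character counts as one allele.
    """
    alleles = genotype.split(";") if ";" in genotype else list(genotype)
    return tuple(sorted(alleles))

def find_conflicts(calls_a, calls_b):
    """Return {rsID: (genotype_a, genotype_b)} for shared rsIDs that differ."""
    shared = calls_a.keys() & calls_b.keys()
    return {
        rsid: (calls_a[rsid], calls_b[rsid])
        for rsid in sorted(shared)
        if normalize(calls_a[rsid]) != normalize(calls_b[rsid])
    }

if __name__ == "__main__":
    # Made-up calls: rs1 differs only in notation, rs2 is an indel.
    company_a = {"rs1": "AG", "rs2": "DI", "rs3": "CC"}
    company_b = {"rs1": "G;A", "rs2": "-;G", "rs4": "TT"}
    print(find_conflicts(company_a, company_b))  # {'rs2': ('DI', '-;G')}
```

Translating I/D codes into explicit alleles would require knowing the inserted or deleted sequence for each variant, which is why this sketch only reports such cases rather than resolving them.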

See also VCF.