Bulk
Reminder
The content in SNPedia is available under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
Introduction
Depending on the format, frequency, and complexity of your particular needs, you may wish to consider these sources:
The European Bioinformatics Institute hosts a DAS server, e.g. http://www.ebi.ac.uk/das-srv/easydas/bernat/das/SNPedia/features?segment=10:1,51319502
http://www.oppi.uef.fi/bioinformatics/varietas/ provides a web interface which includes SNPedia content. [PMID 20671203]
The file at http://www.snpedia.com/files/gbrowse/SNPedia.gff is updated semi-regularly and can be parsed to provide a reasonable list; a short parsing sketch follows this list.
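As a sketch of that parsing, here is some minimal Python 3 that assumes the standard nine-column, tab-separated GFF layout; the exact columns and attributes in SNPedia.gff may differ:

import urllib.request

url = "http://www.snpedia.com/files/gbrowse/SNPedia.gff"
with urllib.request.urlopen(url) as response:
    for raw in response:
        line = raw.decode("utf-8").rstrip("\n")
        # skip blank lines and GFF comment/header lines
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue
        # standard GFF columns: seqid, source, type, start, end, score, strand, phase, attributes
        seqid, start, end, attributes = fields[0], fields[3], fields[4], fields[8]
        print(seqid, start, end, attributes)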
Forbidden
Bots that try to pull every version of every page crush the server and will be banned long before you complete the full scrape.
Bots that try to pull every possible rs# (even the ones not in SNPedia) crush the server and will be banned long before you complete the full scrape. You must first ask which SNPs are in SNPedia with a query such as:
- http://snpedia.com/index.php/Category:Is_a_snp
- http://bots.snpedia.com/api.php?action=query&list=categorymembers&cmtitle=Category:Is_a_snp&cmlimit=5000
This is easier to do with the APIs listed below; see the MediaWiki documentation for details.
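For example, a single categorymembers request can be made over plain HTTP. Here is a minimal Python 3 sketch of the query shown above (anonymous requests are capped at 500 results per batch; see the last section of this page):

import json
import urllib.parse
import urllib.request

params = {
    "action": "query",
    "list": "categorymembers",
    "cmtitle": "Category:Is_a_snp",
    "cmlimit": "500",   # anonymous requests are capped at 500 per batch
    "format": "json",
}
url = "http://bots.snpedia.com/api.php?" + urllib.parse.urlencode(params)
with urllib.request.urlopen(url) as response:
    data = json.load(response)
for member in data["query"]["categorymembers"]:
    print(member["title"])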
Programmers
Please aim your bots at bots.snpedia.com, not www.snpedia.com
In 2016, MediaWiki updated two parts of its software that affect the use of SNPedia.
1. An updated login mechanism. While the old method still works, the new OAuth-based method is preferred. Extensive information from MediaWiki is at https://www.mediawiki.org/wiki/OAuth/Owner-only_consumers
You can immediately create the necessary tokens for SNPedia by visiting http://bots.snpedia.com/index.php/Special:OAuthConsumerRegistration
2. The mechanism to request more than 500 members of a category has changed. MediaWiki documents this at https://www.mediawiki.org/wiki/API:Query#Generators_and_continuation
However, not all languages and libraries have been updated to use these new mechanisms yet.
Check yours, or look for new ones, at
https://www.mediawiki.org/wiki/API:Client_code
Here is some sample Python 2.7 code which uses both of these mechanisms correctly via
https://github.com/mwclient/mwclient
#!/usr/bin/env python
import mwclient

agent = 'MySNPBot. Run by User:Xyz. xyz@foo.com Using mwclient/' + mwclient.__ver__

# tokens and secrets are only necessary if your bot will write into SNPedia.
# get your own tokens at http://bots.snpedia.com/index.php/Special:OAuthConsumerRegistration
site = mwclient.Site(('https', 'bots.snpedia.com'), path='/',
                     clients_useragent=agent,
                     consumer_token='secret1', consumer_secret='secret2',
                     access_token='secret3', access_secret='secret4')

for i, page in enumerate(site.Categories['Is_a_snp']):
    print i, page.name
Your edits to this page to document the mechanism for your favorite language or library are encouraged.
Everything below is true, but somewhat out of date.
Perl
Please notice and use the line
$bot->{api}->{use_http_get} = 1;
which is necessary to ensure GET instead of POST for some older versions of the library.
Get all SNP names
use MediaWiki::Bot;

my $bot = MediaWiki::Bot->new({
    protocol => 'http',
    host     => 'bots.snpedia.com',
    path     => '/',
});
$bot->{api}->{use_http_get} = 1;

my @rsnums = $bot->get_pages_in_category('Category:Is_a_snp', {max => 0});
print join("\n", @rsnums), "\n";
How can I grab the text from pages?
#!/usr/bin/env perl
use MediaWiki::Bot;

my $bot = MediaWiki::Bot->new({
    protocol => 'http',
    host     => 'bots.snpedia.com',
    path     => '/',
});
$bot->{api}->{use_http_get} = 1;

foreach my $rs ('rs1815739', 'rs4420638', 'rs6152') {
    my $text = $bot->get_text($rs);
    print '=' x 20, "$rs\n";
    print $text;
}
I need Genotypes and their Magnitude
#!/usr/bin/env perl
use strict;
use warnings;
use MediaWiki::Bot;

my $bot = MediaWiki::Bot->new({
    protocol => 'http',
    host     => 'bots.snpedia.com',
    path     => '/',
});
$bot->{api}->{use_http_get} = 1;

my $text = $bot->get_text('rs1234');
print '=' x 20, "$text\n";
print "\n\nThe above text should prove that we can read from SNPedia\n";
print "Getting some more info from SNPedia\n";

my @genotypes = $bot->get_pages_in_category('Category:Is a genotype', {max => 0});
foreach my $geno (@genotypes) {
    my $genotext        = $bot->get_text($geno);
    my ($magnitude)     = $genotext =~ m/magnitude\s*=\s*([-+.\d]+)/;
    my ($beginningtext) = $genotext =~ m/\}\}(.{3,30})/s;
    $beginningtext = $genotext unless $beginningtext;
    $beginningtext =~ tr/\n/ /;
    $magnitude = '' unless defined $magnitude;
    print "Magnitude\t${magnitude}\tfor\t${geno}\t${beginningtext}\n";
}
Python
These examples use wikitools.
Get all SNP names
from wikitools import wiki, category

# open snpedia
site = wiki.Wiki("http://bots.snpedia.com/api.php")
snps = category.Category(site, "Is_a_snp")

# get all snp-names as a list and print them
snpedia = []
for article in snps.getAllMembersGen(namespaces=[0]):
    snpedia.append(article.title.lower())
    print article.title
Grab a single SNP-page in full text
You get back a string that contains the unformatted wiki-code:
from wikitools import wiki, page

site = wiki.Wiki("http://bots.snpedia.com/api.php")
snp = "rs7412"
pagehandle = page.Page(site, snp)
snp_page = pagehandle.getWikiText()
To parse MediaWiki templates, try https://github.com/earwig/mwparserfromhell
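For example, here is a minimal Python 3 sketch pulling a magnitude value out of a template, assuming (as the Perl example above does) that genotype pages carry a template with a magnitude parameter; the template name and text here are illustrative:

import mwparserfromhell

# a toy page body; real genotype pages are fetched as in the examples above
text = "{{Genotype\n|magnitude=2.5\n|summary=example genotype\n}}\nSome page text."
wikicode = mwparserfromhell.parse(text)
for template in wikicode.filter_templates():
    if template.has("magnitude"):
        magnitude = str(template.get("magnitude").value).strip()
        print(str(template.name).strip(), magnitude)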
Ruby
These examples use the mediawiki-gateway gem.
Please use version 0.5.0 or later due to http://github.com/jpatokal/mediawiki-gateway/issues/24
Grab all SNP-pages that contain a specific text and iterate over the content
This example grabs all genotype pages of a specific SNP:
require 'media_wiki' # provided by the mediawiki-gateway gem

@snp = "Rs7412"
mw = MediaWiki::Gateway.new("http://bots.snpedia.com/api.php")

# returns an array of page-titles
pages = mw.list(@snp + "(")
if pages != nil
  # iterate over the results and grab the full text for each page
  pages.each do |p|
    single_page = mw.get(p)
    puts single_page
  end
end
R / Bioconductor
An R package to query data from SNPedia is available on the Bioconductor website:
https://bioconductor.org/packages/SNPediaR
See the vignette for usage.
The development version of the library and some extra documentation can be found on GitHub:
https://github.com/genometra/SNPediaR
Limited to 500 entries?
Please see https://www.mediawiki.org/wiki/API:Query#Generators_and_continuation
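In short: list queries such as categorymembers return at most 500 titles per batch for normal accounts, and each response includes a continue block that must be passed back with the next request. A minimal Python 3 sketch of that loop:

import json
import urllib.parse
import urllib.request

endpoint = "http://bots.snpedia.com/api.php"
params = {
    "action": "query",
    "list": "categorymembers",
    "cmtitle": "Category:Is_a_snp",
    "cmlimit": "500",
    "format": "json",
}
titles = []
while True:
    url = endpoint + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    titles.extend(m["title"] for m in data["query"]["categorymembers"])
    if "continue" not in data:
        break                        # no more batches
    params.update(data["continue"])  # carries cmcontinue into the next request
print(len(titles), "pages in Category:Is_a_snp")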