mgkit.net.uniprot module

Contains functions and constants for Uniprot access

mgkit.net.uniprot.UNIPROT_GET = 'http://www.uniprot.org/uniprot/'

URL to Uniprot REST API

mgkit.net.uniprot.UNIPROT_MAP = 'http://www.uniprot.org/mapping/'

URL to Uniprot mapping REST API

mgkit.net.uniprot.UNIPROT_TAXONOMY = 'http://www.uniprot.org/taxonomy/'

URL to Uniprot REST API - Taxonomy

mgkit.net.uniprot.get_gene_info(gene_ids, columns, max_req=50, contact=None)[source]

New in version 0.1.12.

Get information about a list of genes. It uses query_uniprot() to send the request and formats the response as a dictionary.

Parameters:
  • gene_ids (iterable, str) – gene id(s) to get information for
  • columns (list) – list of columns
  • max_req (int) – maximum number of gene_ids per request
  • contact (str) – email address to be passed in the query (requested by Uniprot API)
Returns:

dictionary where the keys are the gene_ids requested and the values are dictionaries with the names of the requested columns as keys and the corresponding values, which can be lists if the values are semicolon-separated strings.

Return type:

dict

Example

To get the taxonomy ids for some genes:

>>> uniprot.get_gene_info(['Q09575', 'Q8DQI6'], ['organism-id'])
{'Q09575': {'organism-id': '6239'}, 'Q8DQI6': {'organism-id': '171101'}}

mgkit.net.uniprot.get_gene_info_iter(gene_ids, columns, contact=None, max_req=50)[source]

New in version 0.3.3.

Alternative function to get_gene_info(), returning an iterator to avoid connection timeouts when updating a dictionary.

This function’s parameters are the same as those of get_gene_info().
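The batching behaviour implied by max_req can be sketched as follows. This is only an illustration, not the module's implementation: the helper names batched, iter_gene_info and fake_fetch are hypothetical, and yielding (gene_id, info) pairs is an assumption about what the iterator produces.

```python
from itertools import islice


def batched(ids, size):
    """Yield lists of at most `size` ids from any iterable."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def iter_gene_info(gene_ids, fetch, max_req=50):
    """Yield (gene_id, info) pairs, issuing one request per batch.

    `fetch` is a hypothetical callable that takes a list of ids and
    returns a dict; in practice it could wrap get_gene_info().
    """
    for batch in batched(gene_ids, max_req):
        for gene_id, info in fetch(batch).items():
            yield gene_id, info


def fake_fetch(batch):
    """Stand-in for a network call, only here to make the sketch runnable."""
    return {gene_id: {'organism-id': '0'} for gene_id in batch}


batches = list(batched(range(5), 2))
result = dict(iter_gene_info(['A', 'B', 'C'], fake_fetch, max_req=2))
```

Because results are yielded batch by batch, a caller can store each one as it arrives instead of waiting for every request to finish.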

mgkit.net.uniprot.get_ko_to_eggnog_mappings(ko_ids, contact=None)[source]

New in version 0.1.14.

It’s not possible to map KO IDs to eggNOG IDs in one go via the Uniprot API. This function uses query_uniprot() to get all Uniprot IDs requested and then returns a dictionary with all the eggNOG IDs they map to.

Parameters:
  • ko_ids (iterable) – an iterable of KO IDs
  • contact (str) – email address to be passed in the query (requested by Uniprot API)
Returns:

The format of the resulting dictionary is ko_id -> {eggnog_id1, ..}

Return type:

dict
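The two-step mapping described above (KO IDs to Uniprot IDs, then Uniprot IDs to eggNOG IDs) can be sketched with plain dictionaries. The function combine_mappings is hypothetical and all ids below are placeholders, not real query results.

```python
def combine_mappings(ko_to_uniprot, uniprot_to_eggnog):
    """Chain two mappings into ko_id -> {eggnog_id1, ..}."""
    result = {}
    for ko_id, uniprot_ids in ko_to_uniprot.items():
        eggnog_ids = set()
        for uniprot_id in uniprot_ids:
            # Uniprot entries without an eggNOG annotation add nothing
            eggnog_ids.update(uniprot_to_eggnog.get(uniprot_id, ()))
        result[ko_id] = eggnog_ids
    return result


# Placeholder ids, used only to illustrate the shape of the data
ko_to_uniprot = {'KO_A': ['UP_1', 'UP_2'], 'KO_B': ['UP_3']}
uniprot_to_eggnog = {'UP_1': ['NOG_1'], 'UP_2': ['NOG_1', 'NOG_2']}

ko_to_eggnog = combine_mappings(ko_to_uniprot, uniprot_to_eggnog)
```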

mgkit.net.uniprot.get_mappings(entry_ids, db_from='ID', db_to='EMBL', out_format='tab', contact=None)[source]

Gets the mapping of genes using the Uniprot REST API. The db_from and db_to values are the ones accepted by the Uniprot API. The same applies to out_format; the only processed formats are ‘list’, which returns a list of the mappings (should be used with one gene only), and ‘tab’, which returns a dictionary with the mapping. All other values return a string with the newline stripped.

Parameters:
  • entry_ids (iterable) – iterable of ids to be mapped (there’s a limit to the maximum length of an HTTP request, so it should contain fewer than 50 ids)
  • db_from (str) – string that identifies the DB for elements in entry_ids
  • db_to (str) – string that identifies the DB to which entry_ids are mapped
  • out_format (str) – format of the mapping; ‘list’ and ‘tab’ are processed
  • contact (str) – email address to be passed in the query (requested by Uniprot API)
Returns:

tuple, dict or str depending on out_format value
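As a rough sketch of how a ‘tab’ response might be turned into a dictionary, the snippet below assumes a two-column, tab-separated body with a header line. The function name parse_tab_mapping, the header text and the ids are assumptions for illustration; the actual parsing in get_mappings() may differ.

```python
def parse_tab_mapping(text):
    """Turn a two-column tab-separated mapping into id -> [mapped ids]."""
    mappings = {}
    lines = [line for line in text.splitlines() if line]
    for line in lines[1:]:  # the first line is assumed to be a header
        source_id, target_id = line.split('\t')
        mappings.setdefault(source_id, []).append(target_id)
    return mappings


# Placeholder response body; header and ids are illustrative only
sample = 'From\tTo\nID_A\tX1\nID_A\tX2\nID_B\tX3\n'
parsed = parse_tab_mapping(sample)
```

Collecting the targets in a list keeps every mapping when one id maps to several entries in the target database.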

mgkit.net.uniprot.get_sequences_by_ko(ko_id, taxonomy, contact=None, reviewed=True)[source]

Gets sequences from Uniprot, restricting to the taxon id passed.

Parameters:
  • ko_id (str) – KO id of the sequences to download
  • taxonomy (int) – id of the taxon
  • contact (str) – email address to be passed in the query (requested by Uniprot API)
  • reviewed (bool) – if the sequences requested must be reviewed
Returns:

string with the fasta file downloaded
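Since the return value is a single FASTA-formatted string, a caller will usually want to split it into records. A minimal sketch, assuming standard FASTA layout (header lines starting with >); parse_fasta is a hypothetical helper and the sample sequences are made up.

```python
def parse_fasta(text):
    """Split a FASTA string into a dict of header -> sequence."""
    sequences = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            header = line[1:]
            sequences[header] = []
        elif header is not None:
            # Sequence lines are joined later, so wrapped lines are fine
            sequences[header].append(line)
    return {name: ''.join(parts) for name, parts in sequences.items()}


# Made-up FASTA text standing in for a downloaded file
sample = '>seq1 example\nMKT\nAYI\n>seq2\nGGA\n'
records = parse_fasta(sample)
```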

mgkit.net.uniprot.get_uniprot_ec_mappings(gene_ids, contact=None)[source]

New in version 0.1.14.

Shortcut to download EC mapping of Uniprot IDs. Uses get_gene_info() passing the correct column (ec).

mgkit.net.uniprot.ko_to_mapping(ko_id, query, columns, contact=None)[source]

Returns the mappings to the supplied KO. It can be used for any id; the query format is free, as are the columns returned. The only restriction is that the tab format must be used, since that is the format parsed.

Parameters:
  • ko_id (str) – id used in the query
  • query (str) – query passed to the Uniprot API, ko_id is replaced using str.format()
  • columns (str) – column used in the results table used to map the ids
  • contact (str) – email address to be passed in the query (requested by Uniprot API)

Note

each mapping in the column is separated by a ;
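The two mechanics mentioned here, filling the id into the query with str.format() and splitting a column value on ;, can be sketched as below. The query template and the ids are illustrative assumptions, and split_mappings is a hypothetical helper.

```python
# Hypothetical query template; the placeholder is filled by str.format()
query_template = 'database:(type:ko {})'
query = query_template.format('KO_A')


def split_mappings(cell):
    """Split a ';'-separated column value, dropping empty items."""
    return [item.strip() for item in cell.split(';') if item.strip()]


# Placeholder column value, including a trailing separator
mapped_ids = split_mappings('NOG_1; NOG_2;')
```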

mgkit.net.uniprot.parse_uniprot_response(data, simple=True)[source]

New in version 0.1.12.

Parses raw response from a Uniprot query (tab format only) from functions like query_uniprot() into a dictionary. It requires that the first column is the entry id (or any other unique id).

Parameters:
  • data (str) – string response from Uniprot
  • simple (bool) – if True and the number of columns is 1, the dictionary returned has a simplified structure
Returns:

The format of the resulting dictionary is entry_id -> {column1 -> value, column2 -> value, ..} unless there’s only one column and simple is True, in which case the value is equal to the value of the only column.

Return type:

dict
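The structure described above can be illustrated with a small stand-alone parser. This is a sketch, not the module's code: parse_tab_response is a hypothetical name, and using the header text as column keys is an assumption. The sample string is the response shown in the query_uniprot() example.

```python
def parse_tab_response(data, simple=True):
    """Parse a tab-format response into entry_id -> values."""
    lines = [line for line in data.splitlines() if line]
    if not lines:
        return {}
    # The first column is the entry id, the rest are data columns
    header = lines[0].split('\t')[1:]
    parsed = {}
    for line in lines[1:]:
        entry_id, *values = line.split('\t')
        if simple and len(header) == 1:
            # Single column: map the id straight to its value
            parsed[entry_id] = values[0]
        else:
            parsed[entry_id] = dict(zip(header, values))
    return parsed


raw = 'Entry\tOrganism ID\nQ8DQI6\t171101\nQ09575\t6239\n'
simple_result = parse_tab_response(raw)
full_result = parse_tab_response(raw, simple=False)
```

With simple=True the single column collapses to a plain value per entry; with simple=False each entry keeps a dictionary keyed by column.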

mgkit.net.uniprot.query_uniprot(query, columns=None, format='tab', limit=None, contact=None, baseurl='http://www.uniprot.org/uniprot/')[source]

New in version 0.1.12.

Changed in version 0.1.13: added baseurl and made columns a default argument

Queries Uniprot, returning the raw response in the format specified. More information at the page

Parameters:
  • query (str) – query to submit, as put in the input box
  • columns (None, iterable) – list of columns to return
  • format (str) – response format
  • limit (int, None) – number of entries to return or None to request all entries
  • contact (str) – email address to be passed in the query (requested by Uniprot API)
  • baseurl (str) – base url for the REST API, can be either UNIPROT_GET or UNIPROT_TAXONOMY
Returns:

raw response from the query

Return type:

str

Example

To get the taxonomy ids for some genes:

>>> uniprot.query_uniprot('Q09575 OR Q8DQI6', ['id', 'organism-id'])
'Entry\tOrganism ID\nQ8DQI6\t171101\nQ09575\t6239\n'

Warning

because of limits in the length of URLs, it’s advised to limit the length of the query string.
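To see how quickly a query grows, the request URL can be built without sending it. The sketch below assumes the arguments are passed as URL parameters named query, format, columns and limit; those names and the helper build_query_url are assumptions, and the contact argument is left out because the way it is transmitted is not described here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

UNIPROT_GET = 'http://www.uniprot.org/uniprot/'


def build_query_url(query, columns=None, format='tab', limit=None,
                    baseurl=UNIPROT_GET):
    """Build (but do not send) a request URL for a query."""
    # Parameter names are assumed, not taken from the module source
    params = {'query': query, 'format': format}
    if columns is not None:
        params['columns'] = ','.join(columns)
    if limit is not None:
        params['limit'] = limit
    return baseurl + '?' + urlencode(params)


url = build_query_url('Q09575 OR Q8DQI6', ['id', 'organism-id'])
# Decode the query string again to check what would be sent
params_back = parse_qs(urlsplit(url).query)
url_length = len(url)
```

Checking len(url) before sending is one simple way to decide when a long list of ids should be split into several queries.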