mgkit.net.uniprot module¶
Contains functions and constants for Uniprot access
-
mgkit.net.uniprot.
get_gene_info
(gene_ids, columns, max_req=50, contact=None)¶ New in version 0.1.12.
Gets information about a list of genes. It uses
query_uniprot()
to send the request and format the response as a dictionary.
Parameters:
Returns: a dictionary where the keys are the requested gene_ids and the values are dictionaries, with the names of the requested columns as keys and the corresponding values, which can be lists if a value is a semicolon-separated string.
Return type: dict
Example
To get the taxonomy ids for some genes:
>>> uniprot.get_gene_info(['Q09575', 'Q8DQI6'], ['organism-id'])
{'Q09575': {'organism-id': '6239'}, 'Q8DQI6': {'organism-id': '171101'}}
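The shape of the returned dictionary can be sketched without network access. The helper `split_column_value` below is hypothetical (it is not part of mgkit), the input rows are made up, and the sketch assumes that a value containing a semicolon is turned into a list of stripped, non-empty strings while any other value is left unchanged:

```python
def split_column_value(value):
    """Return a list if value is semicolon separated, else value itself.

    Hypothetical helper for illustration, not part of mgkit.
    """
    if ';' not in value:
        return value
    # Drop the empty items left by a trailing semicolon
    return [item.strip() for item in value.split(';') if item.strip()]


# Made-up rows in the shape described above: gene_id -> {column: value}
raw = {
    'GENE1': {'organism-id': '6239', 'ec': '1.1.1.1; 2.2.2.2;'},
}

info = {
    gene_id: {
        column: split_column_value(value)
        for column, value in columns.items()
    }
    for gene_id, columns in raw.items()
}
```

Callers that read such a dictionary may need to handle both a plain string and a list for the same column.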
-
mgkit.net.uniprot.
get_gene_info_iter
(gene_ids, columns, contact=None, max_req=50)¶ New in version 0.3.3.
Alternative function to
get_gene_info()
, returning an iterator to avoid connection timeouts when updating a dictionary.
This function’s parameters are the same as
get_gene_info()
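A minimal sketch of the iterator idea, assuming the ids are requested in batches of at most max_req and the results are yielded as they arrive. The names `iter_gene_info` and `fake_fetch` are hypothetical, `fake_fetch` stands in for a real request, and whether the real function yields pairs or whole dictionaries is not stated here:

```python
def iter_gene_info(gene_ids, fetch, max_req=50):
    """Yield (gene_id, values) pairs, requesting at most max_req ids at a time.

    fetch is a hypothetical callable that takes a list of ids and returns
    a dictionary gene_id -> values.
    """
    gene_ids = list(gene_ids)
    for start in range(0, len(gene_ids), max_req):
        batch = gene_ids[start:start + max_req]
        for gene_id, values in fetch(batch).items():
            yield gene_id, values


batch_sizes = []


def fake_fetch(batch):
    # Stand-in for a network request: records the batch size and
    # returns placeholder data
    batch_sizes.append(len(batch))
    return {gene_id: {'organism-id': '0'} for gene_id in batch}


gene_info = {}
for gene_id, values in iter_gene_info(['G%d' % i for i in range(120)], fake_fetch):
    # The dictionary is updated between requests, not after one long call
    gene_info[gene_id] = values
```

Because each batch is consumed before the next one is requested, a slow consumer does not keep a single long connection open.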
-
mgkit.net.uniprot.
get_ko_to_eggnog_mappings
(ko_ids, contact=None)¶ New in version 0.1.14.
The Uniprot API cannot map KO IDs to eggNOG IDs in a single request. This function uses
query_uniprot()
to get all the Uniprot IDs requested and then returns a dictionary with all the eggNOG IDs they map to.
Parameters: - ko_ids (iterable) – an iterable of KO IDs
- contact (str) – email address to be passed in the query (requested by the Uniprot API)
Returns: The format of the resulting dictionary is ko_id -> {eggnog_id1, ..}
Return type: dict
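The two-step mapping can be sketched as follows, assuming the intermediate results are available as plain dictionaries. The function `combine_mappings` and all the IDs are placeholders for illustration, not part of mgkit and not real identifiers:

```python
def combine_mappings(ko_to_uniprot, uniprot_to_eggnog):
    """Build ko_id -> {eggnog_id1, ..} from two intermediate mappings.

    Hypothetical helper for illustration, not part of mgkit.
    """
    ko_to_eggnog = {}
    for ko_id, uniprot_ids in ko_to_uniprot.items():
        eggnog_ids = set()
        for uniprot_id in uniprot_ids:
            # Uniprot IDs without an eggNOG mapping contribute nothing
            eggnog_ids.update(uniprot_to_eggnog.get(uniprot_id, ()))
        ko_to_eggnog[ko_id] = eggnog_ids
    return ko_to_eggnog


# Placeholder IDs, not real KO, Uniprot or eggNOG identifiers
ko_to_uniprot = {'KO_A': ['UP_1', 'UP_2'], 'KO_B': ['UP_3']}
uniprot_to_eggnog = {'UP_1': ['NOG_1'], 'UP_2': ['NOG_1', 'NOG_2']}

result = combine_mappings(ko_to_uniprot, uniprot_to_eggnog)
```

Using a set for the values removes the duplicates that appear when several Uniprot IDs of one KO share the same eggNOG ID.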
-
mgkit.net.uniprot.
get_mappings
(entry_ids, db_from='ID', db_to='EMBL', out_format='tab', contact=None)¶ Gets mappings of genes using the Uniprot REST API. The db_from and db_to values are the ones accepted by the Uniprot API. The same applies to out_format, but only two formats are processed: ‘list’, which returns a list of the mappings (should be used with one gene only), and ‘tab’, which returns a dictionary with the mappings. All other values return a string with the newline stripped.
Parameters: - entry_ids (iterable) – iterable of ids to be mapped (there’s a limit to the maximum length of an HTTP request, so it should contain fewer than 50 ids)
- db_from (str) – string that identifies the DB for the elements in entry_ids
- db_to (str) – string that identifies the DB to which entry_ids are mapped
- out_format (str) – format of the mapping; ‘list’ and ‘tab’ are processed
- contact (str) – email address to be passed in the query (requested by the Uniprot API)
Returns: tuple, dict or str depending on out_format value
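How a ‘tab’ response could become a dictionary is sketched below. The function `parse_tab_mapping` is hypothetical, the response text is made up, and the sketch assumes a header line followed by one tab-separated pair per line, with an id possibly appearing on several lines:

```python
def parse_tab_mapping(text):
    """Parse a two-column, tab-separated mapping into id -> list of ids.

    Hypothetical helper for illustration, not part of mgkit; it assumes
    the first line is a header.
    """
    mappings = {}
    lines = [line for line in text.splitlines() if line]
    for line in lines[1:]:  # skip the header line
        source_id, target_id = line.split('\t')
        mappings.setdefault(source_id, []).append(target_id)
    return mappings


# Made-up response with placeholder ids
response = 'From\tTo\nID_A\tX1\nID_A\tX2\nID_B\tX3\n'
mapping = parse_tab_mapping(response)
```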
-
mgkit.net.uniprot.
get_sequences_by_ko
(ko_id, taxonomy, contact=None, reviewed=True)¶ Gets sequences from Uniprot, restricting to the taxon id passed.
Parameters:
Returns: string with the fasta file downloaded
-
mgkit.net.uniprot.
get_uniprot_ec_mappings
(gene_ids, contact=None)¶ New in version 0.1.14.
Shortcut to download EC mapping of Uniprot IDs. Uses
get_gene_info()
passing the correct column (ec).
-
mgkit.net.uniprot.
ko_to_mapping
(ko_id, query, columns, contact=None)¶ Returns the mappings for the supplied KO. It can be used for any id, and both the query format and the columns returned are free. The only restriction is that the tab format is used, because it is the one that is parsed.
Parameters:
Note
Each mapping in the column is separated by a ;
-
mgkit.net.uniprot.
parse_uniprot_response
(data, simple=True)¶ New in version 0.1.12.
Parses raw response from a Uniprot query (tab format only) from functions like
query_uniprot()
into a dictionary. It requires that the first column is the entry id (or any other unique id).
Parameters:
Returns: The format of the resulting dictionary is entry_id -> {column1 -> value, column2 -> value, ..} unless there’s only one column and simple is True, in which case the value is equal to the value of the only column.
Return type: dict
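The behaviour described above can be sketched as follows. The function `parse_tab_response` is a hypothetical re-implementation, not the mgkit code, and it assumes that the column names are taken from the header line of the response:

```python
def parse_tab_response(data, simple=True):
    """Parse a tab-format response into entry_id -> values.

    Hypothetical sketch, not the mgkit implementation. The first column
    is taken as the entry id and the header supplies the column names.
    """
    lines = [line for line in data.splitlines() if line]
    column_names = lines[0].split('\t')[1:]
    parsed = {}
    for line in lines[1:]:
        fields = line.split('\t')
        entry_id, values = fields[0], fields[1:]
        if simple and len(column_names) == 1:
            # Only one column: store its value directly
            parsed[entry_id] = values[0]
        else:
            parsed[entry_id] = dict(zip(column_names, values))
    return parsed


# Raw response taken from the query_uniprot() example in this module
raw = 'Entry\tOrganism ID\nQ8DQI6\t171101\nQ09575\t6239\n'
simple_result = parse_tab_response(raw)
full_result = parse_tab_response(raw, simple=False)
```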
-
mgkit.net.uniprot.
query_uniprot
(query, columns=None, format='tab', limit=None, contact=None, baseurl='http://www.uniprot.org/uniprot/')¶ New in version 0.1.12.
Changed in version 0.1.13: added baseurl and made columns a default argument
Queries Uniprot, returning the raw response in the format specified. More information at the page
Parameters: - query (str) – query to submit, as put in the input box
- columns (None, iterable) – list of columns to return
- format (str) – response format
- limit (int, None) – number of entries to return or None to request all entries
- contact (str) – email address to be passed in the query (requested by the Uniprot API)
- baseurl (str) – base url for the REST API, can be either
UNIPROT_GET
or UNIPROT_TAXONOMY
Returns: raw response from the query
Return type:
Example
To get the taxonomy ids for some genes:
>>> uniprot.query_uniprot('Q09575 OR Q8DQI6', ['id', 'organism-id'])
'Entry\tOrganism ID\nQ8DQI6\t171101\nQ09575\t6239\n'
Warning
Because of limits in the length of URLs, it’s advised to keep the query string short.
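One way to follow this advice is to split a long list of ids into several short queries. The helper `build_or_queries` below is hypothetical (not part of mgkit), the limit of 50 ids per query is an assumption borrowed from the max_req default used elsewhere in this module, and 'ID_3' is a placeholder id:

```python
def build_or_queries(entry_ids, max_ids=50):
    """Join ids with ' OR ' in groups of at most max_ids.

    Hypothetical helper for illustration, not part of mgkit.
    """
    entry_ids = list(entry_ids)
    return [
        ' OR '.join(entry_ids[start:start + max_ids])
        for start in range(0, len(entry_ids), max_ids)
    ]


# 'ID_3' is a placeholder, the other two ids come from the example above
queries = build_or_queries(['Q09575', 'Q8DQI6', 'ID_3'], max_ids=2)
```

Each string in `queries` could then be passed as the query argument of a separate request, so that no single URL grows with the full list of ids.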