mgkit.kegg module

Module containing classes and functions to access Kegg data

class mgkit.kegg.KeggClientRest(cache=None)

Bases: future.types.newobject.newobject

Changed in version 0.3.1: added a cache attribute for some methods

Kegg REST client

The class includes methods and data to use the REST API provided by Kegg. At the moment it provides methods for the ‘link’, ‘list’ and ‘get’ operations.

Kegg REST API

api_url = 'http://rest.kegg.jp/'
cache = None
contact = None
conv(target_db, source_db, strip=True)

New in version 0.3.1.

Kegg Help:

http://rest.kegg.jp/conv/<target_db>/<source_db>

(<target_db> <source_db>) = (<kegg_db> <outside_db>) | (<outside_db> <kegg_db>)

For gene identifiers:
<kegg_db> = <org>
<org> = KEGG organism code or T number
<outside_db> = ncbi-proteinid | ncbi-geneid | uniprot

For chemical substance identifiers:
<kegg_db> = drug | compound | glycan
<outside_db> = pubchem | chebi

http://rest.kegg.jp/conv/<target_db>/<dbentries>

For gene identifiers:
<dbentries> = database entries involving the following <database>
<database> = <org> | genes | ncbi-proteinid | ncbi-geneid | uniprot
<org> = KEGG organism code or T number

For chemical substance identifiers:
<dbentries> = database entries involving the following <database>
<database> = drug | compound | glycan | pubchem | chebi

Examples

>>> kc = KeggClientRest()
>>> kc.conv('ncbi-geneid', 'eco')
{'eco:b0217': {'ncbi-geneid:949009'},
 'eco:b0216': {'ncbi-geneid:947541'},
 'eco:b0215': {'ncbi-geneid:946441'},
 'eco:b0214': {'ncbi-geneid:946955'},
 'eco:b0213': {'ncbi-geneid:944903'},
...
>>> kc.conv('ncbi-proteinid', 'hsa:10458+ece:Z5100')
{'10458': {'NP_059345'}, 'Z5100': {'AAG58814'}}
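The dictionaries returned by conv map each source ID to a set of target IDs. As a minimal sketch, assuming the REST response is one tab-separated source/target pair per line, the hypothetical helper parse_conv_lines below shows how such text could be folded into that shape, with strip removing the database prefix. It is not mgkit's implementation, and the sample response is made up to mirror the second example.

```python
from collections import defaultdict


def parse_conv_lines(text, strip=True):
    """Fold tab-separated 'conv' output into {source_id: set(target_ids)}."""
    mapping = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines, e.g. after a trailing newline
        source_id, target_id = line.split('\t')
        if strip:
            # drop the database prefix, e.g. 'hsa:10458' -> '10458'
            source_id = source_id.split(':', 1)[-1]
            target_id = target_id.split(':', 1)[-1]
        mapping[source_id].add(target_id)
    return dict(mapping)


# made-up response text shaped like the second example
response = 'hsa:10458\tncbi-proteinid:NP_059345\nece:Z5100\tncbi-proteinid:AAG58814\n'
print(parse_conv_lines(response))
# {'10458': {'NP_059345'}, 'Z5100': {'AAG58814'}}
```

Using sets for the values means an ID that maps to several targets (one line per pair) simply accumulates them.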
cpd_desc_re = <_sre.SRE_Pattern object>
cpd_re = <_sre.SRE_Pattern object>
empty_cache(methods=None)

New in version 0.3.1.

Empties the cache completely, or only for the specified method(s)

Parameters: methods (iterable, str) – string or iterable of strings naming methods that are part of the cache. If None, the cache is fully emptied
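A minimal sketch of the "all or some methods" behaviour, assuming (hypothetically) that the cache is a dict keyed by method name. The helper clear_cache is illustrative only and is not the method's actual code.

```python
def clear_cache(cache, methods=None):
    """Empty a cache dict fully, or only the entries for the given method(s).

    Hypothetical layout: each key is a method name, each value holds that
    method's cached results.
    """
    if methods is None:
        cache.clear()  # no methods given: drop everything
        return
    if isinstance(methods, str):
        methods = [methods]  # a single name is accepted as a plain string
    for method in methods:
        cache.pop(method, None)  # names not in the cache are ignored


cache = {'get_entry': {'K00844': '...'}, 'get_ids_names': {'ko': {}}}
clear_cache(cache, 'get_entry')
print(sorted(cache))  # ['get_ids_names']
```

The explicit str check matters: a string is itself iterable, so without it 'get_entry' would be treated as the characters 'g', 'e', 't', and so on.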
find(query, database, options=None, strip=True)

New in version 0.3.1.

Kegg Help:

http://rest.kegg.jp/find/<database>/<query>

<database> = pathway | module | ko | genome | <org> | compound | glycan |
reaction | rclass | enzyme | disease | drug | dgroup | environ | genes | ligand

<org> = KEGG organism code or T number

http://rest.kegg.jp/find/<database>/<query>/<option>

<database> = compound | drug
<option> = formula | exact_mass | mol_weight

Examples

>>> kc = KeggClientRest()
>>> kc.find('CH4', 'compound')
{'C01438': 'Methane; CH4'}
>>> kc.find('K00844', 'genes', strip=False)
{'tped:TPE_0072': 'hexokinase; K00844 hexokinase [EC:2.7.1.1]',
...
>>> kc.find('174.05', 'compound', options='exact_mass')
{'C00493': '174.052823',
 'C04236': '174.052823',
 'C16588': '174.052823',
 'C17696': '174.052823',
 'C18307': '174.052823',
 'C18312': '174.052823',
 'C21281': '174.052823'}
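The help text describes two URL patterns for ‘find’, one with and one without a trailing option. The sketch below only builds those URLs from api_url; it makes no HTTP request, and the name build_find_url is hypothetical. Escaping the query with urllib.parse.quote is an added assumption for queries containing spaces or slashes.

```python
from urllib.parse import quote

API_URL = 'http://rest.kegg.jp/'


def build_find_url(query, database, option=None):
    """Build a 'find' URL with or without the trailing <option> segment."""
    parts = ['find', database, quote(query, safe='')]  # escape spaces, '/' etc.
    if option is not None:
        # per the help text, options only apply to compound and drug
        if database not in ('compound', 'drug'):
            raise ValueError('options are only valid for compound or drug')
        parts.append(option)
    return API_URL + '/'.join(parts)


print(build_find_url('174.05', 'compound', option='exact_mass'))
# http://rest.kegg.jp/find/compound/174.05/exact_mass
```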
get_entry(k_id, option=None)

Changed in version 0.3.1: this is now cached

The method abstracts the use of the ‘get’ operation in the Kegg API

Parameters:
  • k_id (str) – kegg id of the resource to get
  • option (str) – optional, to specify a format
get_ids_names(target='ko', strip=True)

New in version 0.1.13.

Changed in version 0.3.1: the call is now cached

Returns a dictionary with the names/descriptions of all the IDs of a specific target (ko, path, cpd, etc.)

If strip=True, the IDs will be stripped of the database prefix (e.g. md:M00002->M00002)

get_ortholog_pathways()

Gets the ortholog pathways, replacing ‘map’ with ‘ko’ in the IDs

Returns a dictionary with the mappings KO->compounds for a specific pathway or module
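The ‘map’ to ‘ko’ renaming can be sketched as below, assuming the input is a dict of pathway IDs (possibly carrying a ‘path:’ prefix) to names. The helper ortholog_pathway_ids and its input format are hypothetical, not what get_ortholog_pathways receives internally.

```python
def ortholog_pathway_ids(pathway_names):
    """Map 'path:mapNNNNN' style IDs to 'koNNNNN', keeping the names."""
    renamed = {}
    for path_id, name in pathway_names.items():
        path_id = path_id.split(':', 1)[-1]  # drop a 'path:' prefix if present
        if path_id.startswith('map'):
            path_id = 'ko' + path_id[len('map'):]  # reference map -> KO pathway
        renamed[path_id] = name
    return renamed


print(ortholog_pathway_ids({'path:map00010': 'Glycolysis / Gluconeogenesis'}))
# {'ko00010': 'Glycolysis / Gluconeogenesis'}
```

Only a leading ‘map’ is replaced, so an ID that happens to contain ‘map’ elsewhere is left untouched.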

get_reaction_equations(ids, max_len=10)

Gets the equations for the given reactions

id_prefix = {'C': 'cpd', 'K': 'ko', 'R': 'rn', 'k': 'map', 'm': 'path'}
ko_desc_re = <_sre.SRE_Pattern object>

New in version 0.2.0.

Implements the “link” operation of the Kegg REST API

http://www.genome.jp/linkdb/

Changed in version 0.3.1: removed strip and cached the results

The method abstracts the use of the ‘link’ operation in the Kegg API

The target parameter can be one of the following:

pathway | brite | module | disease | drug | environ | ko | genome |
<org> | compound | glycan | reaction | rpair | rclass | enzyme

<org> = KEGG organism code or T number
Parameters:
  • target (str) – the target db
  • ids – can be either a single id as a string or a list of ids
  • strip (bool) – if the prefix (e.g. ko:K00601) should be stripped
  • max_len (int) – the maximum number of ids to retrieve with each request, should not exceed 50
Return dict:

dictionary mapping requested id to target id(s)
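The max_len parameter caps how many IDs go into a single request. A minimal sketch of that batching, assuming the KEGG convention of joining several entries with ‘+’ (as in the ‘hsa:10458+ece:Z5100’ conv example) in a /link/<target_db>/<dbentries> URL. The generator batch_link_urls is hypothetical and performs no HTTP calls.

```python
API_URL = 'http://rest.kegg.jp/'


def batch_link_urls(target, ids, max_len=50):
    """Yield 'link' URLs, each covering at most max_len IDs joined with '+'."""
    if isinstance(ids, str):
        ids = [ids]  # a single ID may be passed as a plain string
    ids = list(ids)
    for start in range(0, len(ids), max_len):
        batch = ids[start:start + max_len]
        yield '{}link/{}/{}'.format(API_URL, target, '+'.join(batch))


for url in batch_link_urls('pathway', ['ko:K00844', 'ko:K00845', 'ko:K00850'], max_len=2):
    print(url)
# http://rest.kegg.jp/link/pathway/ko:K00844+ko:K00845
# http://rest.kegg.jp/link/pathway/ko:K00850
```

Smaller batches mean more requests, while larger ones risk exceeding what the server accepts per request, hence the advice not to go above 50.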

list_ids(k_id)

The method abstracts the use of the ‘list’ operation in the Kegg API

The k_id parameter can be one of the following:

pathway | brite | module | disease | drug | environ | ko | genome |
<org> | compound | glycan | reaction | rpair | rclass | enzyme

<org> = KEGG organism code or T number
Parameters: k_id (str) – kegg database to get list of ids
Return list: list of ids in the specified database
load_cache(file_handle)

New in version 0.3.1.

Loads the cache from a file

rn_eq_re = <_sre.SRE_Pattern object>
rn_name_re = <_sre.SRE_Pattern object>
write_cache(file_handle)

New in version 0.3.1.

Writes the cache to a file
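A round trip between write_cache and load_cache can be sketched with pickle. The serialisation format is an assumption here (mgkit may store the cache differently), and save_cache, restore_cache and the file name in the comment are hypothetical.

```python
import io
import pickle


def save_cache(cache, file_handle):
    """Serialise a cache dict to a file handle opened in binary write mode."""
    pickle.dump(cache, file_handle)


def restore_cache(file_handle):
    """Read back a cache dict written by save_cache."""
    return pickle.load(file_handle)


buffer = io.BytesIO()  # stands in for open('kegg-cache.pickle', 'wb'/'rb')
save_cache({'get_entry': {'K00844': 'ENTRY K00844'}}, buffer)
buffer.seek(0)  # rewind before reading, as reopening a file would
print(restore_cache(buffer))
# {'get_entry': {'K00844': 'ENTRY K00844'}}
```

If pickle is used, only load cache files you created yourself: unpickling untrusted data can execute arbitrary code.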

class mgkit.kegg.KeggCompound(cp_id=None, description='')

Bases: future.types.newobject.newobject

Kegg compound

__eq__(other)
>>> KeggCompound('test') == KeggCompound('test')
True
>>> KeggCompound('test') == 1
False
__ne__(other)
>>> KeggCompound('test') != KeggCompound('test1')
True
>>> KeggCompound('test') != 1
True
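The doctests for __eq__ and __ne__ (the same pattern recurs in KeggOrtholog and KeggPathway) imply that objects compare by identifier and never equal unrelated types. A minimal sketch, assuming equality is based on the ID alone; KeggItem is a hypothetical stand-in, not one of the module's classes.

```python
class KeggItem:
    """Hypothetical stand-in for KeggCompound/KeggOrtholog/KeggPathway."""

    def __init__(self, item_id=None, description=''):
        self.id = item_id
        self.description = description

    def __eq__(self, other):
        # objects of other types never compare equal (e.g. KeggItem('x') == 1)
        if not isinstance(other, KeggItem):
            return False
        return self.id == other.id

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # keep hashing consistent with equality so instances work in sets
        return hash(self.id)


print(KeggItem('test') == KeggItem('test'), KeggItem('test') != 1)  # True True
```

Defining __hash__ alongside __eq__ is needed in Python 3, where overriding __eq__ alone makes instances unhashable.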
class mgkit.kegg.KeggData(fname=None, gen_maps=True)

Bases: future.types.newobject.newobject

Deprecated since version 0.3.4.

gen_ko_map()
gen_maps()
get_cp_names()
get_ko_names()
get_ko_pathway_map(black_list=None)
get_ko_pathways(ko_id)
get_pathway_ko_map(black_list=None)
get_rn_names()
load_data(fname)
maps = None
pathways = None
save_data(fname)
class mgkit.kegg.KeggMapperBase(fname=None)

Bases: future.types.newobject.newobject

Deprecated since version 0.3.4.

Base object for Kegg mapping classes

get_id_map()

Returns a mapping->KOs dictionary (the reverse of the mapping returned by get_ko_map)
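Reversing a KO->mappings dictionary into mapping->KOs can be sketched as follows, assuming each KO maps to an iterable of mapping IDs. The helper reverse_mapping and the placeholder mapping names are illustrative only.

```python
def reverse_mapping(ko_map):
    """Turn {ko_id: iterable of mapping_ids} into {mapping_id: set of ko_ids}."""
    id_map = {}
    for ko_id, mapping_ids in ko_map.items():
        for mapping_id in mapping_ids:
            id_map.setdefault(mapping_id, set()).add(ko_id)
    return id_map


id_map = reverse_mapping({'K00844': ['mapping_a', 'mapping_b'], 'K00845': ['mapping_a']})
print(sorted(id_map['mapping_a']))  # ['K00844', 'K00845']
```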

get_id_names()

Returns a copy of the mapping names

get_ko_map()

Returns a copy of the KO->mapping dictionary

static ko_to_mapping(ko_id, query, columns, contact=None)

Returns the mappings for the supplied KO. It can be used for any ID, and both the query format and the returned columns are free. The only restriction is that the results must be in tab-separated format, which is then parsed.

Parameters:
  • ko_id (str) – id used in the query
  • query (str) – query passed to the Uniprot API, ko_id is replaced using str.format()
  • columns (str) – column used in the results table to map the IDs
  • contact (str) – email address to be passed in the query (requested by the Uniprot API)

Note

each mapping in the column is separated by a ‘;’
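Parsing one column of a tab-separated table whose cells hold ‘;’-separated mappings can be sketched like this. The helper parse_mapping_column, the header-row assumption and the sample table (placeholder IDs, not real Uniprot output) are all hypothetical.

```python
def parse_mapping_column(text, column):
    """Collect the ';'-separated values of one column of a tab-separated table.

    The first non-empty line is assumed to be the header row.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return set()
    index = lines[0].split('\t').index(column)  # ValueError if column missing
    mappings = set()
    for line in lines[1:]:
        fields = line.split('\t')
        if index >= len(fields):
            continue  # row shorter than the header: nothing to collect
        for value in fields[index].split(';'):
            value = value.strip()
            if value:  # skip empty items, e.g. from a trailing ';'
                mappings.add(value)
    return mappings


table = 'Entry\tmapping\nP1\tA;B;\nP2\tB\n'
print(sorted(parse_mapping_column(table, 'mapping')))  # ['A', 'B']
```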

load_data(fname)

Loads mapping data from disk

save_data(fname)

Saves mapping data to disk

class mgkit.kegg.KeggModule(entry=None, old=False)

Bases: future.types.newobject.newobject

New in version 0.1.13.

Used to extract information from a pathway module entry in Kegg

The entry, as a string, can be passed either at instance creation or to KeggModule.parse_entry()

classes = None
compounds = None
entry = ''
find_submodules()

New in version 0.3.0.

Returns the possible submodules, as a list of tuples where the elements are the first and last compounds in a submodule

first_cp

Returns the first compound in the module

last_cp

Returns the last compound in the module

name = ''
parse_entry(entry)

Parses a Kegg module entry and changes the instance values. By default, the reaction IDs are substituted with the KO IDs

parse_entry2(entry)

New in version 0.3.0.

Parses a Kegg module entry and changes the instance values. By default, the reaction IDs are NOT substituted with the KO IDs.

static parse_reaction(line, ko_ids=None)

Changed in version 0.3.0: cleaned the parsing

Parses the lines with the reactions and substitutes the reaction IDs with the corresponding KO IDs, if provided

reactions = None
to_edges(id_only=None)

Changed in version 0.3.0: added id_only and changed to reflect changes in reactions

Returns the reactions as edges that can be supplied to build a graph.

Parameters: id_only (None, iterable) – if None, the returned edges are for the whole module; if an iterable (converted to a set), only edges for those reactions are returned
Yields: tuple – the elements are the compounds and reactions in the module
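A minimal sketch of turning reactions into graph edges with an optional id_only filter, assuming a hypothetical layout of (reaction_id, left_compounds, right_compounds) tuples; the real reactions attribute may be structured differently. Each substrate points to its reaction and each reaction points to its products.

```python
def reaction_edges(reactions, id_only=None):
    """Yield (source, target) edges linking compounds and reactions.

    reactions: hypothetical list of (reaction_id, left_compounds, right_compounds).
    """
    if id_only is not None:
        id_only = set(id_only)  # iterable converted to a set, as documented
    for rn_id, left_cps, right_cps in reactions:
        if id_only is not None and rn_id not in id_only:
            continue  # only edges for the requested reactions
        for cp in left_cps:
            yield (cp, rn_id)  # substrate -> reaction
        for cp in right_cps:
            yield (rn_id, cp)  # reaction -> product


reactions = [('R1', ['C1'], ['C2']), ('R2', ['C2'], ['C3'])]
print(list(reaction_edges(reactions, id_only=['R1'])))
# [('C1', 'R1'), ('R1', 'C2')]
```

The resulting pairs can be fed to any graph library that accepts an edge list; reversible reactions would need edges in both directions, which this sketch does not add.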
class mgkit.kegg.KeggOrtholog(ko_id=None, description='', reactions=None)

Bases: future.types.newobject.newobject

Kegg Ortholog gene

__eq__(other)
>>> KeggOrtholog('test') == KeggOrtholog('test')
True
>>> KeggOrtholog('test') == 1
False
__ne__(other)
>>> KeggOrtholog('test') != KeggOrtholog('test1')
True
>>> KeggOrtholog('test') != 1
True
class mgkit.kegg.KeggPathway(path_id=None, description=None, genes=None)

Bases: future.types.newobject.newobject

Kegg Pathway

__eq__(other)
>>> KeggPathway('test') == KeggPathway('test')
True
>>> KeggPathway('test') == 1
False
__ne__(other)
>>> KeggPathway('test') != KeggPathway('test1')
True
>>> KeggPathway('test') != 1
True
class mgkit.kegg.KeggReaction(entry)

Bases: future.types.newobject.newobject

Changed in version 0.3.1: reworked, only stores the equation

Kegg Reaction, used for parsing the equation line

left_cp = None
right_cp = None
rn_id = None
mgkit.kegg.download_data(fname='kegg.pickle', contact=None)

Deprecated since version 0.3.4.

mgkit.kegg.parse_reaction(line, prefix=('C', 'G'))

New in version 0.3.1.

Parses a reaction equation from Kegg, returning the left and right components. Needs testing.

Parameters: line (str) – reaction string
Returns: left and right components as sets
Return type: tuple
Raises: ValueError – if the
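Kegg equations separate the two sides with ‘<=>’ and the terms with ‘+’, optionally preceded by a stoichiometric coefficient. A minimal sketch of splitting such a line and keeping only IDs that start with one of the prefixes; parse_equation is hypothetical, and raising ValueError when ‘<=>’ is missing is an assumption of this sketch, not the documented condition.

```python
def parse_equation(line, prefix=('C', 'G')):
    """Split a KEGG equation into sets of left and right compound IDs."""
    if '<=>' not in line:
        # assumption: a line without '<=>' is not a valid equation
        raise ValueError('no "<=>" separator in: {!r}'.format(line))
    left, right = line.split('<=>', 1)
    prefix = tuple(prefix)  # str.startswith needs a str or a tuple of str

    def side_ids(side):
        # keep only compound/glycan IDs, dropping '+' and coefficients like '2'
        return {token for token in side.split() if token.startswith(prefix)}

    return side_ids(left), side_ids(right)


left, right = parse_equation('C00031 + C00002 <=> C00092 + C00008')
print(sorted(left), sorted(right))  # ['C00002', 'C00031'] ['C00008', 'C00092']
```

Tokens such as ‘C00001(n)’ would be kept with their suffix, so polymer notation would need extra handling.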