mgkit.mappings.utils module

Utilities to map genes

mgkit.mappings.utils.count_genes_in_mapping(gene_lists, labels, mapping, normalise=False)

Maps lists of ids to a mapping dictionary, returning a pandas.DataFrame in which the rows are the labels provided and the columns are the categories to which the ids map. Each cell (label, category) holds the number of ids in the corresponding gene list that map to that category.

Parameters:
  • gene_lists (iterable) – an iterable in which each element is an iterable of ids that can be looked up in mapping
  • labels (iterable) – an iterable of strings that defines the labels to be used in the resulting rows in the pandas.DataFrame; must have the same length as gene_lists
  • mapping (dict) – a dictionary in the form: gene_id->[cat1, cat2, .., catN]
  • normalise (bool) – if True, each count is divided by the total for its row.
Returns:

a pandas.DataFrame instance
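
The counting described above can be illustrated with a minimal pure-Python sketch. It is not the mgkit implementation: count_categories is a hypothetical stand-in that returns a nested dictionary (label -> category -> count) rather than a pandas.DataFrame, and it assumes that ids missing from mapping are skipped and that repeated ids are counted each time they appear.

```python
from collections import Counter


def count_categories(gene_lists, labels, mapping, normalise=False):
    """Hypothetical stand-in: returns {label: {category: count}}."""
    table = {}
    for label, gene_ids in zip(labels, gene_lists):
        counts = Counter()
        for gene_id in gene_ids:
            # Assumption: ids that are not keys of the mapping are ignored.
            counts.update(mapping.get(gene_id, []))
        if normalise:
            # Divide each count by the row total, guarding against empty rows.
            total = sum(counts.values())
            row = {cat: n / total for cat, n in counts.items()} if total else {}
        else:
            row = dict(counts)
        table[label] = row
    return table


# Made-up example data, for illustration only.
mapping = {"K1": ["catA", "catB"], "K2": ["catA"], "K3": ["catC"]}
gene_lists = [["K1", "K2"], ["K3", "K3", "K1"]]
labels = ["sample1", "sample2"]

counts = count_categories(gene_lists, labels, mapping)
normalised = count_categories(gene_lists, labels, mapping, normalise=True)
```

Here counts["sample1"] is {"catA": 2, "catB": 1}, because K1 contributes to both catA and catB while K2 contributes only to catA. A nested dictionary of this shape can be turned into a table with labels as rows via pandas.DataFrame.from_dict(counts, orient="index").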

mgkit.mappings.utils.group_annotation_by_mapping(annotations, mapping, attr='ko')

Groups annotations by category, using a mapping dictionary

Parameters:
  • annotations (iterable) – iterable of gff.GFFKeg instances
  • mapping (dict) – dictionary with mappings for the attribute requested
  • attr (str) – attribute of the annotation to be used as key in mapping
Return dict:

dictionary category->annotations
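
The grouping described above can be sketched as follows. This is an illustration under stated assumptions, not the mgkit code: Annotation is a hypothetical stand-in for gff.GFFKeg that carries only a name and a ko attribute, group_by_mapping is a hypothetical helper, and annotations whose attribute value is absent from mapping are assumed to be skipped.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for an annotation; only the attribute named by
# `attr` is needed for the grouping.
Annotation = namedtuple("Annotation", ["name", "ko"])


def group_by_mapping(annotations, mapping, attr="ko"):
    """Hypothetical helper: returns {category: [annotation, ...]}."""
    grouped = defaultdict(list)
    for annotation in annotations:
        key = getattr(annotation, attr)
        # Assumption: keys missing from the mapping yield no categories.
        for category in mapping.get(key, []):
            grouped[category].append(annotation)
    return dict(grouped)


# Made-up example data, for illustration only.
annotations = [
    Annotation("gene1", "K1"),
    Annotation("gene2", "K2"),
    Annotation("gene3", "K9"),  # K9 is not in the mapping
]
mapping = {"K1": ["catA", "catB"], "K2": ["catA"]}

groups = group_by_mapping(annotations, mapping)
```

In this example gene1 appears under both catA and catB, gene2 only under catA, and gene3 is absent from the result because its key is not in mapping.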