mgkit.utils.dictionary module

Dictionary utils

class mgkit.utils.dictionary.HDFDict(file_name, table, cast=<type 'int'>, cache=True)

Bases: object

Changed in version 0.3.3: added cache in __init__

New in version 0.3.1.

Uses a table in an HDFStore (from pandas) as a dictionary. The table must be indexed to perform well. Read only.


The dictionary cannot be modified, and a ValueError will be raised if the table is not in the file.

mgkit.utils.dictionary.apply_func_to_values(dictionary, func)

New in version 0.1.12.

Assuming a dictionary whose values are iterables, func is applied to each element of the iterable, returning a set of all transformed elements.

  • dictionary (dict) – dictionary whose values are iterables
  • func (func) – function to apply to the dictionary values

Return dict:

dictionary with transformed values

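The behaviour described for apply_func_to_values can be illustrated with a minimal sketch. The name `apply_func_to_values_sketch` is hypothetical and this is not the library's implementation; it only assumes that each value is an iterable and that the transformed elements are hashable.

```python
def apply_func_to_values_sketch(dictionary, func):
    """Apply func to every element of each value, collecting the results in a set."""
    return {
        key: {func(element) for element in values}
        for key, values in dictionary.items()
    }


# Duplicate results ("yy" and "zz" both have length 2) collapse in the set
lengths = apply_func_to_values_sketch({"a": ["x", "yy", "zz"], "b": ["www"]}, len)
print(lengths)
```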

class mgkit.utils.dictionary.cache_dict_file(iterator, skip_lines=0)

Bases: object

New in version 0.3.0.

Used to cache the result of a function that yields tuples (key and value). The class behaves like a dictionary: if the key is found in the internal dictionary, the corresponding value is returned, otherwise the iterator is advanced until the key is found.
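The lazy lookup described above can be sketched as follows. `CacheDictSketch` is a hypothetical stand-in, not the class from the library; the handling of `skip_lines` (discarding the first items of the iterator) and the KeyError on exhaustion are assumptions.

```python
class CacheDictSketch:
    """Hypothetical stand-in: caches (key, value) pairs pulled lazily from an iterator."""

    def __init__(self, iterator, skip_lines=0):
        self._iterator = iter(iterator)
        self._cache = {}
        # Assumption: skip_lines discards that many items at the start
        for _ in range(skip_lines):
            next(self._iterator, None)

    def __getitem__(self, key):
        if key in self._cache:
            return self._cache[key]
        # Advance the iterator, caching every pair seen, until the key shows up
        for new_key, value in self._iterator:
            self._cache[new_key] = value
            if new_key == key:
                return value
        # Assumption: an exhausted iterator means the key does not exist
        raise KeyError(key)


pairs = iter([("A1", 10), ("B2", 20), ("C3", 30)])
cached = CacheDictSketch(pairs)
first = cached["B2"]   # consumes A1 and B2 from the iterator
second = cached["A1"]  # served from the cache, the iterator is not advanced
```

Keys that come before the requested one are cached as a side effect, so a later lookup for them does not touch the iterator again.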


>>> from import parse_accession_taxa_table
>>> i = parse_accession_taxa_table('nucl_gb.accession2taxid.gz', key=0)
>>> d = cache_dict_file(i)
>>> d['AH001684']

mgkit.utils.dictionary.combine_dict(keydict, valuedict)

Combines two dictionaries when the values of keydict are iterables. The combined dictionary has the same keys as keydict, and its values are sets containing all the values in valuedict associated with the values of keydict.

keydict:

key1 -> [v1, v2, .., vN]

valuedict:

v1 -> [u1, u2, .., uN]
v2 -> [t1, t2, .., tN]

Resulting dictionary will be

key1 -> {u1, u2, .., uN, t1, t2, .., tN}

  • keydict (dict) – dictionary whose keys are the same as the returned dictionary
  • valuedict (dict) – dictionary whose values are the same as the returned dictionary
Return dict:

combined dictionary
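A minimal sketch of the combination described for combine_dict, under stated assumptions: the name `combine_dict_sketch` is hypothetical, and values of keydict that are missing from valuedict are skipped silently, which the library may handle differently.

```python
def combine_dict_sketch(keydict, valuedict):
    """For each key, union the valuedict entries reached through its keydict values."""
    combined = {}
    for key, values in keydict.items():
        merged = set()
        for value in values:
            # Assumption: values absent from valuedict contribute nothing
            merged.update(valuedict.get(value, ()))
        combined[key] = merged
    return combined


keydict = {"key1": ["v1", "v2"]}
valuedict = {"v1": ["u1", "u2"], "v2": ["t1"]}
combined = combine_dict_sketch(keydict, valuedict)
```

Here `combined["key1"]` holds the elements reached through both `v1` and `v2`.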

mgkit.utils.dictionary.combine_dict_one_value(keydict, valuedict)

Combines two dictionaries: each value of keydict is used as a key in valuedict, and the resulting dictionary is composed of keydict keys and valuedict values.

Same as combine_dict(), but each value in keydict is a single element that is a key in valuedict.

  • keydict (dict) – dictionary whose keys are the same as the returned dictionary
  • valuedict (dict) – dictionary whose values are the same as the returned dictionary
Return dict:

combined dictionary
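The single-value variant can be sketched in one expression. `combine_dict_one_value_sketch` is a hypothetical name, and the sketch assumes every value of keydict exists as a key in valuedict (a missing one raises KeyError here, which may differ from the library).

```python
def combine_dict_one_value_sketch(keydict, valuedict):
    """Map each keydict key to the valuedict entry of its single value."""
    # Assumption: every keydict value is a key of valuedict
    return {key: valuedict[value] for key, value in keydict.items()}


gene_to_taxon = {"gene1": "taxonA", "gene2": "taxonB"}
taxon_to_rank = {"taxonA": "genus", "taxonB": "species"}
gene_to_rank = combine_dict_one_value_sketch(gene_to_taxon, taxon_to_rank)
```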


Returns a dictionary with the NaN values taken out

mgkit.utils.dictionary.filter_ratios_by_numbers(ratios, min_num)

Returns only the items of a dictionary whose value, an iterable, has a length equal to or greater than min_num.

  • ratios (dict) – dictionary key->list
  • min_num (int) – minimum number of elements in the value iterable
Return dict:

filtered dictionary
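The length filter amounts to a dictionary comprehension. The sketch below uses the hypothetical name `filter_by_length_sketch` and assumes the values support `len()`.

```python
def filter_by_length_sketch(ratios, min_num):
    """Keep only the items whose value has at least min_num elements."""
    return {
        key: values
        for key, values in ratios.items()
        if len(values) >= min_num
    }


ratios = {"a": [0.1, 0.2, 0.3], "b": [0.5], "c": [0.7, 0.9]}
filtered = filter_by_length_sketch(ratios, 2)
```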

mgkit.utils.dictionary.find_id_in_dict(s_id, s_dict)

Finds a value ‘s_id’ in a dictionary in which the values are iterables. Returns a list of keys that contain the value.

  • s_id (object) – element to look for in the dictionary’s values
  • s_dict (dict) – dictionary to search in
Return list:

list of keys in which s_id was found
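The lookup described for find_id_in_dict can be sketched with a list comprehension. `find_id_sketch` is a hypothetical name, and the order of the returned keys simply follows the dictionary's iteration order, which is an assumption about the result rather than documented behaviour.

```python
def find_id_sketch(s_id, s_dict):
    """Return the keys whose iterable value contains s_id."""
    return [key for key, values in s_dict.items() if s_id in values]


s_dict = {"k1": ["a", "b"], "k2": ["c"], "k3": ["b", "d"]}
found = find_id_sketch("b", s_dict)
```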

Given a dictionary whose values (iterables) can be linked back to other keys, it returns a dictionary in which the keys are the original keys and the values are sets of keys to which they can be linked.

key1->[v1, v2]
key2->[v3, v4]
key3->[v2, v4]

Results in:

key1->[key3]
key2->[key3]
key3->[key1, key2]

  • id_map (dict) – dictionary of keys to link
  • black_list (iterable) – iterable of values to skip in making the links
Return dict:

linked dictionary
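A minimal sketch of the linking logic, under stated assumptions: the name `link_ids_sketch` and the `black_list=None` default are hypothetical, two keys are considered linked when their values share at least one element, and a key is never linked to itself.

```python
def link_ids_sketch(id_map, black_list=None):
    """Link each key to the other keys that share at least one value with it."""
    skip = set(black_list or ())
    # Drop black-listed values before comparing keys
    cleaned = {key: set(values) - skip for key, values in id_map.items()}
    links = {}
    for key, values in cleaned.items():
        links[key] = {
            other
            for other, other_values in cleaned.items()
            # Assumption: a key is not linked to itself
            if other != key and values & other_values
        }
    return links


id_map = {"key1": ["v1", "v2"], "key2": ["v3", "v4"], "key3": ["v2", "v4"]}
linked = link_ids_sketch(id_map)
linked_without_v4 = link_ids_sketch(id_map, black_list=["v4"])
```

With `v4` black-listed, `key2` no longer shares any value with `key3`, so its set of links is empty.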


New in version 0.3.1.

Merges keys and values from a list/iterable of dictionaries. The resulting dictionary’s values are converted into sets, with the assumption that the values are one of the following: float, str, int, bool
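The merge described above can be sketched as follows. `merge_dictionaries_sketch` is a hypothetical name, and the sketch relies on the stated assumption that every value is a hashable scalar (float, str, int, bool).

```python
def merge_dictionaries_sketch(dicts):
    """Merge dictionaries, collecting the values seen for each key in a set."""
    merged = {}
    for dictionary in dicts:
        for key, value in dictionary.items():
            # Assumption: value is a scalar, so it can be added to a set
            merged.setdefault(key, set()).add(value)
    return merged


merged = merge_dictionaries_sketch([
    {"a": 1, "b": "x"},
    {"a": 2, "c": 0.5},
    {"a": 1},
])
```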


Given a dictionary in the form: key->[v1, v2, .., vN], returns a dictionary in the form: v1->[key1, key2, .., keyN]
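The reversal described above can be sketched as follows. `reverse_mapping_sketch` is a hypothetical name, and collecting the keys in sets rather than lists is an assumption made so that duplicates collapse.

```python
def reverse_mapping_sketch(map_dict):
    """Turn key -> [values] into value -> {keys}."""
    reversed_map = {}
    for key, values in map_dict.items():
        for value in values:
            # Assumption: keys are gathered in a set, one per distinct value
            reversed_map.setdefault(value, set()).add(key)
    return reversed_map


map_dict = {"key1": ["v1", "v2"], "key2": ["v1"]}
reversed_map = reverse_mapping_sketch(map_dict)
```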

  • map_dict (dict) – dictionary to reverse
Return dict:

reversed dictionary

mgkit.utils.dictionary.split_dictionary_by_value(value_dict, threshold, aggr_func=<function median>, key_filter=None)

Splits a dictionary, whose values are iterables, into two dictionaries based on a threshold:

  • one in which the result of aggr_func is lower than the threshold (first)
  • one in which the result of aggr_func is equal to or greater than the threshold (second)

  • value_dict (dict) – dictionary to be split
  • threshold (number) – must be comparable to the result of aggr_func
  • aggr_func (func) – function used to aggregate the dictionary values
  • key_filter (iterable) – if specified, only these keys will be in the resulting dictionaries

two dictionaries
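The split can be sketched as below. `split_by_value_sketch` is a hypothetical name; the sketch uses `statistics.median` from the standard library as the default aggregation, which may not be the same median function the library uses, and it assumes every key in key_filter exists in the dictionary.

```python
from statistics import median


def split_by_value_sketch(value_dict, threshold, aggr_func=median, key_filter=None):
    """Return (lower, higher): items aggregating below, or at/above, the threshold."""
    # Assumption: keys listed in key_filter are all present in value_dict
    keys = value_dict.keys() if key_filter is None else key_filter
    lower, higher = {}, {}
    for key in keys:
        values = value_dict[key]
        target = lower if aggr_func(values) < threshold else higher
        target[key] = values
    return lower, higher


value_dict = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [10]}
lower, higher = split_by_value_sketch(value_dict, 4)
lower_ab, higher_ab = split_by_value_sketch(value_dict, 4, key_filter=["a", "b"])
```

The medians are 2, 5 and 10, so only `a` falls below the threshold of 4; with the key filter, `c` is left out of both results.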