mgkit.mappings.pandas_map module

Module containing mapping operations on pandas data structures.

mgkit.mappings.pandas_map.calc_coefficient_of_variation(dataframe)

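Calculate the coefficient of variation for a DataFrame, using the formula from Wikipedia.

The formula used is \(\left (1 + \frac {1}{4n} \right ) * c_{v}\) where \(c_{v} = \frac {s}{\bar{x}}\), with \(s\) the standard deviation, \(\bar{x}\) the mean and \(n\) the number of observations.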
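The formula can be checked by hand on a small series of values. The following is a minimal plain-Python sketch, not the module's implementation: the helper name `corrected_cv` is hypothetical, and it assumes that \(s\) is the sample standard deviation (n - 1 denominator), which the text above does not state.

```python
import statistics


def corrected_cv(values):
    """Return (1 + 1/(4n)) * s / mean for a sequence of numbers.

    Hypothetical helper for illustration only; assumes s is the
    sample standard deviation (n - 1 denominator).
    """
    n = len(values)
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    c_v = stdev / mean                # plain coefficient of variation
    return (1 + 1 / (4 * n)) * c_v    # small-sample correction factor


# mean = 4, s = sqrt(8 / 3), so c_v is about 0.408 and the factor is 1.0625
example_cv = corrected_cv([2.0, 4.0, 4.0, 6.0])
```

With only four observations the correction factor raises the plain ratio by 6.25%; the adjustment shrinks as n grows.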

mgkit.mappings.pandas_map.concatenate_and_rename_tables(dataframes, roots)

Concatenates a list of pandas.DataFrame instances, renaming the columns of each table by prepending the corresponding prefix from a list of prefixes.

Parameters:
  • dataframes (iterable) – iterable of DataFrame instances
  • roots (iterable) – list of prefixes to prepend to the column names of each DataFrame
Return DataFrame:
  returns a DataFrame instance

Todo

  • move to pandas_utils?
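
The renaming and concatenation described for concatenate_and_rename_tables can be sketched with plain pandas calls. This is an illustration under stated assumptions rather than the function's actual code: the helper name `concat_with_prefixes`, the "-" separator between prefix and column name, and the column-wise (axis=1) concatenation are all guesses not confirmed by the text above.

```python
import pandas as pd


def concat_with_prefixes(dataframes, roots):
    """Prefix each table's columns with its root, then join side by side.

    Hypothetical helper; the "-" separator and axis=1 are assumptions.
    """
    renamed = [
        df.add_prefix(f"{root}-")  # e.g. "s1" becomes "archaea-s1"
        for df, root in zip(dataframes, roots)
    ]
    return pd.concat(renamed, axis=1)


archaea = pd.DataFrame({"s1": [1.0, 2.0]}, index=["K1", "K2"])
bacteria = pd.DataFrame({"s1": [3.0, 4.0]}, index=["K1", "K2"])
combined = concat_with_prefixes([archaea, bacteria], ["archaea", "bacteria"])
```

Prefixing before concatenation keeps columns that share a sample name in different tables distinguishable in the combined result.
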
mgkit.mappings.pandas_map.group_dataframe_by_mapping(dataframe, mapping, root_taxon, name_dict=None)

Return a pandas.DataFrame filtered by mapping and root taxon. The values in each column are averaged over all genes mapping to a category.

Parameters:
  • dataframe (DataFrame) – DataFrame with a gene-root MultiIndex
  • mapping (dict) – dictionary of category->genes
  • root_taxon (str) – root taxon to group genes
  • name_dict (dict) – dictionary of category->name
Return DataFrame:
  the filtered DataFrame
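
The filtering and averaging described for group_dataframe_by_mapping can be sketched as follows. This is a rough reconstruction of the documented behaviour, not the module's code: the helper name `group_by_mapping_sketch`, the index level names "gene" and "root", and the choice to skip categories with no matching genes are assumptions.

```python
import pandas as pd


def group_by_mapping_sketch(dataframe, mapping, root_taxon, name_dict=None):
    """Average the rows of one root taxon over the genes of each category.

    Hypothetical helper; assumes index levels named "gene" and "root".
    """
    # Keep only rows of the requested root; the index is then the gene id
    by_gene = dataframe.xs(root_taxon, level="root")
    rows = {}
    for category, genes in mapping.items():
        present = by_gene.index.intersection(list(genes))
        if present.empty:
            continue  # assumption: categories without genes are dropped
        label = name_dict.get(category, category) if name_dict else category
        rows[label] = by_gene.loc[present].mean()  # column-wise average
    # One row per category, one column per original column
    return pd.DataFrame(rows).T


index = pd.MultiIndex.from_tuples(
    [("g1", "archaea"), ("g2", "archaea"), ("g1", "bacteria")],
    names=["gene", "root"],
)
counts = pd.DataFrame({"s1": [2.0, 4.0, 10.0], "s2": [1.0, 3.0, 5.0]}, index=index)
grouped = group_by_mapping_sketch(
    counts,
    {"cat1": ["g1", "g2"], "cat2": ["g9"]},
    "archaea",
    name_dict={"cat1": "Category one"},
)
```

Here "cat1" averages genes g1 and g2 within archaea, the bacteria row is ignored, and "cat2" is absent from the result because none of its genes appear in the table.
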

mgkit.mappings.pandas_map.make_stat_table(dataframes, roots)

Produces a pandas.DataFrame that summarises the supplied DataFrames. The stats include the mean, standard deviation and coefficient of variation for each root taxon.

Parameters:
  • dataframes (iterable) – iterable of DataFrame instances
  • roots (iterable) – list of root taxa to which each table belongs
Return DataFrame:
  returns a DataFrame instance
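
One possible shape for such a summary table is sketched below. The layout is entirely an assumption, since the text above does not specify it: the helper name `stat_table_sketch` is hypothetical, statistics are computed across the columns of each row, the standard deviation is the sample one, the coefficient of variation uses the corrected formula given for calc_coefficient_of_variation, and the column labels are invented for illustration.

```python
import pandas as pd


def stat_table_sketch(dataframes, roots):
    """Build per-root mean, stdev and cv columns (assumed layout)."""
    tables = []
    for df, root in zip(dataframes, roots):
        n = df.shape[1]            # number of columns treated as observations
        mean = df.mean(axis=1)
        stdev = df.std(axis=1)     # pandas default: sample stdev (ddof=1)
        cv = (1 + 1 / (4 * n)) * stdev / mean
        stats = pd.DataFrame({"mean": mean, "stdev": stdev, "cv": cv})
        tables.append(stats.add_prefix(f"{root}-"))
    return pd.concat(tables, axis=1)


archaea = pd.DataFrame(
    {"s1": [2.0], "s2": [4.0], "s3": [4.0], "s4": [6.0]}, index=["K1"]
)
stat_table = stat_table_sketch([archaea], ["archaea"])
```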