mgkit.workflow.assembly module

Workflow associated with assembly statistics and evaluation

mgkit.workflow.assembly.assign_contigs_to_taxa(annotations, root_map=None, black_list=None)

Groups annotations by contig (seq_id) and counts how many contigs each taxon, or its root if root_map is supplied, has been assigned to.

The returned dictionary has the form:

taxon->ranks->count

Note

The number of ranks for a taxon is not predetermined, but depends on the values returned by rank_annotations_by_attr().

Parameters:
  • annotations (iterable) – list of gff.GFFKegg instances
  • root_map (dict) – dictionary taxon->root
Return dict:

dictionary
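
The taxon->ranks->count layout can be illustrated with a minimal standalone sketch. It does not call mgkit: the helper name tally_contig_ranks, the (taxon, rank) input pairs and the handling of taxa missing from root_map are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def tally_contig_ranks(contig_calls, root_map=None):
    """Build a taxon -> rank -> count dictionary.

    contig_calls is a hypothetical iterable of (taxon, rank) pairs,
    one pair per contig.
    """
    counts = defaultdict(Counter)
    for taxon, rank in contig_calls:
        if root_map is not None:
            # Assumption: a taxon without an entry in root_map is kept as-is.
            taxon = root_map.get(taxon, taxon)
        counts[taxon][rank] += 1
    # Convert to plain dictionaries for easier inspection.
    return {taxon: dict(ranks) for taxon, ranks in counts.items()}


calls = [
    ("Prevotella", 10),
    ("Prevotella", 7),
    ("Methanobrevibacter", 10),
    ("Prevotella", 10),
]
by_taxon = tally_contig_ranks(calls)
by_root = tally_contig_ranks(
    calls, root_map={"Prevotella": "Bacteria", "Methanobrevibacter": "Archaea"}
)
```

Here by_taxon maps "Prevotella" to {10: 2, 7: 1}, that is, two contigs assigned with rank 10 and one with rank 7.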

mgkit.workflow.assembly.basic_stats(array, sep)

Returns formatted basic statistics for contig lengths
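
Which statistics are reported is not stated here. As a rough sketch only, a formatted summary of contig lengths might look like the following; the function name format_length_stats and the choice of statistics are assumptions, not the behaviour of basic_stats.

```python
from statistics import median


def format_length_stats(lengths, sep="\t"):
    """Return one 'name<sep>value' line per statistic (hypothetical layout)."""
    rows = [
        ("count", len(lengths)),
        ("total", sum(lengths)),
        ("min", min(lengths)),
        ("max", max(lengths)),
        ("mean", f"{sum(lengths) / len(lengths):.2f}"),
        ("median", median(lengths)),
    ]
    return "\n".join(f"{name}{sep}{value}" for name, value in rows)


summary = format_length_stats([500, 1200, 300, 2000])
print(summary)
```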

mgkit.workflow.assembly.filter_contig_assignments(contig_assign, threshold=5, min_counts=1)

Filters contig assignments using a threshold for the rank: for each taxon, the counts of all ranks greater than or equal to threshold are summed.

Parameters:
Return dict:

dictionary in the form taxon_name->count
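
The summing step can be sketched without mgkit as follows. The helper name filter_by_rank is hypothetical, and the treatment of min_counts (dropping taxa whose summed count falls below it) is an assumption, since the parameter is not described here.

```python
def filter_by_rank(contig_assign, threshold=5, min_counts=1):
    """Collapse a taxon -> rank -> count dictionary into taxon -> count."""
    filtered = {}
    for taxon, ranks in contig_assign.items():
        # Sum only the counts whose rank reaches the threshold.
        total = sum(count for rank, count in ranks.items() if rank >= threshold)
        # Assumption: min_counts is a lower bound on the summed count.
        if total >= min_counts:
            filtered[taxon] = total
    return filtered


assignments = {
    "Prevotella": {10: 2, 7: 1, 3: 4},
    "Methanobrevibacter": {4: 1},
}
kept = filter_by_rank(assignments, threshold=5, min_counts=1)
```

With these inputs only "Prevotella" is kept, with a count of 3: the four contigs at rank 3 and the single contig at rank 4 fall below the threshold.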

mgkit.workflow.assembly.rank_annotations_by_attr(annotations, attr='taxon')

For all annotations in the list (usually all annotations for a contig), counts how many times each value of the attribute ‘attr’ appears. The resulting dictionary is then sorted by count, and the value with the highest count is ranked by the share of the total count that it represents.

The rank is an integer number between 0 and 10.

Parameters:
  • annotations (iterable) – list of gff.GFFKegg instances
  • attr (str) – the attribute for which the annotations are counted
Return tuple:

the attr with the most counts and its rank
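
A minimal sketch of this ranking, assuming the rank is the top value's share of the total scaled to 0-10 and rounded down (the exact rounding is not stated here). The helper rank_by_attr is hypothetical and takes plain attribute values rather than annotation objects.

```python
from collections import Counter


def rank_by_attr(values):
    """Return (most common value, rank), with rank an integer from 0 to 10."""
    counts = Counter(values)
    # most_common(1) raises IndexError on empty input; callers should
    # pass at least one value.
    top_value, top_count = counts.most_common(1)[0]
    # Assumption: scale the share of the total to 0-10 and round down.
    rank = top_count * 10 // sum(counts.values())
    return top_value, rank


mixed = rank_by_attr(["Prevotella", "Prevotella", "Prevotella", "Ruminococcus"])
uniform = rank_by_attr(["Prevotella"] * 5)
```

In this sketch mixed is ("Prevotella", 7), because 3 of 4 annotations agree, and uniform is ("Prevotella", 10).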

mgkit.workflow.assembly.write_fasta_summary(file_handle, seq_lengths, seq_lengths_filt, sep='\t')

Write summary file for assembly

Parameters:
  • file_handle – file handle for output
  • seq_lengths (array) – array for sequence lengths
  • seq_lengths_filt (array) – array for sequence lengths of annotated contigs
  • sep (str) – string used as column separator