Changes

0.3.1

This release adds several scripts and commands. Successive 0.3.x releases will be used to fix bugs and refine the APIs and CLI. Most importantly, now that the first paper using the framework has been published, the releases will work toward removing as much deprecated code as possible. At the same time, a general review of the code to make it run on Python 3 (probably via the six package) will start. The general idea is to reach a full removal of legacy code in 0.4.0, while full Python 3 compatibility is the aim of 0.5.0, which also means dropping dependencies that are not compatible with Python 3.

Added

Changed

Fixed

Besides smaller fixes:

Deprecated

0.3.0

Many bugs were fixed in this release, especially in reading the NCBI taxonomy and in using the msgpack format to save a UniprotTaxonomy instance. A tutorial was also added on profiling a microbial community using MGKit and BLAST (Profile a Community with BLAST).

Added

Changed

0.2.5

Changed

Added

0.2.4

Changed

  • mgkit.utils.sequence.get_contigs_info() now accepts a dictionary name->seq or a list of sequences, besides a file name (r536)
  • add-gff-info counts command now removes trailing commas from the samples list
  • the axes are turned off after the dendrogram is plotted
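The trailing-comma fix in the add-gff-info counts command can be illustrated with a minimal sketch. The helper name parse_samples is hypothetical and not part of MGKit, and the real script may handle the list differently; this version simply drops any empty entry left by a trailing (or doubled) comma.

```python
def parse_samples(value):
    """Split a comma-separated samples list, dropping empty entries.

    Hypothetical helper illustrating the behaviour described above;
    it is not the actual add-gff-info code.
    """
    # Splitting "s1,s2," gives ["s1", "s2", ""]; empty items are filtered out
    return [name.strip() for name in value.split(",") if name.strip()]


print(parse_samples("sample1,sample2,"))  # ['sample1', 'sample2']
```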

Fixed

  • the snp_parser script requirements were set incorrectly in setup.py (r540)
  • uncommented lines to download sample data to build documentation (r533)
  • add-gff-info uniprot command now writes the lineage attribute correctly (r538)

0.2.3

The installation dependencies are more flexible, with only numpy being required. To install all the needed packages, you can use:

$ pip install mgkit[full]

Added

  • new option to pass the query sequences to blast2gff; this allows the correct frame of the annotation to be added to the GFF
  • added the attributes evalue, subject_start and subject_end to the output of blast2gff. The subject start and end positions indicate on which frame of the subject sequence the match was found
  • added the option to annotate the heatmap with the numbers. The related example notebook was also updated
  • added the option to read the taxonomy from NCBI dump files, using mgkit.taxon.UniprotTaxonomy.read_from_ncbi_dump(). This makes it faster to get the taxonomy file
  • added an argument to return information from mgkit.net.embl.datawarehouse_search() as tab-separated data. The fields argument can be used when display is set to report. An example of how to use it is in the function documentation
  • added a bash script, download-taxonomy.sh, that downloads the taxonomy
  • added the script venv-docs.sh to build the HTML documentation under a virtual environment, because matplotlib on Mac OS X raises a RuntimeError there due to a bug in virtualenv. The documentation can first be built with this script, after create-apidoc.sh has been run to create the API documentation. The rest of the documentation (e.g. the PDF) can then be created with make as usual
  • added mgkit.net.pfam, with only one function at the moment, that returns the descriptions of the families.
  • added pfam command to add-gff-info, using the mentioned function, it adds the description of the Pfam families in the GFF file
  • added a new exception, used internally when an additional dependency is needed
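The new exception for missing optional dependencies follows a common Python pattern, sketched below under stated assumptions: the names DependencyError and require are hypothetical and do not reflect MGKit's actual exception name or where it is raised. The idea is to turn a bare ImportError into an error that tells the user which extra package to install.

```python
import importlib


class DependencyError(ImportError):
    """Hypothetical exception: an optional dependency is not installed."""


def require(module_name):
    """Import an optional dependency, or raise DependencyError with a hint.

    Hypothetical helper, not MGKit's API.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise DependencyError(
            "{0} is required for this feature; install the optional "
            "dependencies, e.g. with 'pip install mgkit[full]'".format(module_name)
        ) from exc
```

Subclassing ImportError is a design choice in this sketch: code that already catches ImportError keeps working, while newer code can catch the more specific exception.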

Changed

  • using the NCBI taxonomy dump has two side effects:

    • the scientific/common names are kept as they are, not lower-cased as before
    • a merged file is provided for taxon IDs that changed. The old taxon_id is kept in the taxonomy but points to the new taxon, to keep backward compatibility
  • renamed the add-gff-info gitaxa command to addtaxa. It now accepts more data sources (dictionaries) and is more general
  • changed mgkit.net.embl.datawarehouse_search() to automatically set the limit at 100,000 records
  • the taxonomy can now be saved using msgpack, making it faster to read and write. The file is also more compact and has a better compression ratio
  • mgkit.plots.heatmap.grouped_spine() now accepts the rotation of the labels as an option
  • added an option to use another attribute for the gene_id in the get-gff-info script gtf command
  • added a function to compare the version of MGKit used, issuing a warning when it differs (mgkit.check_version())
  • removed the tests for the old SNP structures and added the same tests for the new ones
  • mgkit.snps.classes.GeneSNP now caches the number of synonymous and non-synonymous SNPs for better speed
  • mgkit.io.gff.GenomicRange.__contains__() now also accepts a tuple (start, end) or another GenomicRange instance
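The extended containment check can be sketched with a simplified stand-in class. RangeSketch is hypothetical and is not mgkit.io.gff.GenomicRange; it assumes 1-based closed intervals, assumes a single position was already accepted before this change (as "also accepts" suggests), and assumes two ranges on different sequences never contain each other.

```python
class RangeSketch(object):
    """Hypothetical stand-in for a 1-based, closed genomic interval."""

    def __init__(self, seq_id, start, end):
        self.seq_id = seq_id
        self.start = start
        self.end = end

    def __contains__(self, other):
        # Accept another range, a (start, end) tuple or a single position
        if isinstance(other, RangeSketch):
            if other.seq_id != self.seq_id:
                return False  # assumption: different sequences never overlap
            start, end = other.start, other.end
        elif isinstance(other, tuple):
            start, end = other
        else:
            start = end = other
        return self.start <= start and end <= self.end
```

With r = RangeSketch("contig1", 10, 100), both 50 in r and (20, 30) in r are true, while (5, 30) in r is false because the tuple starts before the range.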

Fixed

0.2.2

Added

Changed

Removed

  • deprecated code from the snps package

0.2.1

Added

Deprecated

  • mgkit.filter.taxon.filter_taxonomy_by_lineage()
  • mgkit.filter.taxon.filter_taxonomy_by_rank()

Removed

  • removed old filter_gff script

0.2.0

  • added creation of wheel distribution
  • changes to ensure compatibility with later pandas versions
  • mgkit.io.gff.Annotation.get_ec() now returns a set; the tests were updated accordingly
  • added a --cite option to scripts
  • fixes to tutorial
  • updated documentation for sphinx 1.3
  • changes to diagrams
  • added a decorator to raise warnings for deprecated functions
  • added the possibility for info_dict in mgkit.counts.func.load_sample_counts() to be a function instead of a dictionary
  • consolidation of some eggNOG structures
  • added more spine options in mgkit.plots.heatmap.grouped_spine()
  • added a length property to mgkit.io.gff.Annotation
  • changed the filter-gff script so that the filtering function can be customised instead of using the default one; the related documentation was also updated
  • fixed a few plot functions
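A decorator that raises warnings for deprecated functions is a standard Python idiom. The sketch below shows the general technique using only the standard library; the names deprecated and old_function are hypothetical and MGKit's own decorator may use a different name, message or warning category.

```python
import functools
import warnings


def deprecated(func):
    """Hypothetical decorator: emit a DeprecationWarning on every call."""

    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        warnings.warn(
            "{0} is deprecated".format(func.__name__),
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not the wrapper
        )
        return func(*args, **kwargs)

    return wrapper


@deprecated
def old_function(x):
    return x * 2
```

Python hides DeprecationWarning by default outside of __main__, so a caller may need warnings.simplefilter("always") to see it.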

0.1.16

0.1.15

0.1.14

0.1.13

0.1.12

0.1.11

  • removed rst2pdf for generating a PDF of the documentation; LaTeX is preferred
  • corrections to documentation and example script
  • removed the need for the joblib library in the translate_seq script: it is used only if available (to use multiple processors)
  • deprecated mgkit.snps.funcs.combine_snps_in_dataframe() and mgkit.snps.funcs.combine_snps_in_dataframe(): mgkit.snps.funcs.combine_sample_snps() should be used instead
  • refactored some tests and added more
  • added docs_req.txt to help build the documentation on readthedocs.org
  • renamed the mgkit.snps.classes.GeneSyn attributes gid and taxon to gene_id and taxon_id. The old names are still available (via properties), but they will be removed in later versions. Old pickle data should be loaded and saved again with this release
  • added a few convenience functions to ease the use of combine_sample_snps()
  • added the function mgkit.snps.funcs.significance_test() to test the distributions of genes shared between two taxa
  • fixed an issue with deinterleaving sequence data from khmer
  • added mgkit.snps.funcs.flat_sample_snps()
  • added a method to mgkit.kegg.KeggClientRest to get the names for all IDs of a certain type (more generic than the various get_*_names)
  • added first implementation of mgkit.kegg.KeggModule class to parse a Kegg module entry
  • mgkit.snps.conv_func.get_rank_dataframe(), mgkit.snps.conv_func.get_gene_map_dataframe()
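Keeping old attribute names available "via properties" after a rename usually means read/write aliases that forward to the new attributes. The class below is a simplified, hypothetical sketch (GeneSynSketch is not MGKit's GeneSyn) showing how gid and taxon can forward to gene_id and taxon_id.

```python
class GeneSynSketch(object):
    """Hypothetical sketch: new attribute names with old-name aliases."""

    def __init__(self, gene_id=None, taxon_id=None):
        self.gene_id = gene_id
        self.taxon_id = taxon_id

    @property
    def gid(self):
        # Old name kept for backward compatibility; forwards to gene_id
        return self.gene_id

    @gid.setter
    def gid(self, value):
        self.gene_id = value

    @property
    def taxon(self):
        # Old name kept for backward compatibility; forwards to taxon_id
        return self.taxon_id

    @taxon.setter
    def taxon(self, value):
        self.taxon_id = value
```

Because the aliases only forward to the new attributes, there is a single source of truth. Code that still uses gid or taxon keeps working until the aliases are removed.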