# Changes¶

## 0.3.4¶

General cleanup and testing release. Major changes:

• general move to supporting both Python 2 (2.7) and Python 3 (3.5+), using the future package and, when convenient, checks for the installed Python version
• setup now includes all the optional dependencies, since this makes it easier to deal with conda environments
• moved from nose to pytest, which allows some functionality of interest, along with a reorganisation of the test modules and skipping of tests that cannot be executed (like mongodb)
• moved from urllib to requests, which also helps with Python 3 support
• more careful handling of some dependencies, like the lzma module and msgpack
• added more tests to help the port to Python 3, along with a tox configuration
• matplotlib.pyplot is still used in mgkit.plots.unused, but it is no longer imported when the parent package is. It is still needed by the mgkit.plots.utils functions, so the import was moved inside the function. This should help with virtual environments and test suites
• renamed mgkit.taxon.UniprotTaxonomy to mgkit.taxon.Taxonomy, since it is really the NCBI taxonomy and downloading the data from there is preferred. Likewise, mgkit.taxon.UniprotTaxonTuple was renamed to mgkit.taxon.TaxonTuple; aliases for the old names remain, but will be removed in a later version
• download_data was removed. Taxonomy should be downloaded using download-taxonomy.sh, and the mgkit.mappings module needs refactoring to remove old, now unused functionality
• added mgkit.taxon.Taxonomy.get_ranked_id()
• a Sphinx plugin is now used to render the Jupyter notebooks, replacing the old solution
• reran most of the tutorial, updated commands for the newest available software (samtools/bcftools), and switched to SNP calling with bcftools
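The change that moves the matplotlib.pyplot import inside the functions that need it follows the common deferred-import pattern. A minimal sketch of that pattern, using the stdlib json module as a stand-in for a heavy optional dependency (the function name here is illustrative, not MGKit's actual API):

```python
def parse_payload(text):
    # Deferred import: the module is only loaded when the function is
    # first called, not when the parent package is imported. This keeps
    # optional or heavy dependencies out of the package's import path,
    # which helps in virtual environments and test suites.
    import json

    return json.loads(text)
```

Importing the module that defines `parse_payload` therefore succeeds even where the dependency is missing; the ImportError only surfaces when the function is actually called.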

### Scripts¶

This is a summary of notable changes; it is advised to check the command line interface changes for several scripts.

• changed the command line interface of several scripts, to adapt to the use of _click_
• taxon-utils lca now has a single option to specify the output format, adding among the choices a format that can be used by add-gff-info addtaxa
• taxon-utils filter supports filtering table files in a 2-column format, such as those downloaded by download-ncbi-taxa.sh
• removed the eggnog and taxonomy commands from add-gff-info, the former since it is not that useful, the latter because the same results can be achieved using taxon-utils with the new output option
• removed the rand command of fastq-utils, since it was only for testing; the FastQ parser used is the one from mgkit.io.fastq
• substantial changes were made to the values and sequence commands of the filter-gff script
• sampling-utils rand_seq can now save the model used and reload it
• removed download_data and download_profiles, since they will not be used in the next tutorial; the preferred approach is now to use BLAST and then find the LCA with taxon-utils
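The 2-column tables mentioned for taxon-utils filter pair an accession with a taxon_id. The filtering it performs can be sketched as follows; this is only an illustration of the idea, with hypothetical names, and the script's own help should be consulted for the actual options:

```python
def filter_taxa_table(lines, allowed_ids):
    """Keep only rows whose taxon_id is in allowed_ids.

    Each line is expected in a 2-column, tab-separated format:
    accession<TAB>taxon_id, as produced by download-ncbi-taxa.sh.
    """
    for line in lines:
        accession, taxon_id = line.rstrip('\n').split('\t')
        if int(taxon_id) in allowed_ids:
            yield accession, int(taxon_id)
```

In the real script the allowed set would come from the taxonomy (e.g. all descendants of a taxon), rather than being passed in directly.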

### Python3¶

At the time of writing all tests pass on Python 3.5, but more tests are needed, along with new ones for the BLAST parser and the scripts.

In general, new projects will be developed using Python 3.5, and the next releases (0.4.0 and later) will prioritise it. If bug fixes are needed and Python 3 cannot be used, the 0.3.x branch will be used to provide them.

## 0.3.2¶

Removed deprecated code

## 0.3.1¶

This release adds several scripts and commands. Successive 0.3.x releases will be used to fix bugs and refine the APIs and CLI. Most importantly, since the publication of the first paper using the framework, releases will move toward removing as much deprecated code as possible. At the same time, a general review of the code to make it run on Python 3 (probably via the six package) will start. The general idea is to fully remove legacy code in 0.4.0, while full Python 3 compatibility is the aim of 0.5.0, which also means dropping dependencies that are not compatible with Python 3.

### Fixed¶

Besides smaller fixes:

## 0.3.0¶

A lot of bugs were fixed in this release, especially for reading NCBI taxonomy and using the msgpack format to save a UniprotTaxonomy instance. Also added a tutorial for profiling a microbial community using MGKit and BLAST (Profile a Community with BLAST)

## 0.2.4¶

### Changed¶

• mgkit.utils.sequence.get_contigs_info() now accepts a dictionary name->seq or a list of sequences, besides a file name (r536)
• add-gff-info counts command now removes trailing commas from the samples list
• the axes are turned off after the dendrogram is plotted
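Accepting a dictionary, a list, or a file name, as mgkit.utils.sequence.get_contigs_info() now does, is a simple dispatch on the input type. A sketch of that dispatch (the helper names and the returned value here are illustrative, not MGKit's actual API, which has its own FASTA parser and return format):

```python
def read_fasta(file_name):
    # Minimal FASTA reader, for illustration only
    name, chunks = None, []
    with open(file_name) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith('>'):
                if name is not None:
                    yield name, ''.join(chunks)
                name, chunks = line[1:], []
            else:
                chunks.append(line)
        if name is not None:
            yield name, ''.join(chunks)


def contig_lengths(sequences):
    """Accept a dict name -> seq, a list of sequences, or a file name."""
    if isinstance(sequences, dict):
        seqs = sequences.values()
    elif isinstance(sequences, str):
        seqs = (seq for _, seq in read_fasta(sequences))
    else:
        seqs = sequences
    return [len(seq) for seq in seqs]
```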

### Fixed¶

• the snp_parser script requirements were set wrong in setup.py (r540)
• add-gff-info uniprot command now writes the lineage attribute correctly (r538)

## 0.2.3¶

The installation dependencies are more flexible, with only numpy being required. To install every needed package, you can use:

$ pip install mgkit[full]


• new option to pass the query sequences to blast2gff; this allows adding the correct frame of the annotation in the GFF
• added the attributes evalue, subject_start and subject_end to the output of blast2gff. The subject start and end positions make it possible to understand on which frame of the subject sequence the match was found
• added options to annotate the heatmap with the numbers, and updated the related example notebook
• added the option to read the taxonomy from NCBI dump files, using mgkit.taxon.UniprotTaxonomy.read_from_ncbi_dump(). This makes it faster to get the taxonomy file
• added an argument to return information from mgkit.net.embl.datawarehouse_search() in the form of tab-separated data. The fields argument can be used when display is set to report. An example of its use is in the function documentation
• added the venv-docs.sh script to build the HTML documentation inside a virtual environment. On Mac OS X, matplotlib raises a RuntimeError because of a bug in virtualenv, so the documentation can first be built with this script, after create-apidoc.sh has created the API documentation. The rest of the documentation (e.g. the PDF) can be created with make as usual afterwards
• added mgkit.net.pfam, with only one function at the moment, which returns the descriptions of the Pfam families
• added the pfam command to add-gff-info; using the aforementioned function, it adds the descriptions of the Pfam families to the GFF file
• added a new exception, raised internally when an additional dependency is needed
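An exception for missing optional dependencies is typically raised at the point where the package is actually needed. A sketch of the pattern (DependencyError is an illustrative name here, not necessarily the one MGKit uses):

```python
class DependencyError(Exception):
    """Raised when an optional dependency is required but not installed."""

    def __init__(self, package):
        super(DependencyError, self).__init__(
            "The '{0}' package is required for this functionality".format(package)
        )


def load_msgpack():
    # Only fail when msgpack is actually needed, not at package import
    try:
        import msgpack
    except ImportError:
        raise DependencyError('msgpack')
    return msgpack
```

This gives users a clear, actionable message instead of a bare ImportError from deep inside the library.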

### Changed¶

• using the NCBI taxonomy dump has two side effects:

  • the scientific/common names are kept as they are, not lower-cased as before
  • a merged file is provided for taxon_ids that changed; while the old taxon_id is kept in the taxonomy, it points to the new taxon, to keep backward compatibility

• renamed the add-gff-info gitaxa command to addtaxa. It now accepts more data sources (dictionaries) and is more general

• changed mgkit.net.embl.datawarehouse_search() to automatically set the limit at 100,000 records

• the taxonomy can now be saved using msgpack, making it faster to read/write. It is also more compact, with a better compression ratio

• mgkit.plots.heatmap.grouped_spine() now accepts the rotation of the labels as an option

• added an option to use another attribute for the gene_id in the get-gff-info script gtf command

• added a function, mgkit.check_version(), that compares the version of MGKit used and throws a warning when it differs

• removed tests for the old SNP structures and added equivalent tests for the new ones

• mgkit.snps.classes.GeneSNP now caches the number of synonymous and non-synonymous SNPs for better speed

• mgkit.io.gff.GenomicRange.__contains__() now also accepts a tuple (start, end) or another GenomicRange instance
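The extended membership test on GenomicRange can be sketched with a minimal class; this is an illustration of the behaviour described above, not MGKit's actual implementation, which has more attributes and checks:

```python
class GenomicRange(object):
    """Minimal sketch of a genomic interval with flexible membership tests."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __contains__(self, other):
        # Accept a single position, a (start, end) tuple, or
        # another GenomicRange instance
        if isinstance(other, GenomicRange):
            other = (other.start, other.end)
        if isinstance(other, tuple):
            start, end = other
            return self.start <= start and end <= self.end
        return self.start <= other <= self.end
```

With this, `50 in rng`, `(20, 30) in rng` and `other_range in rng` all work through the same `in` operator.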

## 0.2.2¶

### Removed¶

• deprecated code from the snps package

## 0.2.1¶

### Deprecated¶

• mgkit.filter.taxon.filter_taxonomy_by_lineage()
• mgkit.filter.taxon.filter_taxonomy_by_rank()

### Removed¶

• removed old filter_gff script