hmmer2gff - Convert HMMER output to GFF

Overview

Script to convert HMMER result files (domain table) to a GFF file. By default, profile names are expected to be in the form GENEID_TAXONID_TAXON-NAME(-nr), but any other profile name can be accepted.

The profiles tested are those made from KEGG Orthologs by the download_profiles script. If the --no-custom-profiles option is used, the script can be used with any profile name; the profile name is then used for gene_id, taxon_id and taxon_name in the GFF file.
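The naming convention above can be illustrated with a short sketch. This is not mgkit's actual parser: the helper `parse_profile_name`, the `ProfileInfo` tuple, the sample profile name and the assumption that the first two underscores delimit GENEID and TAXONID are all hypothetical.

```python
from typing import NamedTuple


class ProfileInfo(NamedTuple):
    gene_id: str
    taxon_id: str
    taxon_name: str
    has_nr_suffix: bool


def parse_profile_name(name: str, custom: bool = True) -> ProfileInfo:
    """Hypothetical helper, not part of mgkit."""
    if not custom:
        # Mirrors --no-custom-profiles: the whole profile name is
        # used for gene_id, taxon_id and taxon_name.
        return ProfileInfo(name, name, name, False)
    # Assumption: GENEID and TAXONID contain no underscores, so the
    # first two underscores separate the three fields.
    gene_id, taxon_id, taxon_name = name.split("_", 2)
    has_nr = taxon_name.endswith("-nr")
    if has_nr:
        # Drop the optional "-nr" suffix from the taxon name.
        taxon_name = taxon_name[: -len("-nr")]
    return ProfileInfo(gene_id, taxon_id, taxon_name, has_nr)


custom_info = parse_profile_name("K00001_2_bacteria-nr")
plain_info = parse_profile_name("PF00001", custom=False)
```

In this sketch a name with fewer than two underscores raises a `ValueError` in custom mode, which is the situation where --no-custom-profiles would be the appropriate choice.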

It is possible to use sequences not translated using mgkit; in that case no information on the frame is assumed, so this script can be used against a protein DB. For example, UniProt can be searched for profiles, in which case the --no-frame option must be used.

Note

For GENEID, old documentation refers to KOID; the two are the same.

Warning

Compatibility with old data has been removed, meaning that old experiments must use the scripts from those versions. Multiple environments can be used for this purpose, for example with virtualenv. An example is given in Installation.

Changes

Changed in version 0.1.15: adapted to new GFF module and specs

Changed in version 0.2.1: added options to customise output and filters and old restrictions

Changed in version 0.3.1: added --no-frame option for non mgkit-translated proteins; sequence headers are handled the same way as in HMMER (truncated at the first space)
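The header handling mentioned for version 0.3.1 can be sketched as follows. The function name `hmmer_sequence_id` and the sample header are hypothetical; the only behaviour taken from the text is truncation at the first space.

```python
def hmmer_sequence_id(header: str) -> str:
    """Hypothetical helper: reduce a FASTA header to the ID HMMER reports."""
    # Drop the leading '>' if the raw FASTA header line was passed in.
    if header.startswith(">"):
        header = header[1:]
    # Keep only the text before the first space.
    return header.split(" ", 1)[0]


seq_id = hmmer_sequence_id(">contig0001 len=300 sample=A")
```

With this rule, the names in the aa_file and in the HMMER domain table can be matched even when the FASTA headers carry extra description text.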

Options

Convert HMMER data to GFF file

usage: hmmer2gff [-h] [-o [OUTPUT_FILE]] [-t DISCARD] [-d] [-c] [-db DATABASE]
                 [-f FEATURE_TYPE] [-n] [-v | --quiet] [--cite] [--manual]
                 [--version]
                 aa_file [hmmer_file]

Named Arguments

`-v, --verbose`	more verbose - includes debug messages Default: 20
`--quiet`	less verbose - only error and critical messages
`--cite`	Show citation for the framework
`--manual`	Show the script manual
`--version`	show program’s version number and exit

File options

`aa_file`	Fasta file containing contigs translated to aa (used by HMMER)
`hmmer_file`	HMMER results file (domain table) Default: - (standard input)
`-o, --output-file`
	Default: standard output (stdout)

Filters

`-t, --discard`
	E-value above which a hit will be discarded Default: 0.05
`-d, --disable-evalue`
	Disable the E-value filter Default: False
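
How the -t/--discard and -d/--disable-evalue options interact can be shown with a minimal sketch. The function `keep_hit` is hypothetical and not taken from mgkit; treating a hit whose E-value equals the threshold as kept is an assumption based on the wording "above which".

```python
def keep_hit(evalue: float, discard: float = 0.05,
             disable_evalue: bool = False) -> bool:
    """Hypothetical filter mirroring -t/--discard and -d/--disable-evalue."""
    if disable_evalue:
        # -d given: every hit is kept regardless of its E-value.
        return True
    # Assumption: only hits strictly above the threshold are discarded.
    return evalue <= discard


# Illustrative E-values, not real HMMER output.
evalues = [1e-30, 0.01, 0.05, 0.2]
kept_default = [e for e in evalues if keep_hit(e)]
kept_unfiltered = [e for e in evalues if keep_hit(e, disable_evalue=True)]
```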

GFF

`-c, --no-custom-profiles`
	Profile names are not in the custom format Default: True
`-db, --database`
	Database from which the profiles are generated (e.g. PFAM) Default: “CUSTOM”
`-f, --feature-type`
	Type of feature (e.g. gene) Default: “gene”
`-n, --no-frame`	Set if the sequences were not translated with translate_seq Default: False