edit-gff - GFF Viewer and Editor

Overview

Script to edit GFF files

View GFF

Used to print the content of a GFF file as a table (more output formats will be added later).

The attributes printed are passed with -a, one attribute at a time. For example:

edit-gff view -a uid test.gff

will print uid for all annotations. Multiple attributes can be passed, like:

edit-gff view -a uid -a taxon_id test.gff

which prints a table with the uid and taxon_id of each annotation.

The default behaviour is to print only annotations that have all the requested attributes. This can be changed with the -k option, in which case fields that were not found are printed as empty strings.

A header can be printed with the -h option.

Note

The order of the fields in the table is the same as the order of the attributes passed with -a.
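The filtering rules above can be illustrated with a minimal Python sketch. It is not the code used by edit-gff: the function names (parse_attributes, view_rows) are hypothetical, and it assumes the ninth GFF column uses the key=value;key=value form.

```python
def parse_attributes(column):
    """Parse a 'key=value;key=value' attribute column into a dict (assumed format)."""
    attributes = {}
    for item in column.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value.strip().strip('"')
    return attributes


def view_rows(gff_lines, names, keep_empty=False, header=False, separator="\t"):
    """Yield one table row per annotation, with fields in the order of `names`."""
    if header:
        # Mirrors -h: the header lists the attribute names in the requested order
        yield separator.join(names)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        attributes = parse_attributes(line.rstrip("\n").split("\t")[8])
        # Default: drop annotations missing any requested attribute
        if not keep_empty and any(name not in attributes for name in names):
            continue
        # Mirrors -k: attributes that were not found become empty strings
        yield separator.join(attributes.get(name, "") for name in names)
```

With keep_empty=False an annotation lacking taxon_id is dropped from a uid/taxon_id table. With keep_empty=True it is kept, with an empty second field.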

Change or Add Attributes

Adds or changes the specified attributes in the annotations of a GFF file.

The attributes and their values are passed with the -a option. For example, to set the taxon_id of all annotations to 2, you can pass -a taxon_id 2. Multiple attributes can be set by passing the option multiple times. For example:

edit-gff add -a taxon_id 2 -a taxon_db CUSTOM test.gff

will set the taxon_id to 2 and the taxon_db to CUSTOM for all annotations.

The default behaviour is to leave unchanged an attribute that is already set in an annotation, but this can be changed by passing the -w option. With the -o option, only the edited annotations are output.

To change attributes on a subset of the annotations, a file containing one uid per line can be passed with the -f option. Only annotations that match a uid in that list are edited.
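The way these options interact can be sketched in Python as follows. This is an illustration only, not the edit-gff implementation: annotations are modelled as plain dictionaries, add_attributes is a hypothetical name, and "edited" is assumed to mean that at least one attribute was actually set.

```python
def add_attributes(annotations, new_values, overwrite=False,
                   only_edited=False, uids=None):
    """Yield annotations (dicts of attributes) after applying `new_values`."""
    for attributes in annotations:
        edited = False
        # Mirrors -f: when a uid list is given, only matching annotations are edited
        if uids is None or attributes.get("uid") in uids:
            for key, value in new_values.items():
                # Mirrors -w: existing attributes are kept unless overwrite is set
                if overwrite or key not in attributes:
                    attributes[key] = value
                    edited = True
        # Mirrors -o: optionally output only the annotations that were edited
        if edited or not only_edited:
            yield attributes
```

For instance, calling it with new_values={"taxon_id": "2"} and the defaults leaves an annotation that already has a taxon_id untouched, while overwrite=True replaces the existing value.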

Remove Attributes

Removes a list of attributes from a GFF file. Only attributes in the last column of a GFF file (fields separated by a ‘;’) can be removed. Each attribute is passed with the -a option followed by the attribute name. The -a option can be passed multiple times.

To remove attributes on a subset of the annotations, a file containing one uid per line can be passed with the -f option. Only annotations that match a uid in that list are edited.
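As a rough illustration of what removing an attribute from the last column involves, here is a short Python sketch. It assumes the column is written as key=value pairs separated by ‘;’, and remove_attributes is a hypothetical helper, not a function of the script.

```python
def remove_attributes(column, names):
    """Return the attribute column without the attributes listed in `names`."""
    kept = []
    for item in column.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        key = item.split("=", 1)[0].strip()
        if key not in names:
            kept.append(item)
    return ";".join(kept)
```

The first eight GFF columns are never touched in this sketch, which matches the restriction described above.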

Table

Similar to the add command, and with functions similar to add-gff-info addtaxa, it allows attributes to be added or changed using the values in a table file.

The user defines 2 attributes in a GFF annotation, the key and the attribute. The key is used to decide whether an annotation is to be modified, and the attribute is set for that annotation to the value found in the table. For example, given the table:

GENE001,1.1.3.3
GENE002,1.2.3.3

If the key chosen is gene_id and the attribute is EC, the GFF is scanned for annotations whose gene_id is equal to GENE001, and their EC attribute is set to 1.1.3.3. The second row is handled in the same way.

The table can have multiple fields, but only 2 are loaded: the key and the attribute selected through the options. The 2 fields are loaded into a Python dictionary, with the key and the attribute being, respectively, the key and the value in it. So 2 things must be noted:

  1. duplicate keys will be overwritten (only the last one remains)
  2. the 2 fields are loaded in full before any annotation is edited, which can take up a lot of RAM

The default is for the key to be the first field (0) and the attribute the second (1). The table may contain header rows, so the first N rows can be skipped with -r. The field separator can also be chosen, and the -o option prints only the edited annotations.
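The dictionary-based lookup described above can be sketched in Python as below. It is a minimal illustration under stated assumptions, not the code of the script: load_table and apply_table are hypothetical names, annotations are modelled as dictionaries of attributes, and the table is read with the standard csv module.

```python
import csv
import io


def load_table(handle, key_index=0, attr_index=1, skip_rows=0, separator="\t"):
    """Load two fields of a table into a dict; later duplicate keys win."""
    mapping = {}
    reader = csv.reader(handle, delimiter=separator)
    for row_number, row in enumerate(reader):
        # Mirrors -r: skip the first `skip_rows` rows (for example a header)
        if row_number < skip_rows or not row:
            continue
        mapping[row[key_index]] = row[attr_index]
    return mapping


def apply_table(annotations, mapping, key="uid", attribute="EC", only_edited=False):
    """Set `attribute` on annotations whose `key` value is found in `mapping`."""
    for attributes in annotations:
        lookup = attributes.get(key)
        if lookup in mapping:
            attributes[attribute] = mapping[lookup]
            yield attributes
        elif not only_edited:
            # Mirrors -o: unmatched annotations are output only when not filtering
            yield attributes


table = io.StringIO("GENE001,1.1.3.3\nGENE002,1.2.3.3\n")
ec_numbers = load_table(table, separator=",")
```

Because the whole mapping is built before any annotation is processed, memory use grows with the size of the table, and a key that appears twice keeps only its last value.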

Changes

New in version 0.4.4.

Options

edit-gff

Main function

edit-gff [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

--cite

add

Add fields to a GFF File

edit-gff add [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]

Options

-v, --verbose
-a, --attributes <attributes>

Required. Add attributes to the GFF file. For example, -a taxon_id 2 will add the taxon_id attribute with a value of 2 to all annotations. Multiple attributes can be set, for example: -a taxon_id 2 -a gene_id TEST

-w, --overwrite

Overwrite the attributes if present

-o, --only-edited

Only output edited annotations

-f, --uids <uids>

Only edit annotations with uid in a file (one per line)

Arguments

INPUT_FILE

Optional argument

OUTPUT_FILE

Optional argument

fields

Prints the fields in a GFF File

edit-gff fields [OPTIONS] [GFF_FILE] [TXT_FILE]

Options

-v, --verbose
-n, --num-ann <num_ann>

Number of annotations to parse, 0 will parse the whole file

Default: 10

Arguments

GFF_FILE

Optional argument

TXT_FILE

Optional argument

remove

Remove fields from a GFF File

edit-gff remove [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]

Options

-v, --verbose
-a, --attributes <attributes>

Required. Remove attributes from the GFF file. For example, -a taxon_id will remove the taxon_id attribute from all annotations. Multiple attributes can be removed, for example: -a taxon_id -a gene_id

-f, --uids <uids>

Only edit annotations with uid in a file (one per line)

Arguments

INPUT_FILE

Optional argument

OUTPUT_FILE

Optional argument

table

Adds fields from a Table file

edit-gff table [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]

Options

-v, --verbose
-k, --key <key>

Attribute used to search the table, defaults to uid

-a, --attribute <attribute>

Required. Attribute to add/change

-o, --only-edited

Only output edited annotations

-r, --skip-rows <skip_rows>

Number of rows to skip at the start of the file

-s, --separator <separator>

Field separator, defaults to TAB

-t, --table-file <table_file>

Required

-ki, --key-index <key_index>

Which field in the table is the key value

Default: 0

-ai, --attr-index <attr_index>

Which field in the table is the attribute value

Default: 1

Arguments

INPUT_FILE

Optional argument

OUTPUT_FILE

Optional argument

view

View GFF file as table/json, etc.

edit-gff view [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]

Options

-v, --verbose
-h, --header

Print Header

-k, --keep-empty

Keep annotations where not all attributes were found

-a, --attributes <attributes>

Required. Attributes of the GFF file to print. For example, -a taxon_id will print taxon_id for all annotations. Multiple attributes can be printed, for example: -a taxon_id -a gene_id

-s, --separator <separator>

Field separator, defaults to TAB

Arguments

INPUT_FILE

Optional argument

OUTPUT_FILE

Optional argument