mgkit.io.fasta module

Simple fasta parser and a few utility functions

mgkit.io.fasta.load_fasta(file_handle)

Changed in version 0.1.13: now returns uppercase sequences

Loads a fasta file and returns a generator of tuples, in which the first element is the name of the sequence and the second is the sequence itself.

Parameters:	file_handle (str, file) – fasta file to open; a file name or a file handle is expected
Yields:	tuple – first element is the sequence name/header, the second element is the sequence
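The behaviour described above can be illustrated with a minimal stand-alone sketch. It is not MGKit's implementation: `parse_fasta` is a hypothetical name, and the sketch assumes an already opened text handle rather than a file name. It only shows the shape of the yielded tuples and the uppercase conversion.

```python
import io


def parse_fasta(handle):
    """Yield (name, sequence) tuples from an open text handle.

    Hypothetical helper, written only to illustrate the documented
    behaviour: one tuple per record, sequence returned in uppercase.
    """
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # a new header closes the previous record, if any
            if name is not None:
                yield name, ''.join(chunks).upper()
            name = line[1:]
            chunks = []
        elif name is not None:
            chunks.append(line)
    # emit the last record in the file
    if name is not None:
        yield name, ''.join(chunks).upper()


example = io.StringIO(">seq1 first\nacgt\nttga\n>seq2\nGGCC\n")
records = list(parse_fasta(example))
```

Multi-line sequences are joined into a single string, and the full header (including anything after the first space) is kept as the name.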

mgkit.io.fasta.load_fasta_files(files)

New in version 0.3.4.

Loads all fasta files from a list or iterable

mgkit.io.fasta.load_fasta_prodigal(file_handle)

New in version 0.3.1.

Reads a Prodigal amino acid fasta file and yields a dictionary with basic information about the sequences.

Parameters:	file_handle (str, file) – passed to `load_fasta()`
Yields:	dict – dictionary with the information contained in the header; the last of the header attributes is stored under the key attr, while the rest are mapped to other keys: seq_id, seq, start, end (genomic), strand, ordinal of
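As a rough illustration of how such a header can be turned into a dictionary, the sketch below splits a Prodigal-style protein header on its ` # ` separators. It is an assumption-laden example, not MGKit's code: `parse_prodigal_header` is a hypothetical name, the example header is made up, the strand is converted to `+`/`-` purely for illustration, and the ordinal key listed above is left out because its name is not given in full.

```python
def parse_prodigal_header(header, seq):
    """Build a dictionary from a Prodigal-style protein header.

    Hypothetical helper; assumes the header has five fields
    separated by ' # ': name, start, end, strand, attributes.
    """
    # maxsplit keeps any further ' # ' inside the attribute string
    seq_id, start, end, strand, attr = header.split(' # ', 4)
    return {
        'seq_id': seq_id,
        'seq': seq,
        'start': int(start),
        'end': int(end),
        # assumption: Prodigal's 1 / -1 is shown here as '+' / '-'
        'strand': '+' if strand == '1' else '-',
        'attr': attr,
    }


# made-up header in the Prodigal layout
header = "contig_1_2 # 337 # 2799 # -1 # ID=1_2;partial=00;start_type=ATG"
record = parse_prodigal_header(header, "MKV")
```

The attribute string is stored unparsed under `attr`, matching the description of the last header field above.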

mgkit.io.fasta.load_fasta_rename(file_handle, name_func=None)

New in version 0.3.1.

Renames the header of each sequence using name_func, which is called on each header. By default, only the part of the header to the left of the first space is kept (BLAST behaviour).
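A minimal sketch of the renaming step, assuming the records are already available as (name, sequence) tuples. `rename_records` and `keep_first_word` are hypothetical names used only for illustration; the default mimics the documented behaviour of keeping the text before the first space.

```python
def keep_first_word(header):
    """Default renaming: keep the text before the first space."""
    return header.split(' ')[0]


def rename_records(records, name_func=None):
    """Yield (new_name, sequence) tuples, applying name_func to each header.

    Hypothetical helper operating on an iterable of (name, seq) tuples.
    """
    if name_func is None:
        name_func = keep_first_word
    for name, seq in records:
        yield name_func(name), seq


records = [('seq1 some description', 'ACGT'), ('seq2', 'GGCC')]
renamed = list(rename_records(records))
# a custom function can be passed instead of the default
prefixed = list(rename_records(records, name_func=lambda h: 'sample_' + h.split(' ')[0]))
```

Passing a custom name_func, as in the last line, replaces the default entirely rather than being applied on top of it.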

mgkit.io.fasta.split_fasta_file(file_handle, name_mask, num_files)

New in version 0.1.13.

Splits a fasta file into a series of smaller files.

Parameters:
	file_handle (file, str) – fasta file with the input sequences
	name_mask (str) – file name template for the split files; more information can be found in `mgkit.io.split_write()`
	num_files (int) – number of files among which to distribute the sequences
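One way to distribute sequences over a fixed number of outputs is round-robin assignment. The sketch below assumes that strategy, which the text above does not confirm, and collects records in in-memory lists instead of writing files, so name_mask is not used. `distribute_round_robin` is a hypothetical name.

```python
import itertools


def distribute_round_robin(records, num_files):
    """Assign records to num_files buckets in turn.

    Hypothetical helper: each bucket stands in for one output file.
    """
    buckets = [[] for _ in range(num_files)]
    # cycle() repeats the buckets; zip() stops when records run out
    for bucket, record in zip(itertools.cycle(buckets), records):
        bucket.append(record)
    return buckets


records = [('seq{}'.format(i), 'ACGT') for i in range(5)]
buckets = distribute_round_robin(records, 2)
```

With this strategy the bucket sizes differ by at most one record, regardless of the order of the input.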

mgkit.io.fasta.write_fasta_sequence(file_handle, name, seq, wrap=60, write_mode='a')

Write a fasta sequence to file. If the file_handle is a string, the file will be opened using write_mode.

Parameters:
	file_handle – file handle or string
	name (str) – header to write for the sequence
	seq (str) – sequence to write
	wrap (int) – line width used to wrap the sequence; if None, the sequence is written on a single line
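The effect of the wrap parameter can be sketched with a small stand-alone function that formats one record as text. `format_fasta_record` is a hypothetical name and returns a string instead of writing to a file; it is meant only to show the wrapping logic, not MGKit's implementation.

```python
def format_fasta_record(name, seq, wrap=60):
    """Return one fasta record as text, wrapping the sequence lines.

    Hypothetical helper; wrap=None keeps the sequence on a single line.
    """
    if wrap is None:
        lines = [seq]
    else:
        # cut the sequence into chunks of at most `wrap` characters
        lines = [seq[i:i + wrap] for i in range(0, len(seq), wrap)]
    return '>{}\n{}\n'.format(name, '\n'.join(lines))


wrapped = format_fasta_record('seq1', 'ACGTACGTAC', wrap=4)
single = format_fasta_record('seq1', 'ACGTACGTAC', wrap=None)
```

Here `wrapped` holds the header followed by three sequence lines (`ACGT`, `ACGT`, `AC`), while `single` keeps the whole sequence on one line.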