mgkit.io.fasta module

A simple fasta parser and a few utility functions.

mgkit.io.fasta.load_fasta(file_handle)

Changed in version 0.1.13: now returns uppercase sequences

Loads a fasta file and returns a generator of tuples, in which the first element is the name of the sequence and the second is the sequence itself.

Parameters: file_handle (str, file) – fasta file to open; a file name or a file handle is expected
Yields: tuple – first element is the sequence name/header, the second element is the sequence
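
The parsing loop behind load_fasta() can be illustrated with the minimal sketch below. It is not mgkit's code: load_fasta_sketch and _parse_fasta are hypothetical names, and the sketch only assumes what is stated above (the whole header line without ">" is the name, sequence lines are joined and uppercased, and either a file name or an open text handle is accepted).

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple, Union


def _parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) tuples from the lines of a fasta file."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if name is not None:
                yield name, "".join(chunks).upper()
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    # Emit the last record in the file
    if name is not None:
        yield name, "".join(chunks).upper()


def load_fasta_sketch(file_handle: Union[str, TextIO]) -> Iterator[Tuple[str, str]]:
    """Hypothetical stand-in for load_fasta(): accepts a file name or a handle."""
    if isinstance(file_handle, str):
        with open(file_handle) as handle:
            yield from _parse_fasta(handle)
    else:
        yield from _parse_fasta(file_handle)


records = list(load_fasta_sketch(io.StringIO(">seq1 desc\nacgt\nAC\n>seq2\nggg\n")))
```

Because the parser is a generator, only one record is held in memory at a time, which matches the generator-based interface described above.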

mgkit.io.fasta.load_fasta_files(files)

New in version 0.3.4.

Loads all the fasta files in a list or other iterable.
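
A hedged sketch of the idea, assuming load_fasta_files() simply yields the records of each file in turn, in the order the files are given. load_fasta_files_sketch and _parse_fasta are hypothetical helpers, not part of mgkit.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple, Union


def _parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    # Minimal fasta parser used only for this illustration
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def load_fasta_files_sketch(
    files: Iterable[Union[str, TextIO]],
) -> Iterator[Tuple[str, str]]:
    # Assumption: records are yielded file by file, in the order given
    for file_handle in files:
        if isinstance(file_handle, str):
            with open(file_handle) as handle:
                yield from _parse_fasta(handle)
        else:
            yield from _parse_fasta(file_handle)


handles = [io.StringIO(">a\nac\n"), io.StringIO(">b\ngt\n>c\nnn\n")]
records = list(load_fasta_files_sketch(handles))
```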

mgkit.io.fasta.load_fasta_prodigal(file_handle)

New in version 0.3.1.

Reads a Prodigal amino acid fasta file and yields a dictionary with basic information about each sequence.

Parameters: file_handle (str, file) – passed to load_fasta()
Yields: dict – dictionary with the information contained in the header; the last of the attributes is put into the key attr, while the rest are transformed into other keys: seq_id, seq, start, end (genomic), strand, ordinal of
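
To illustrate the header layout, the sketch below splits a Prodigal protein header on the " # " separators Prodigal writes (id # start # end # strand # attributes). It is an assumption-laden illustration, not mgkit's implementation: parse_prodigal_header is a hypothetical name, it only produces the keys fully listed above, and it keeps seq_id and strand exactly as they appear in the header, which mgkit may not do.

```python
from typing import Dict, Union


def parse_prodigal_header(header: str, seq: str) -> Dict[str, Union[str, int]]:
    """Hypothetical helper: turn a Prodigal protein header into a dict."""
    # Prodigal protein headers (without ">") look like:
    #   contig_1_1 # 2 # 469 # 1 # ID=1_1;partial=10;start_type=Edge
    seq_id, start, end, strand, attr = header.split(" # ", 4)
    return {
        "seq_id": seq_id,       # kept as written (assumption)
        "seq": seq,
        "start": int(start),    # genomic coordinates
        "end": int(end),
        "strand": int(strand),  # 1 or -1 as written by Prodigal (assumption)
        "attr": attr,           # last field, left unparsed
    }


record = parse_prodigal_header(
    "contig_1_1 # 2 # 469 # 1 # ID=1_1;partial=10;start_type=Edge", "MKVL*"
)
```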

mgkit.io.fasta.load_fasta_rename(file_handle, name_func=None)

New in version 0.3.1.

Renames the header of each sequence using name_func, which is called on each header. By default, only the part of the header to the left of the first space is kept (BLAST behaviour).
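
The default renaming rule can be sketched as follows, operating on (header, sequence) tuples such as those yielded by load_fasta(). rename_records and first_word are hypothetical names used for illustration only.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple


def first_word(header: str) -> str:
    # Keep only the part of the header before the first space (BLAST-style)
    return header.split(" ", 1)[0]


def rename_records(
    records: Iterable[Tuple[str, str]],
    name_func: Optional[Callable[[str], str]] = None,
) -> Iterator[Tuple[str, str]]:
    # Fall back to the BLAST-style rule when no function is given
    if name_func is None:
        name_func = first_word
    for header, seq in records:
        yield name_func(header), seq


records = [("seq1 some description", "ACGT"), ("seq2", "GG")]
renamed = list(rename_records(records))
custom = list(rename_records(records, name_func=str.upper))
```

Any callable that maps a header string to a new name can be passed as name_func, for example to strip a sample prefix.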

mgkit.io.fasta.split_fasta_file(file_handle, name_mask, num_files)

New in version 0.1.13.

Splits a fasta file into a series of smaller files.

Parameters:
  • file_handle (file, str) – fasta file with the input sequences
  • name_mask (str) – file name template for the split files; more information can be found in mgkit.io.split_write()
  • num_files (int) – number of files in which to distribute the sequences
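
The sketch below illustrates one way to distribute sequences across num_files files. Two assumptions are made and should be checked against mgkit.io.split_write(): sequences are assigned round-robin, and name_mask is a str.format template with one positional field (e.g. "chunk-{0}.fa"). distribute_round_robin and split_fasta_sketch are hypothetical names.

```python
import os
import tempfile
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def distribute_round_robin(items: Iterable[T], num_files: int) -> List[List[T]]:
    # Assumption: item i goes to bucket i % num_files
    buckets: List[List[T]] = [[] for _ in range(num_files)]
    for index, item in enumerate(items):
        buckets[index % num_files].append(item)
    return buckets


def split_fasta_sketch(
    records: Iterable[Tuple[str, str]], name_mask: str, num_files: int
) -> List[str]:
    # Assumption: name_mask is a str.format template such as "chunk-{0}.fa"
    file_names = []
    for index, bucket in enumerate(distribute_round_robin(records, num_files)):
        file_name = name_mask.format(index)
        with open(file_name, "w") as handle:
            for name, seq in bucket:
                handle.write(">{}\n{}\n".format(name, seq))
        file_names.append(file_name)
    return file_names


records = [("s1", "AC"), ("s2", "GT"), ("s3", "TT")]
with tempfile.TemporaryDirectory() as tmp:
    paths = split_fasta_sketch(records, os.path.join(tmp, "chunk-{0}.fa"), 2)
    with open(paths[0]) as handle:
        first_chunk = handle.read()
```

Round-robin assignment keeps the number of sequences per file within one of each other, but not their total length.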

mgkit.io.fasta.write_fasta_sequence(file_handle, name, seq, wrap=60, write_mode='a')

Writes a fasta sequence to a file. If file_handle is a string, the file is opened using write_mode.

Parameters:
  • file_handle – file handle or file name (str)
  • name (str) – header to write for the sequence
  • seq (str) – sequence to write
  • wrap (int) – line width used to wrap the sequence. If None, the sequence is written on a single line
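
A minimal sketch of the wrapping and writing behaviour described above, with hypothetical names format_fasta and write_fasta_sketch; the exact output layout of mgkit (for example, how an empty sequence is written) may differ.

```python
import io
from typing import Optional, TextIO, Union


def format_fasta(name: str, seq: str, wrap: Optional[int] = 60) -> str:
    # Split the sequence into lines of at most `wrap` characters
    if wrap is None:
        lines = [seq]
    else:
        lines = [seq[i:i + wrap] for i in range(0, len(seq), wrap)]
    return ">{}\n{}\n".format(name, "\n".join(lines))


def write_fasta_sketch(
    file_handle: Union[str, TextIO],
    name: str,
    seq: str,
    wrap: Optional[int] = 60,
    write_mode: str = "a",
) -> None:
    text = format_fasta(name, seq, wrap)
    if isinstance(file_handle, str):
        # A file name is opened with write_mode ("a" appends by default)
        with open(file_handle, write_mode) as handle:
            handle.write(text)
    else:
        file_handle.write(text)


buffer = io.StringIO()
write_fasta_sketch(buffer, "seq1", "ACGTACGTAC", wrap=4)
```

With the default write_mode of "a", repeated calls on the same file name append records rather than overwrite them.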