mgkit.align module
Module dealing with BAM/SAM files
-
class mgkit.align.SamtoolsDepth(file_handle, num_seqs=10000)
Bases: object
New in version 0.3.0.
A class used to cache the results of read_samtools_depth(), while reading only the necessary data from a `samtools depth -aa` file.
-
data = None
-
file_handle = None
-
region_coverage(seq_id, start, end)
Returns the mean coverage of a region. The start and end parameters are expected to be 1-based coordinates, like the corresponding attributes in mgkit.io.gff.Annotation or mgkit.io.gff.GenomicRange.
If the sequence for which the coverage is requested is not found, the depth file is read (and cached) until it is found.
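Parameters: - seq_id – identifier of the sequence for which the coverage is requested
- start – start of the region (1-based)
- end – end of the region (1-based)
Returns: mean coverage of the requested region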
Return type:
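The coordinate handling described above can be sketched as follows. This is a minimal illustration, not mgkit's implementation: the helper name `mean_region_coverage` is hypothetical, the depths are held in a plain list rather than a numpy array, and the end coordinate is assumed to be inclusive, as in GFF.

```python
def mean_region_coverage(depths, start, end):
    """Mean depth over a 1-based, inclusive region (hypothetical helper)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # A 1-based inclusive region [start, end] maps to the
    # 0-based half-open slice [start - 1, end)
    region = depths[start - 1:end]
    if not region:
        # Region lies entirely past the end of the sequence
        return 0.0
    return sum(region) / len(region)


# Per-base depths for a five-base sequence
depths = [0, 2, 4, 6, 8]
# Bases 2 to 4 (1-based) have depths 2, 4 and 6
example_mean = mean_region_coverage(depths, 2, 4)
```

Subtracting one from the start only, and leaving the end untouched, is what converts the inclusive 1-based pair into a Python slice.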
-
-
mgkit.align.add_coverage_info(annotations, bam_files, samples, attr_suff='_cov')
Adds coverage information to annotations, using BAM files.
The coverage information is added for each sample as a ‘sample_cov’ attribute, and the total coverage as a ‘cov’ attribute, in the annotations.
Note
The bam_files and samples arguments must have the same order
Parameters: - annotations (iterable) – iterable of annotations
- bam_files (iterable) – iterable of pysam.Samfile instances
- samples (iterable) – names of the samples for the BAM files
-
mgkit.align.covered_annotation_bp(files, annotations, min_cov=1, progress=False)
New in version 0.1.14.
Returns the number of base pairs covered in the annotations over multiple samples.
Parameters: Returns: a dictionary whose keys are the annotation uids and whose values are the number of bases covered by reads across all samples
Return type:
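One possible reading of "covered among all samples" is sketched below. It is an illustration only and does not reproduce mgkit's code: the function `covered_bp`, the tuple layout of the annotations, and the per-sample dictionaries of depths are all hypothetical. It also assumes that a base counts as covered when at least one sample reaches `min_cov` at that position.

```python
def covered_bp(sample_depths, annotations, min_cov=1):
    """Count covered bases per annotation (hypothetical sketch).

    sample_depths: list of dicts, one per sample, mapping a sequence
        identifier to a list of per-base depths
    annotations: iterable of (uid, seq_id, start, end) tuples, with
        1-based inclusive coordinates
    """
    covered = {}
    for uid, seq_id, start, end in annotations:
        count = 0
        # Convert the 1-based inclusive region to 0-based indexes
        for pos in range(start - 1, end):
            # Assumption: covered if ANY sample reaches min_cov here
            if any(
                pos < len(depths.get(seq_id, ()))
                and depths[seq_id][pos] >= min_cov
                for depths in sample_depths
            ):
                count += 1
        covered[uid] = count
    return covered


sample_a = {"contig1": [0, 3, 0, 0, 1]}
sample_b = {"contig1": [0, 0, 2, 0, 0]}
annotations = [("uid1", "contig1", 1, 4), ("uid2", "contig1", 4, 5)]
covered = covered_bp([sample_a, sample_b], annotations)
```

If the intended meaning is instead that the summed depth across samples must reach `min_cov`, the `any(...)` test would be replaced by a sum over the samples.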
-
mgkit.align.get_region_coverage(bam_file, seq_id, feat_from, feat_to)
Return coverage for an annotation.
Note
feat_from and feat_to are 1-based indexes
Parameters: Return int: coverage array for the annotation
-
mgkit.align.read_samtools_depth(file_handle, num_seqs=10000)
New in version 0.3.0.
Reads a samtools depth file, returning a generator that yields the array of per-base coverage on a per-sequence basis.
Note
The position information is not stored, which allows the use of numpy arrays and saves memory. samtools depth should be called with the -aa option:
`samtools depth -aa bamfile`
This option outputs both base positions with 0 coverage and sequences with no aligned reads
Parameters: - file_handle (file) – file handle of the coverage file
- num_seqs (int) – number of sequences read after which a log message is emitted
Yields: tuple – the first element is the sequence identifier and the second one is the numpy array with the coverage at each position
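The idea of dropping the position column can be illustrated with a short sketch. This is not the mgkit implementation: the generator name `iter_depth_arrays` is hypothetical, it assumes the three-column, tab-separated output that `samtools depth -aa` produces for a single BAM file, and it yields plain lists instead of numpy arrays to avoid a third-party dependency.

```python
import io
from itertools import groupby


def iter_depth_arrays(file_handle):
    """Yield (seq_id, depths) for each sequence in a depth file (sketch)."""
    # Each line holds: sequence name, 1-based position, depth
    rows = (
        line.rstrip("\n").split("\t")
        for line in file_handle
        if line.strip()
    )
    # Lines of one sequence are contiguous, so groupby splits them lazily
    for seq_id, group in groupby(rows, key=lambda row: row[0]):
        # The position column is ignored: with -aa every position is
        # present and in order, so the list index encodes the position
        yield seq_id, [int(row[2]) for row in group]


depth_text = (
    "contig1\t1\t0\n"
    "contig1\t2\t5\n"
    "contig1\t3\t7\n"
    "contig2\t1\t0\n"
    "contig2\t2\t0\n"
)
parsed = list(iter_depth_arrays(io.StringIO(depth_text)))
```

Because every position is reported, the depth of base `n` (1-based) is simply element `n - 1` of the yielded list, which is why the position itself does not need to be kept.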