mgkit.align module
Module dealing with BAM/SAM files
-
class mgkit.align.SamtoolsDepth(file_handle, num_seqs=10000)
Bases: future.types.newobject.newobject
New in version 0.3.0.
A class used to cache the results of read_samtools_depth(), while reading only the necessary data from a `samtools depth -aa` file.
data = None
-
file_handle = None
-
region_coverage(seq_id, start, end)
Returns the mean coverage of a region. The start and end parameters are expected to be 1-based coordinates, like the corresponding attributes in mgkit.io.gff.Annotation or mgkit.io.gff.GenomicRange.
If the sequence for which the coverage is requested is not found, the depth file is read (and cached) until it is found.
Parameters:
Returns: mean coverage of the requested region
Return type:
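The coordinate handling described for region_coverage can be illustrated with a minimal, standalone sketch. It does not use mgkit: `mean_region_coverage` is a hypothetical helper, and it assumes that `end` is inclusive (as in GFF coordinates) and that the depth values are stored with position 1 at index 0.

```python
def mean_region_coverage(depth, start, end):
    """Mean depth over the 1-based interval [start, end].

    Hypothetical helper, not part of mgkit. `depth` is a per-base
    coverage sequence whose index 0 holds position 1, and `end` is
    assumed to be inclusive.
    """
    if start < 1 or end < start or end > len(depth):
        raise ValueError("region falls outside the sequence")
    # A 1-based inclusive interval maps to the 0-based slice [start - 1, end)
    window = depth[start - 1:end]
    return sum(window) / len(window)


depth = [0, 2, 4, 6, 8]
# Positions 2 to 4 hold the depths 2, 4 and 6
print(mean_region_coverage(depth, 2, 4))
```

The only step that needs care is subtracting one from `start` while leaving `end` unchanged, since Python slices exclude their upper bound.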
-
-
mgkit.align.add_coverage_info(annotations, bam_files, samples, attr_suff='_cov')
Changed in version 0.3.4: the coverage is now returned as a float instead of an int
Adds coverage information to annotations, using BAM files.
The coverage information is added for each sample as a ‘sample_cov’ attribute, and the total coverage as a ‘cov’ attribute, in the annotations.
Note
The bam_files and samples variables must have the same order
Parameters:
- annotations (iterable) – iterable of annotations
- bam_files (iterable) – iterable of pysam.Samfile instances
- samples (iterable) – names of the samples for the BAM files
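The attribute naming and the ordering requirement can be shown with a small sketch that uses a plain dictionary in place of an annotation. `attach_coverage` is a hypothetical function, not the mgkit implementation, and it assumes that the total coverage is the sum of the per-sample values, which the text above does not state.

```python
def attach_coverage(attributes, coverages, samples, attr_suff="_cov"):
    """Store per-sample and total coverage in a dictionary of attributes.

    Hypothetical stand-in for an annotation's attributes. `coverages`
    and `samples` are paired by position, so they must share one order.
    """
    if len(coverages) != len(samples):
        raise ValueError("coverages and samples must have the same length")
    total = 0.0
    for sample, coverage in zip(samples, coverages):
        # Attribute name is the sample name followed by the suffix
        attributes[sample + attr_suff] = float(coverage)
        total += float(coverage)
    # Assumption: the total is the sum over all samples
    attributes["cov"] = total
    return attributes


print(attach_coverage({}, [1, 2.5], ["s1", "s2"]))
```

Because the pairing is purely positional, swapping two entries in one list but not the other would silently assign coverage to the wrong sample.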
-
mgkit.align.covered_annotation_bp(files, annotations, min_cov=1, progress=False)
New in version 0.1.14.
Returns the number of base pairs of each annotation that are covered by reads, over multiple samples.
Parameters:
Returns: a dictionary whose keys are the uid and the values the number of bases that are covered by reads among all samples
Return type:
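One possible reading of "covered among all samples" is sketched below: a base counts as covered when its depth reaches min_cov in at least one sample. This interpretation is an assumption, and `covered_bp` is a hypothetical helper working on plain lists of per-base depths rather than on BAM files.

```python
def covered_bp(sample_depths, min_cov=1):
    """Count positions covered in at least one sample.

    Hypothetical helper, not the mgkit implementation. `sample_depths`
    holds one per-base depth list per sample, all of the same length.
    """
    return sum(
        1
        # zip(*...) walks the samples position by position
        for column in zip(*sample_depths)
        if any(depth >= min_cov for depth in column)
    )


depths = [
    [0, 1, 0, 3],  # sample 1
    [0, 0, 2, 0],  # sample 2
]
print(covered_bp(depths))
print(covered_bp(depths, min_cov=2))
```

Raising min_cov only ever lowers the count, since each position is tested against the threshold independently.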
-
mgkit.align.get_region_coverage(bam_file, seq_id, feat_from, feat_to)
Return coverage for an annotation.
Note
feat_from and feat_to are 1-based indexes
Parameters:
Return int: coverage array for the annotation
-
mgkit.align.read_samtools_depth(file_handle, num_seqs=10000)
Changed in version 0.3.4: num_seqs can be None to avoid a log message
New in version 0.3.0.
Reads a samtools depth file, returning a generator that yields the array of each base coverage on a per-sequence basis.
Note
The information on position is not used, to use numpy and save memory. samtools depth should be called with the -aa option:
`samtools depth -aa bamfile`
This option will output both base positions with 0 coverage and sequences with no aligned reads.
Parameters:
Yields: tuple – the first element is the sequence identifier and the second one is the numpy array with the coverage of each position
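The per-sequence grouping can be sketched without mgkit as follows. `iter_depth` is a hypothetical generator that assumes the three tab-separated columns (sequence, position, depth) written by `samtools depth` for a single BAM file, and it yields plain Python lists instead of numpy arrays to stay dependency free.

```python
import io
from itertools import groupby


def iter_depth(file_handle):
    """Yield (sequence id, list of depths) from `samtools depth -aa` output.

    Hypothetical sketch, not the mgkit implementation. The position
    column is ignored: with -aa every position is written, so the index
    in the list already encodes the position.
    """
    rows = (
        line.rstrip("\n").split("\t")
        for line in file_handle
        if line.strip()
    )
    # Lines of one sequence are contiguous, so groupby needs no sorting
    for seq_id, group in groupby(rows, key=lambda row: row[0]):
        yield seq_id, [int(row[2]) for row in group]


demo = io.StringIO("seq1\t1\t0\nseq1\t2\t3\nseq2\t1\t0\n")
for seq_id, depths in iter_depth(demo):
    print(seq_id, depths)
```

Dropping the position column is only safe when every base is present in the file, which is why the -aa option matters.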