pyCoverM is a Python library that provides bindings to CoverM, enabling fast coverage estimation.
pyCoverM is available via PyPI or Conda.
pip install pycoverm
The Conda package can be installed through Pixi or Mamba/Conda.
# Pixi
pixi init pycoverm_project
cd pycoverm_project
pixi project channel add bioconda
pixi add pycoverm
# Mamba (just replace 'mamba' with 'conda' if you have Conda installed)
mamba create -n pycoverm_env -c conda-forge -c bioconda pycoverm
mamba activate pycoverm_env
pyCoverM provides two functions:
is_bam_sorted: Checks if a BAM file is sorted by coordinate and returns True if sorted, or False otherwise.
get_coverages_from_bam: Computes the average contig coverage from sorted BAM files. It returns a tuple where the first element is a list of contig names, and the second is a NumPy array of coverage values.
>>> import pycoverm
>>> TEST_BAM = "tests/test_data.bam"
>>> pycoverm.is_bam_sorted(TEST_BAM)
True
>>> coverages = pycoverm.get_coverages_from_bam([TEST_BAM])
>>> coverages[0]
['contig_7847997', 'contig_11746202', 'contig_9129108', …, 'contig_2917594']
>>> coverages[1]
array([[0.        ],
       [0.526652  ],
       [0.08541025],
       … ,
       [0.00907206]], dtype=float32)
Note
If multiple BAM files are provided, the resulting NumPy array contains one column per BAM file, and each column holds the coverage values computed from that file.
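As a sketch of how the two return values line up, the snippet below pairs each column with the BAM file it came from. The helper name `coverage_by_bam` is hypothetical and not part of pyCoverM, the nested list stands in for the NumPy array so the snippet runs without pyCoverM or test data, and it assumes that columns follow the order of the input BAM list, which the note above does not state explicitly.

```python
def coverage_by_bam(bam_paths, contig_names, coverage_rows):
    """Map each BAM path to a {contig name: coverage} dictionary.

    Hypothetical helper, not part of pyCoverM. `coverage_rows` is any
    2-D row-major structure (a NumPy array or a list of lists) with one
    row per contig and one column per BAM file.
    Assumption: column i corresponds to bam_paths[i].
    """
    result = {}
    for col, bam in enumerate(bam_paths):
        result[bam] = {
            name: float(row[col])
            for name, row in zip(contig_names, coverage_rows)
        }
    return result


# Stand-in values shaped like the documented return tuple (made up for illustration).
bams = ["sample_a.bam", "sample_b.bam"]
names = ["contig_1", "contig_2"]
rows = [[0.0, 1.5], [0.25, 3.0]]

table = coverage_by_bam(bams, names, rows)
print(table["sample_b.bam"]["contig_2"])  # 3.0
```

With real data, the same helper could be called as `coverage_by_bam(bam_list, *pycoverm.get_coverages_from_bam(bam_list))`, since the function returns the contig names and the coverage array as a tuple.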
/// is_bam_sorted(bam_file)
/// --
///
/// Checks whether a BAM file is sorted by coordinate.
///
/// Parameters
/// ----------
/// bam_file : str
/// Path to a BAM file.
///
/// Returns
/// -------
/// bool
/// Returns `True` if the BAM file is sorted by coordinate and `False`
/// otherwise.
/// get_coverages_from_bam(bam_list, contig_end_exclusion=75, min_identity=0.97,
/// trim_lower=0.0, trim_upper=0.0, contig_set=None, threads=1)
/// --
///
/// Computes contig mean coverages from sorted BAM files. All BAM files must be
/// mapped to the same reference.
/// Trimmed means will be computed if `trim_lower` and/or `trim_upper` are set
/// to values greater than 0.
///
/// Parameters
/// ----------
/// bam_list : list
/// A list of paths to input BAM files.
/// contig_end_exclusion : int, optional
/// Exclude bases at the ends of reference sequences from calculation.
/// Default is 75.
/// min_identity : float, optional
/// Exclude reads by overall identity to the reference sequences.
/// Default is 0.97.
/// trim_lower : float, optional
/// Fraction to trim from the lower tail of the coverage distribution.
/// Default is 0.0.
/// trim_upper : float, optional
/// Fraction to trim from the upper tail of the coverage distribution.
/// Default is 0.0.
/// contig_set : set, optional
/// If provided, only the coverages of the contigs within `contig_set` will
/// be returned.
/// Default is None (return the coverages of all contigs).
/// threads : int, optional
/// Number of threads to use for coverage computation. Default is 1.
///
/// Returns
/// -------
/// tuple
/// A tuple whose first element is a list of the contig names and whose
/// second element is a NumPy array of contig coverages, with one column per
/// input BAM file.
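
To illustrate what `contig_end_exclusion`, `trim_lower`, and `trim_upper` describe, here is a minimal pure-Python sketch of a trimmed mean over per-base depths. It is not CoverM's implementation: the function name `trimmed_mean_depth`, the rounding of the trim counts, and the handling of contigs that are too short are all assumptions made for illustration.

```python
def trimmed_mean_depth(depths, end_exclusion=0, trim_lower=0.0, trim_upper=0.0):
    """Mean of per-base depths after dropping contig ends and both tails.

    Illustrative only; CoverM may round or handle edge cases differently.
    `depths` is a plain list with one read depth per reference position.
    """
    # Drop `end_exclusion` bases from each end of the contig.
    if end_exclusion > 0:
        depths = depths[end_exclusion:len(depths) - end_exclusion]
    if not depths:
        return 0.0  # assumption: contigs too short to score get 0 coverage

    ordered = sorted(depths)
    n = len(ordered)
    # Number of values to discard from each tail (assumption: round down).
    drop_low = int(n * trim_lower)
    drop_high = int(n * trim_upper)
    kept = ordered[drop_low:n - drop_high]
    if not kept:
        return 0.0
    return sum(kept) / len(kept)


# Made-up depths: uncovered contig ends and one outlier position.
depths = [0, 0, 4, 5, 5, 6, 100, 5, 4, 5, 0, 0]
print(trimmed_mean_depth(depths, end_exclusion=2))  # 16.75
print(trimmed_mean_depth(depths, end_exclusion=2,
                         trim_lower=0.125, trim_upper=0.125))  # 5.0
```

In this toy input, excluding the ends removes the uncovered flanks, and trimming one value from each tail removes the outlier depth of 100, which is why the second result is much closer to the typical depth.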