biodatageeks/polars-bio

polars_bio

Features

Genomic ranges operations

Features Bioframe polars-bio PyRanges Pybedtools PyGenomics GenomicRanges
overlap
nearest
cluster
merge
complement
select/slice
coverage
expand
sort
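To make two of the operations listed above concrete, here is a minimal pure-Python sketch of what overlap and merge compute. It does not use the polars-bio API or any of the compared libraries; the function names, the (chrom, start, end) tuple layout, the half-open coordinate convention, and the choice to merge directly adjacent intervals are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (chrom, start, end), half-open [start, end).
Interval = Tuple[str, int, int]


def overlap(a: List[Interval], b: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every pair (x, y) on the same chromosome whose ranges intersect."""
    pairs = []
    for x in a:
        for y in b:
            # Two half-open ranges intersect when each starts before the other ends.
            if x[0] == y[0] and x[1] < y[2] and y[1] < x[2]:
                pairs.append((x, y))
    return pairs


def merge(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or directly adjacent intervals per chromosome."""
    merged: List[Interval] = []
    # Sorting tuples orders by chromosome, then start, then end.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


a = [("chr1", 100, 200), ("chr1", 300, 400)]
b = [("chr1", 150, 250), ("chr2", 100, 200)]
overlapping_pairs = overlap(a, b)
merged_intervals = merge(a + b)
```

The nested loop is quadratic and only meant to show the semantics; real libraries typically rely on sorted or tree-based interval indexes instead.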

Input/Output

I/O Bioframe polars-bio PyRanges Pybedtools PyGenomics GenomicRanges
Pandas DataFrame
Polars DataFrame
Polars LazyFrame
Native readers

Genomic file format

I/O Bioframe polars-bio PyRanges Pybedtools PyGenomics GenomicRanges
BED
BAM
VCF

Performance

[Three figures (img.png) omitted]

Remarks

PyRanges is multithreaded, but:

  • It requires the Ray backend plus the nb_cpu parameter, documented as:

        nb_cpu: int, default 1
            How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple.
            Will only lead to speedups on large datasets.

  • For nearest, it returns no row when there is no overlap (we follow Bioframe, where nulls are returned instead).
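The difference in the nearest remark above can be sketched in plain Python. This is an illustration of the two behaviours as described here, not the code of PyRanges, Bioframe, or polars-bio. The function names and the keep_unmatched flag are hypothetical, and the sketch assumes that "no match" means the other set has no interval on the same chromosome.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (chrom, start, end), half-open [start, end).
Interval = Tuple[str, int, int]


def distance(x: Interval, y: Interval) -> int:
    """Gap between two intervals on the same chromosome; 0 if they overlap or touch."""
    return max(0, x[1] - y[2], y[1] - x[2])


def nearest(
    a: List[Interval], b: List[Interval], keep_unmatched: bool
) -> List[Tuple[Interval, Optional[Interval]]]:
    """For each interval in a, pair it with the closest interval in b.

    keep_unmatched=True emits (x, None) when b has nothing on x's chromosome
    (the null-returning behaviour); keep_unmatched=False drops such rows.
    """
    rows: List[Tuple[Interval, Optional[Interval]]] = []
    for x in a:
        candidates = [y for y in b if y[0] == x[0]]
        if candidates:
            rows.append((x, min(candidates, key=lambda y: distance(x, y))))
        elif keep_unmatched:
            rows.append((x, None))
    return rows


a = [("chr1", 100, 200), ("chr3", 10, 20)]
b = [("chr1", 250, 300), ("chr1", 1000, 1100)]
with_nulls = nearest(a, b, keep_unmatched=True)
without_nulls = nearest(a, b, keep_unmatched=False)
```

With keep_unmatched=True the result has one row per input interval, so it can be aligned with the input; with keep_unmatched=False the chr3 interval disappears from the output.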