Commit b34da81

Update README with better explanation about bfloat16 usage in cupy

1 parent bda1298 commit b34da81

2 files changed: +59 -7 lines changed

README.md

Lines changed: 56 additions & 5 deletions
@@ -1,21 +1,70 @@
 # :lollipop: Epigenetics Dataloader for BigWig files

+[![Tests](https://github.com/pfizer-opensource/bigwig-loader/actions/workflows/tests.yml/badge.svg)](https://github.com/pfizer-opensource/bigwig-loader/actions/workflows/tests.yml)
+[![Code Quality](https://github.com/pfizer-opensource/bigwig-loader/actions/workflows/run-commit-hooks.yml/badge.svg)](https://github.com/pfizer-opensource/bigwig-loader/actions/workflows/run-commit-hooks.yml)
+
 Fast batched dataloading of BigWig files containing epigenetic track data and corresponding sequences, powered by GPU,
 for deep learning applications.

 > ⚠️ **BREAKING CHANGE (v0.3.0+)**: The output matrix dimensionality has changed from `(n_tracks, batch_size, sequence_length)` to `(batch_size, sequence_length, n_tracks)`. This change was long overdue and eliminates the need for (potentially memory-expensive) transpose operations downstream. If you are upgrading from an earlier version, please update your code accordingly (you probably need to delete one transpose in your code).

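To make the layout change concrete: the old-to-new change corresponds to the axis permutation `values.transpose(1, 2, 0)` in NumPy/CuPy or `values.permute(1, 2, 0)` in PyTorch, which the loader now effectively performs for you, so such a call can simply be deleted. A minimal pure-Python sketch of that permutation (the helper name is made up for illustration and is not part of bigwig-loader):

```python
def tracks_first_to_tracks_last(values):
    """Permute nested lists of shape (n_tracks, batch_size, sequence_length)
    into (batch_size, sequence_length, n_tracks) -- the v0.3.0+ layout."""
    n_tracks = len(values)
    batch_size = len(values[0])
    seq_len = len(values[0][0])
    return [
        [[values[t][b][s] for t in range(n_tracks)] for s in range(seq_len)]
        for b in range(batch_size)
    ]

# 2 tracks, batch of 1, sequence length 3 (old layout):
old_layout = [[[1, 2, 3]], [[4, 5, 6]]]
new_layout = tracks_first_to_tracks_last(old_layout)
# new_layout has shape (1, 3, 2): [[[1, 4], [2, 5], [3, 6]]]
```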
 > **NEW FEATURE (v0.3.0+)**: Full `bfloat16` support! You can now specify `dtype="bfloat16"` to get output tensors in bfloat16 format, reducing memory usage by 50%.

+> ⚠️ **Cupy and bfloat16 support**
+Because cupy does not support bfloat16 yet, the cupy array is typed as uint64, but the actual data behind it is bfloat16. When converting the array to a tensor in a framework that DOES support bfloat16, such as PyTorch, TensorFlow, or JAX, the conversion should therefore be followed by a "view" method that only changes how the underlying bytes are interpreted (rather than casting to bfloat16, which would change the underlying data). In *bigwig_loader.pytorch.PytorchBigWigDataset* this has already been done for you (when you set dtype="bfloat16").
+
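A stdlib-only sketch of why a view differs from a cast (the exact framework call, e.g. PyTorch's `Tensor.view(torch.bfloat16)`, is an assumption here; see PytorchBigWigDataset for the canonical handling): bfloat16 is simply the upper 16 bits of an IEEE-754 float32, so reinterpreting bytes preserves the bit pattern, while casting would compute new bytes from the numeric value.

```python
import struct

# bfloat16 keeps only the top 16 bits of an IEEE-754 float32.
f32_bits = struct.unpack("<I", struct.pack("<f", 1.5))[0]  # bit pattern of float32 1.5
bf16_bits = f32_bits >> 16  # the corresponding bfloat16 bit pattern

# Re-expanding the bfloat16 bits recovers 1.5 exactly (1.5 is representable
# in bfloat16): a "view" reinterprets these bytes without changing them,
# which is exactly what the uint-typed cupy array relies on.
roundtrip = struct.unpack("<f", struct.pack("<I", bf16_bits << 16))[0]
```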




 ## Quickstart

 ### Installation with Pixi
 Using [pixi](https://pixi.sh/) to install bigwig-loader is highly recommended.
-Please take a look at the pixi.toml file. If you just want to use bigwig-loader, just
-copy that pixi.toml, add the other libraries you need and use the "prod" environment
+Please take a look at this example pixi.toml:
+
+```toml
+[workspace]
+channels = ["rapidsai", "conda-forge", "nvidia", "bioconda", "dataloading"]
+name = "bigwig-loader"
+platforms = ["linux-64"]
+version = "0.1.0"
+
+[tasks]
+download-example-data = { cmd = "python -m bigwig_loader.download_example_data" }
+
+[feature.bigwig-loader.system-requirements]
+cuda = "12"
+
+[dependencies]
+python = "==3.11"
+pip = "*"
+
+[feature.bigwig-loader.dependencies]
+cuda-version = "12.8.*"
+pytorch-gpu = ">=2.6"
+cuda-nvcc = "*"
+kvikio = "<=25.08.00"
+bigwig-loader = "*"
+numpy = "*"
+pandas = "*"
+
+[pypi-dependencies]
+python-dotenv = "*"
+pydantic = "*"
+pydantic-settings = "*"
+universal-pathlib = "*"
+fsspec = { version = "*" }
+s3fs = "*"
+pyfaidx = "*"
+numcodecs = "*"
+
+[environments]
+default = {features = ["bigwig-loader"]}
+```
+
+
+If you just want to use bigwig-loader, just
+copy that into a pixi.toml file and add the other libraries you need
 (you don't need to clone this repo; pixi will download bigwig-loader from the
 conda "dataloading" channel):

@@ -26,14 +75,16 @@ conda "dataloading" channel):

 * change directory to wherever you put the pixi.toml, and:
 ```shell
-pixi run -e prod <my_training_command>
+pixi run <my_training_command>
 ```


+The pixi.toml included in this repository works both for the released version and for development of bigwig-loader, but assumes you have cloned this repo.
+
+
 ### Installation with conda/mamba

-Bigwig-loader mainly depends on the rapidsai kvikio library and cupy, both of which are best installed using
-conda/mamba. Bigwig-loader can now also be installed using conda/mamba. To create a new environment with bigwig-loader
+Alternatively, bigwig-loader can be installed using conda/mamba. To create a new environment with bigwig-loader
 installed:

 ```shell

pixi.toml

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,6 @@
 [workspace]
 channels = ["rapidsai", "conda-forge", "nvidia", "bioconda", "dataloading"]
-name = "nucleotides-diffusion"
+name = "bigwig-loader"
 platforms = ["linux-64", "osx-arm64"]
 version = "0.1.0"
 conda-pypi-map = { "conda-forge" = "conda-forge-pypi-map.json" }
@@ -60,6 +60,7 @@ asgi-lifespan = "*"
 pyBigWig = "*"

 [environments]
-prod = {features = ["gpu", "released"]}
+default = {features = ["gpu", "released"]}
+test-released = {features = ["gpu", "released", "test"]}
 dev = {features = ["gpu", "dev", "test"]}
 dev-cpu = {features = ["cpu", "test"] }
