ESM2 Finetuning refactor (#574)
### Description
<!-- Provide a detailed description of the changes in this PR -->

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [ ]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [x]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):

### CI Pipeline Configuration
Configure CI behavior by checking relevant boxes below. This will
automatically apply labels.

- [ ] [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests
- [ ] [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest

> [!NOTE]
> By default, the notebook validation tests are skipped unless explicitly enabled.

### Usage
<!--- How does a user interact with the changed code -->
```python
# TODO: Add code snippet
```

### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->

 - [ ] I have tested these changes locally
 - [ ] I have updated the documentation accordingly
 - [ ] I have added/updated tests as needed
 - [ ] All existing tests pass successfully

---------

Signed-off-by: Farhad Ramezanghorbani <[email protected]>
farhadrgh authored Jan 15, 2025
1 parent 7960ada commit 81e0b24
Showing 20 changed files with 2,493 additions and 951 deletions.
898 changes: 898 additions & 0 deletions docs/docs/user-guide/examples/bionemo-esm2/finetune.ipynb

Large diffs are not rendered by default.

263 changes: 0 additions & 263 deletions docs/docs/user-guide/examples/bionemo-esm2/finetune.md

This file was deleted.

43 changes: 36 additions & 7 deletions docs/docs/user-guide/examples/bionemo-esm2/inference.ipynb
@@ -141,11 +141,40 @@
"execution_count": 4,
"metadata": {},
"outputs": [
+{
+"name": "stderr",
+"output_type": "stream",
+"text": [
+"Downloading data from 'nvidia/clara/esm2nv650m:2.0' to file '/home/ubuntu/.cache/bionemo/0798767e843e3d54315aef91934d28ae7d8e93c2849d5fcfbdf5fac242013997-esm2_650M_nemo2.tar.gz'.\n"
+]
+},
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"{\n",
+" \"download_end\": \"2025-01-14 22:01:24\",\n",
+" \"download_start\": \"2025-01-14 22:01:05\",\n",
+" \"download_time\": \"18s\",\n",
+" \"files_downloaded\": 1,\n",
+" \"local_path\": \"/home/ubuntu/.cache/bionemo/tmpfj1e52vw/esm2nv650m_v2.0\",\n",
+" \"size_downloaded\": \"1.12 GB\",\n",
+" \"status\": \"COMPLETED\"\n",
+"}\n"
+]
+},
+{
+"name": "stderr",
+"output_type": "stream",
+"text": [
+"Untarring contents of '/home/ubuntu/.cache/bionemo/0798767e843e3d54315aef91934d28ae7d8e93c2849d5fcfbdf5fac242013997-esm2_650M_nemo2.tar.gz' to '/home/ubuntu/.cache/bionemo/0798767e843e3d54315aef91934d28ae7d8e93c2849d5fcfbdf5fac242013997-esm2_650M_nemo2.tar.gz.untar'\n"
+]
+},
{
"name": "stdout",
"output_type": "stream",
"text": [
-"/home/bionemo/.cache/bionemo/0798767e843e3d54315aef91934d28ae7d8e93c2849d5fcfbdf5fac242013997-esm2_650M_nemo2.tar.gz.untar\n"
+"/home/ubuntu/.cache/bionemo/0798767e843e3d54315aef91934d28ae7d8e93c2849d5fcfbdf5fac242013997-esm2_650M_nemo2.tar.gz.untar\n"
]
}
],
@@ -168,7 +197,7 @@
"metadata": {},
"source": [
"\n",
-"We use the `InMemoryCSVDataset` class to load the protein sequence data from a `.csv` file. This data file should at least have a `sequences` column and can optionally have a `labels` column used for fine-tuning applications. Here is an example of how to create your own inference input data using a list of sequences in python:"
+"We use the `InMemoryProteinDataset` class to load the protein sequence data from a `.csv` file. This data file should at least have a `sequences` column and can optionally have a `labels` column used for fine-tuning applications. Here is an example of how to create your own inference input data using a list of sequences in python:"
]
},
{
@@ -238,12 +267,12 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"2024-12-16 20:19:23 - faiss.loader - INFO - Loading faiss with AVX512 support.\n",
-"2024-12-16 20:19:23 - faiss.loader - INFO - Successfully loaded faiss with AVX512 support.\n",
-"[NeMo W 2024-12-16 20:19:24 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work\n",
+"2025-01-14 22:01:45 - faiss.loader - INFO - Loading faiss with AVX512 support.\n",
+"2025-01-14 22:01:45 - faiss.loader - INFO - Successfully loaded faiss with AVX512 support.\n",
+"[NeMo W 2025-01-14 22:01:46 nemo_logging:405] /usr/local/lib/python3.12/dist-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work\n",
" warn(\"Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work\", RuntimeWarning)\n",
" \n",
-"[NeMo W 2024-12-16 20:19:24 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed two minor releases later. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap(obj)`` instead.\n",
+"[NeMo W 2025-01-14 22:01:46 nemo_logging:405] /usr/local/lib/python3.12/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.\n",
" cm = get_cmap(\"Set1\")\n",
" \n",
"usage: infer_esm2 [-h] --checkpoint-path CHECKPOINT_PATH --data-path DATA_PATH\n",
@@ -533,7 +562,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.10.12"
+"version": "3.12.3"
}
},
"nbformat": 4,
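
The changed notebook cell in the hunk at `-168,7 +197,7` describes the input that `InMemoryProteinDataset` reads: a `.csv` file with a required `sequences` column and an optional `labels` column. As a rough illustration of that file layout only, here is a minimal sketch using the Python standard library. The helper name `write_sequences_csv`, the example sequences, and the output path are hypothetical and do not come from the notebook; the notebook's own cell may build the file differently.

```python
import csv
import tempfile
from pathlib import Path


def write_sequences_csv(path, sequences, labels=None):
    """Write a CSV with a `sequences` column and, optionally, a `labels` column.

    Hypothetical helper for illustration; not part of BioNeMo.
    """
    if labels is not None and len(labels) != len(sequences):
        raise ValueError("labels must have one entry per sequence")
    header = ["sequences"] if labels is None else ["sequences", "labels"]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for index, sequence in enumerate(sequences):
            row = [sequence] if labels is None else [sequence, labels[index]]
            writer.writerow(row)
    return Path(path)


# Made-up amino-acid strings, used only to show the file layout.
example_sequences = ["MKTVRQERLK", "GSHMSLFDFF", "ACDEFGHIKL"]
csv_path = write_sequences_csv(
    Path(tempfile.mkdtemp()) / "sequences.csv", example_sequences
)
print(csv_path.read_text())
```

Passing `labels=[...]` adds the second column for fine-tuning data; for inference input, the `sequences` column alone matches what the notebook text describes as the minimum.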
2 changes: 1 addition & 1 deletion docs/docs/user-guide/getting-started/development.md
@@ -136,7 +136,7 @@ of the model. The fine-tuning steps will be application-specific, but a general
6. **Run inference**: Once the model is fine-tuned, use it to make predictions on new, unseen data.

For more information on fine-tuning a model, refer to the [ESM-2 Fine-tuning
-Tutorial](../examples/bionemo-esm2/finetune.md).
+Tutorial](../examples/bionemo-esm2/finetune.ipynb).

## Advanced Developer Documentation
