diff --git a/docs/docs/user-guide/examples/bionemo-esm2/finetune.ipynb b/docs/docs/user-guide/examples/bionemo-esm2/finetune.ipynb deleted file mode 100644 index ef1088b9f1..0000000000 --- a/docs/docs/user-guide/examples/bionemo-esm2/finetune.ipynb +++ /dev/null @@ -1,898 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[![ Click here to deploy.](https://uohmivykqgnnbiouffke.supabase.co/storage/v1/object/public/landingpage/brevdeploynavy.svg)](https://console.brev.dev/launchable/deploy?launchableID=env-2rPWpPzzJIxq7SMRJIQehCxBymV)\n", - "\n", - "
NOTE It takes about 10 minutes to deploy this notebook as a Launchable. As of this writing, we are still working on a free tier, so a credit card may be required. You can reach out to your NVIDIA rep for credits.
" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# ESM-2 Fine-tuning" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "vscode": { - "languageId": "plaintext" - } - }, - "source": [ - "The [ESM-2](https://www.science.org/doi/abs/10.1126/science.ade2574) model is a transformer-based protein language model that has achieved state-of-the-art results in various protein-related tasks. When fine-tuning ESM2, the task-head plays a crucial role. A task head refers to the additional layer or set of layers added on top of a pre-trained model, like the ESM-2 transformer-based protein language model, to adapt it for a specific downstream task. As a part of transfer learning, a pre-trained model is often utilized to learn generic features from a large-scale dataset. However, these features might not be directly applicable to the specific task at hand. By incorporating a task head, which consists of learnable parameters, the model can adapt and specialize to the target task. The task head serves as a flexible and adaptable component that learns task-specific representations by leveraging the pre-trained features as a foundation. Through fine-tuning, the task head enables the model to learn and extract task-specific patterns, improving performance and addressing the nuances of the downstream task. It acts as a critical bridge between the pre-trained model and the specific task, enabling efficient and effective transfer of knowledge." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
NOTE For simplicity, this tutorial walks through creating a basic regression fine-tuning task. The utilities described in this tutorial are available in:\n", - "\n", - "
bionemo.esm2.model.finetune.finetune_regressor
\n", - "\n", - "The techniques demonstrated here can be adapted for classification and per-token classification tasks. Utilities needed for secondary structure prediction (token-level classification) are available in \n", - "\n", - "
bionemo.esm2.model.finetune.finetune_token_classifier
\n", - "\n", - "In the second part of the tutorial, we will cover loading a pre-trained model, fine-tuning it for both regression and per-token classification tasks, and using the fine-tuned models for inference. For instructions on pre-training the ESM-2 model, please refer to the [ESM-2 Pretraining](./pretrain.md) tutorial.
" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Building a Regression Fine-tune Module" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Wwe need to define some key classes to successfully build a fine-tuning module in BioNeMo framework: \n", - "\n", - "1. **Loss Reduction Class** - To compute the supervised fine-tuning loss.\n", - "2. **Fine-Tuned Model Head** - Downstream task head model.\n", - "3. **Fine-Tuned Model** - Model that combines ESM-2 with the task head model.\n", - "4. **Fine-Tuning Config** - Configures the fine-tuning model and loss to use in the training and inference framework.\n", - "5. **Dataset** - Training and inference datasets for ESM2 fine-tuning." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 1 - Loss Reduction Class" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "A class for calculating the supervised loss of the fine-tune model from targets. We inherit from Megatron Bert Masked Language Model Loss (`BERTMLMLossWithReduction`) and override the `forward()` pass to compute MSE loss of the regression head within a micro-batch. The `reduce()` method is used for computing the average over the micro-batches and is only used for logging.\n", - "\n", - "```python\n", - "class RegressorLossReduction(BERTMLMLossWithReduction):\n", - " def forward(\n", - " self, batch: Dict[str, torch.Tensor], forward_out: Dict[str, torch.Tensor]\n", - " ) -> Tuple[torch.Tensor, Union[PerTokenLossDict, SameSizeLossDict]]:\n", - "\n", - " regression_output = forward_out[\"regression_output\"]\n", - " targets = batch[\"labels\"].to(dtype=regression_output.dtype) # [b, 1]\n", - "\n", - " loss = torch.nn.functional.mse_loss(regression_output, targets)\n", - " return loss, {\"avg\": loss}\n", - "\n", - " def reduce(self, losses_reduced_per_micro_batch: Sequence[ReductionT]) -> torch.Tensor:\n", - " losses = torch.stack([loss[\"avg\"] for loss in losses_reduced_per_micro_batch])\n", - " return losses.mean()\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 2 - Fine-Tuned Model Head" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "An MLP class for sequence-level regression. This class inherits `MegatronModule` and uses the fine-tune config (`TransformerConfig`) to configure the regression head for the fine-tuned ESM-2 model.\n", - "\n", - "```python\n", - "class MegatronMLPHead(MegatronModule):\n", - " def __init__(self, config: TransformerConfig):\n", - " super().__init__(config)\n", - " layer_sizes = [config.hidden_size, 256, 1]\n", - " self.linear_layers = torch.nn.ModuleList(\n", - " [torch.nn.Linear(i, o) for i, o in zip(layer_sizes[:-1], layer_sizes[1:])]\n", - " )\n", - " self.act = torch.nn.ReLU()\n", - " self.dropout = torch.nn.Dropout(p=config.ft_dropout)\n", - "\n", - " def forward(self, hidden_states: torch.Tensor) -> List[torch.Tensor]:\n", - " ...\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 3 - Fine-Tuned Model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "A fine-tuned ESM-2 model class for token classification tasks. This class inherits from the `ESM2Model` class and adds the custom regression head `MegatronMLPHead` the we created in the previous step. 
Optionally one can freeze all or parts of the encoder by parsing through the model parameters in the model constructor.\n", - "\n", - "```python\n", - "class ESM2FineTuneSeqModel(ESM2Model):\n", - " def __init__(self, config, *args, post_process: bool = True, include_embeddings: bool = False, **kwargs):\n", - " super().__init__(config, *args, post_process=post_process, include_embeddings=True, **kwargs)\n", - "\n", - " # freeze encoder parameters\n", - " if config.encoder_frozen:\n", - " for _, param in self.named_parameters():\n", - " param.requires_grad = False\n", - "\n", - " if post_process:\n", - " self.regression_head = MegatronMLPHead(config)\n", - "\n", - " def forward(self, *args, **kwargs,):\n", - " output = super().forward(*args, **kwargs)\n", - " ...\n", - " output[\"regression_output\"] = self.regression_head(embeddings)\n", - " return output\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 4 - Fine-Tuning Config" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "A `dataclass` that configures the fine-tuned ESM-2 model. In this example `ESM2FineTuneSeqConfig` inherits from `ESM2GenericConfig` and adds custom arguments to setup the fine-tuned model. The `configure_model()` method of this `dataclass` is called within the `Lightning` module to call the model constructor with the `dataclass` arguments.\n", - "\n", - "The common arguments among different fine-tuning tasks are\n", - "\n", - "- `model_cls`: The fine-tune model class defined in previous step (`ESM2FineTuneSeqModel`)\n", - "- `initial_ckpt_path`: BioNeMo 2.0 ESM-2 pre-trained checkpoint\n", - "- `initial_ckpt_skip_keys_with_these_prefixes`: skips keys when loading parameters from a checkpoint. For example, we should not look for `regression_head` in the pre-trained checkpoint.\n", - "- `get_loss_reduction_class()`: Implements selection of the appropriate `MegatronLossReduction` class that we defined in the first step of this tutorial.\n", - "\n", - "```python\n", - "\n", - "@dataclass\n", - "class ESM2FineTuneSeqConfig(\n", - " ESM2GenericConfig[ESM2FineTuneSeqModel, RegressorLossReduction], iom.IOMixinWithGettersSetters\n", - "):\n", - " model_cls: Type[ESM2FineTuneSeqModel] = ESM2FineTuneSeqModel\n", - " # The following checkpoint path is for nemo2 checkpoints. Config parameters not present in\n", - " # self.override_parent_fields will be loaded from the checkpoint and override those values here.\n", - " initial_ckpt_path: str | None = None\n", - " # typical case is fine-tune the base biobert that doesn't have this head. If you are instead loading a checkpoint\n", - " # that has this new head and want to keep using these weights, please drop this next line or set to []\n", - " initial_ckpt_skip_keys_with_these_prefixes: List[str] = field(default_factory=lambda: [\"regression_head\"])\n", - "\n", - " encoder_frozen: bool = True # freeze encoder parameters\n", - " ft_dropout: float = 0.25 # MLP layer dropout\n", - "\n", - " def get_loss_reduction_class(self) -> Type[MegatronLossReduction]:\n", - " return RegressorLossReduction\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 5 - Dataset" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "We will use a sample dataset for demonstration purposes. Create a dataset class by extending ```bionemo.esm2.model.finetune.dataset.InMemoryProteinDataset```. 
The `InMemoryProteinDataset` has a `classmethod` (`from_csv`) that reads data from a CSV file that has `sequences` and optionally `labels` columns. It is important to override the `transform_label()` method so that it returns a `torch.Tensor` containing the label in the correct format. As an example, we can use this method to add custom tokenization if `label` is a string." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "The custom dataset class found in ```bionemo.esm2.model.finetune.dataset.InMemorySingleValueDataset``` is appropriate here, as it facilitates predicting a single value per sequence. An excerpt from the class is shown below. This example dataset has a class method `from_csv()` that expects a `data_path` to a CSV file that has `sequences` and `labels` columns.\n", - "\n", - "```python\n", - "class InMemorySingleValueDataset(InMemoryProteinDataset):\n", - " def __init__(\n", - " self,\n", - " sequences: pd.Series,\n", - " labels: pd.Series | None = None,\n", - " tokenizer: tokenizer.BioNeMoESMTokenizer = tokenizer.get_tokenizer(),\n", - " seed: int = np.random.SeedSequence().entropy,\n", - " ):\n", - " super().__init__(sequences, labels, tokenizer, seed)\n", - "\n", - " def transform_label(self, label: float) -> Tensor:\n", - " return torch.tensor([label], dtype=torch.float)\n", - "```\n", - "\n", - "The `transform_label` method allows for custom transformation of raw labels by casting or tokenization and needs to be adjusted based on the data. Here we use this method to create a `float` tensor of the regression value." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### DataModule" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To coordinate the creation of training, validation and testing datasets from your data, we need to use a `datamodule` class. To do this, we can directly use or extend the ```ESM2FineTuneDataModule``` class (located at ```bionemo.esm2.model.finetune.datamodule.ESM2FineTuneDataModule```), which defines helpful abstract methods that use your dataset class.\n", - "\n", - "```python\n", - "dataset = InMemorySingleValueDataset.from_csv(data_path)\n", - "data_module = ESM2FineTuneDataModule(\n", - " train_dataset=dataset,\n", - " valid_dataset=dataset,\n", - " micro_batch_size=4, # size of the batch processed on each device\n", - " global_batch_size=8, # size of the batch across all devices; should be a multiple of micro_batch_size\n", - ")\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the next part of this tutorial we will prepare the input needed to run regression and per-token-classification fine-tuning examples." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Setup and Assumptions" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "All commands should be executed inside the BioNeMo Docker container, which has all ESM-2 dependencies pre-installed. For more information on how to build or pull the BioNeMo2 container, refer to the [Initialization Guide](../../getting-started/initialization-guide.md)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
NOTE Some of the cells below generate long text output. We're using
%%capture --no-display --no-stderr cell_output
to suppress this output. Comment or delete this line in the cells below to restore full output.
" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Import Required Libraries" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "%%capture --no-display --no-stderr cell_output\n", - "\n", - "import os\n", - "import shutil\n", - "import pandas as pd\n", - "\n", - "import warnings\n", - "warnings.filterwarnings('ignore')\n", - "warnings.simplefilter('ignore')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Work Directory" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Set the work directory to store data and results:\n", - "\n", - "
NOTE We set the following flag to clean up the work directory created by this notebook: 
cleanup : bool = True
" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "cleanup : bool = True" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Directory '/workspace/bionemo2/esm2_finetune_tutorial' created.\n" - ] - } - ], - "source": [ - "work_dir=\"/workspace/bionemo2/esm2_finetune_tutorial\"\n", - "\n", - "if cleanup and os.path.exists(work_dir):\n", - " shutil.rmtree(work_dir)\n", - "\n", - "if not os.path.exists(work_dir):\n", - " os.makedirs(work_dir)\n", - " print(f\"Directory '{work_dir}' created.\")\n", - "else:\n", - " print(f\"Directory '{work_dir}' already exists.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Download Pre-trained Model Checkpoints" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following code will download the internally pre-trained model, `esm2/8m:2.0`, from the NGC registry. Please refer to [ESM-2 Model Overview](../../../models/ESM-2/index.md) for a list of available checkpoints." - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/home/ubuntu/.cache/bionemo/b4ea4d52eea8a25d2c2838617ff678f0da22d384cee195b0c192686816078dcd-esm2_8m_checkpoint.tar.gz.untar\n" - ] - } - ], - "source": [ - "from bionemo.core.data.load import load\n", - "\n", - "checkpoint_path = load(\"esm2/8m:2.0\")\n", - "print(checkpoint_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above example is downloading an internally trained 8M ESM-2 model. The pre-trained checkpoints can be downloaded from NGC resources using either the following bash command or the `load` function in `bionemo.core.data.load` as shown above.\n", - "\n", - "```bash\n", - "download_bionemo_data esm2/650m:2.0\n", - "```\n", - "\n", - "which returns the checkpoint path (e.g. `.../.cache/bionemo/975d29ee980fcb08c97401bbdfdcf8ce-esm2_650M_nemo2.tar.gz.untar`)\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Fine-tuning" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can take advantage of the ESM2 fine-tuning script in ```bionemo.esm2.scripts.finetune_esm2``` or use the ```finetune_esm2``` executable the fine-tuning process given:\n", - "\n", - "- Pre-trained checkpoint of ESM2\n", - "- Finetune config class name that configures the finetune model and loss reduction\n", - "- Path to train and validation CSV data files\n", - "- Dataset class name\n", - "\n", - "To get the full list of arguments to tune a finetuning run use:\n", - "```bash\n", - "finetune_esm2 --help \n", - "```\n", - "For a detailed description of training loop and the arguments please refer to the [ESM-2 Pretraining](./pretrain.md) tutorial." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
NOTE\n", - "\n", - "Due to Megatron limitations, the log produced by the training run iterates over steps/iterations rather than epochs. Therefore, the `Training epoch` counter stays at zero while `iteration` and `global_step` increase during training (see the example below).\n", - "\n", - "
\n",
-    "Training epoch 0, iteration  | ... | global_step:  | reduced_train_loss: ... | val_loss: ...\n",
-    "
\n", - "\n", - "to achieve the same epoch-based effect while training, please choose the number of training steps (`num_steps`) so that:\n", - "\n", - "
\n",
-    "num_steps * global_batch_size = len(dataset) * desired_num_epochs\n",
-    "
" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Regression" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For the purposes of this demo, we'll assume dataset consists of small set of protein sequences with a target value of `len(sequence) / 100.0` as their labels:" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "\n", - "artificial_sequence_data = [\n", - " \"TLILGWSDKLGSLLNQLAIANESLGGGTIAVMAERDKEDMELDIGKMEFDFKGTSVI\",\n", - " \"LYSGDHSTQGARFLRDLAENTGRAEYELLSLF\",\n", - " \"GRFNVWLGGNESKIRQVLKAVKEIGVSPTLFAVYEKN\",\n", - " \"DELTALGGLLHDIGKPVQRAGLYSGDHSTQGARFLRDLAENTGRAEYELLSLF\",\n", - " \"KLGSLLNQLAIANESLGGGTIAVMAERDKEDMELDIGKMEFDFKGTSVI\",\n", - " \"LFGAIGNAISAIHGQSAVEELVDAFVGGARISSAFPYSGDTYYLPKP\",\n", - " \"LGGLLHDIGKPVQRAGLYSGDHSTQGARFLRDLAENTGRAEYELLSLF\",\n", - " \"LYSGDHSTQGARFLRDLAENTGRAEYELLSLF\",\n", - " \"ISAIHGQSAVEELVDAFVGGARISSAFPYSGDTYYLPKP\",\n", - " \"SGSKASSDSQDANQCCTSCEDNAPATSYCVECSEPLCETCVEAHQRVKYTKDHTVRSTGPAKT\",\n", - "]\n", - "\n", - "data = [(seq, len(seq)/100.0) for seq in artificial_sequence_data]\n", - "\n", - "# Create a DataFrame\n", - "df = pd.DataFrame(data, columns=[\"sequences\", \"labels\"])\n", - "\n", - "# Save the DataFrame to a CSV file\n", - "data_path = os.path.join(work_dir, \"regression_data.csv\")\n", - "df.to_csv(data_path, index=False)" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "%%capture --no-display --no-stderr cell_output\n", - "\n", - "! finetune_esm2 \\\n", - " --restore-from-checkpoint-path {checkpoint_path} \\\n", - " --train-data-path {data_path} \\\n", - " --valid-data-path {data_path} \\\n", - " --config-class ESM2FineTuneSeqConfig \\\n", - " --dataset-class InMemorySingleValueDataset \\\n", - " --experiment-name \"regression\" \\\n", - " --num-steps 50 \\\n", - " --num-gpus 1 \\\n", - " --val-check-interval 10 \\\n", - " --log-every-n-steps 10 \\\n", - " --lr 5e-3 \\\n", - " --result-dir {work_dir} \\\n", - " --micro-batch-size 2 \\\n", - " --num-gpus 1 \\\n", - " --precision \"bf16-mixed\"\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The previous cell executes the finetuning and saves the checkpoints at the end of the run. The checkpoint path is logged at the end of the finetuning log file: \n", - "\n", - "```\n", - "[NeMo I $$$$-$$-$$ 22:04:28 nemo_logging:393] Async checkpoint save for step 50 (/workspace/bionemo2/esm2_finetune_tutorial/regression/dev/checkpoints/checkpoint-step=49-consumed_samples=100.0-last-v1.ckpt) finalized successfully.\n", - "```\n", - "\n", - "To avoid long text output from the previous cell, the log is captured and stored into the `cell_output` variable. To visualize the log file uncomment and execute the next cell:" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "# print(cell_output.stdout)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can now use the checkpoint stored in the previous step to run inference. We will drop the `.ckpt` from the checkpoint path and provide that to the `--checkpoint-path` argument of `infer_esm2` executable.\n", - "\n", - "The input `--data-path` for inference is a CSV file with `sequences` column. It is also required to provide the appropriate `--config-class` name to load the model from the checkpoint. 
For a detailed description of inference arguments please refer to the [ESM-2 Inference](./inference.ipynb) tutorial." - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "# Create a DataFrame\n", - "df = pd.DataFrame(artificial_sequence_data, columns=[\"sequences\"])\n", - "\n", - "# Save the DataFrame to a CSV file\n", - "data_path = os.path.join(work_dir, \"sequences.csv\")\n", - "df.to_csv(data_path, index=False)\n", - "\n", - "checkpoint_path = f\"{work_dir}/regression/dev/checkpoints/checkpoint-step=49-consumed_samples=100.0-last\"\n", - "results_path = f\"{work_dir}/regression/infer/\"" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "%%capture --no-display --no-stderr cell_output\n", - "\n", - "! infer_esm2 --checkpoint-path {checkpoint_path} \\\n", - " --config-class ESM2FineTuneSeqConfig \\\n", - " --data-path {data_path} \\\n", - " --results-path {results_path} \\\n", - " --micro-batch-size 3 \\\n", - " --num-gpus 1 \\\n", - " --precision \"bf16-mixed\" \\\n", - " --include-embeddings \\\n", - " --include-input-ids" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The inference results are written into a `.pt` file which can be loaded using PyTorch library:" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "input_ids\ttorch.Size([10, 1024])\n", - "embeddings\ttorch.Size([10, 320])\n", - "regression_output\ttorch.Size([10, 1])\n" - ] - } - ], - "source": [ - "import torch\n", - "results = torch.load(f\"{results_path}/predictions__rank_0.pt\")\n", - "\n", - "for key, val in results.items():\n", - " if val is not None:\n", - " print(f'{key}\\t{val.shape}')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Toke-level Classification data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For this task we assign secondary structure label to each token in the sequence:" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "secondary_structure_labels = [\n", - " \"EEEECCCCCHHHHHHHHHHHHHHHCCCEEEEEECCCHHHHHHHHHCCCCCCCCCEEE\",\n", - " \"CCCCCHHHHHHHHHHHHHHCCCCCHHHHHHCC\",\n", - " \"HHHHHCCCCCHHHHHHHHHHHHHHCCCHHHHHHHHHH\",\n", - " \"HHHHHHHHHHCCCHHHHHCCCCCCCCHHHHHHHHHHHHHHCCCCCHHHHHHCC\",\n", - " \"CHHHHHHHHHHHHHHHCCCEEEEEECCCHHHHHHHHHCCCCCCCCCEEE\",\n", - " \"HHHHHHHHHHHHHCHHHHHHHHHHHHCCCEECCCEEEECCEEEEECC\",\n", - " \"HHHHHCCCHHHHHCCCCCCCCHHHHHHHHHHHHHHCCCCCHHHHHHCC\",\n", - " \"CCCCCHHHHHHHHHHHHHHCCCCCHHHHHHCC\",\n", - " \"HHHHHCHHHHHHHHHHHHCCCEECCCEEEECCEEEEECC\",\n", - " \"CCCCCCCCCCCCCCCCCCCCCCCCCCEEECCCCEEECHHHHHHHHHCCCCCCCCEEECCCCCC\",\n", - "]\n", - "\n", - "data = [(seq, label) for (seq, label) in zip(artificial_sequence_data, secondary_structure_labels)]\n", - "\n", - "# Create a DataFrame\n", - "df = pd.DataFrame(data, columns=[\"sequences\", \"labels\"])\n", - "\n", - "# Save the DataFrame to a CSV file\n", - "data_path = os.path.join(work_dir, \"token_classification_data.csv\")\n", - "df.to_csv(data_path, index=False)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "%%capture --no-display --no-stderr cell_output\n", - "\n", - "! 
finetune_esm2 \\\n", - " --restore-from-checkpoint-path {checkpoint_path} \\\n", - " --train-data-path {data_path} \\\n", - " --valid-data-path {data_path} \\\n", - " --config-class ESM2FineTuneTokenConfig \\\n", - " --dataset-class InMemoryPerTokenValueDataset \\\n", - " --experiment-name \"token_level_classification\" \\\n", - " --num-steps 50 \\\n", - " --num-gpus 1 \\\n", - " --val-check-interval 10 \\\n", - " --log-every-n-steps 10 \\\n", - " --lr 5e-3 \\\n", - " --result-dir {work_dir} \\\n", - " --micro-batch-size 2 \\\n", - " --num-gpus 1 \\\n", - " --precision \"bf16-mixed\"\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The previous cell executes the finetuning and saves the checkpoints at the end of the run. The checkpoint path is logged at the end of the finetuning log file: \n", - "\n", - "```\n", - "[NeMo I $$$$-$$-$$ 22:16:46 nemo_logging:393] Async checkpoint save for step 50 (/workspace/bionemo2/esm2_finetune_tutorial/token_level_classification/dev/checkpoints/checkpoint-step=49-consumed_samples=100.0-last.ckpt) finalized successfully.\n", - "```\n", - "\n", - "To avoid long text output from the previous cell, the log is captured and stored into the `cell_output` variable. To visualize the log file uncomment and execute the next cell:" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "# print(cell_output.stdout)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can now use the checkpoint stored in the previous step to run inference. We will drop the `.ckpt` from the checkpoint path and provide that to the `--checkpoint-path` argument of `infer_esm2` executable.\n", - "\n", - "The input `--data-path` for inference is a CSV file with `sequences` column. It is also required to provide the appropriate `--config-class` name to load the model from the checkpoint. For a detailed description of inference arguments please refer to the [ESM-2 Inference](./inference.ipynb) tutorial." - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "# Create a DataFrame\n", - "df = pd.DataFrame(artificial_sequence_data, columns=[\"sequences\"])\n", - "\n", - "# Save the DataFrame to a CSV file\n", - "data_path = os.path.join(work_dir, \"sequences.csv\")\n", - "df.to_csv(data_path, index=False)\n", - "\n", - "checkpoint_path = f\"{work_dir}/token_level_classification/dev/checkpoints/checkpoint-step=49-consumed_samples=100.0-last\"\n", - "results_path = f\"{work_dir}/token_level_classification/infer/\"" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "%%capture --no-display --no-stderr cell_output\n", - "\n", - "! 
infer_esm2 --checkpoint-path {checkpoint_path} \\\n", - " --config-class ESM2FineTuneTokenConfig \\\n", - " --data-path {data_path} \\\n", - " --results-path {results_path} \\\n", - " --micro-batch-size 3 \\\n", - " --num-gpus 1 \\\n", - " --precision \"bf16-mixed\" \\\n", - " --include-embeddings \\\n", - " --include-hiddens \\\n", - " --include-input-ids" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The inference results are written into a `.pt` file which can be loaded using PyTorch library:" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "hidden_states\ttorch.Size([10, 1024, 320])\n", - "input_ids\ttorch.Size([10, 1024])\n", - "embeddings\ttorch.Size([10, 320])\n", - "classification_output\ttorch.Size([10, 1024, 3])\n" - ] - } - ], - "source": [ - "import torch\n", - "results = torch.load(f\"{results_path}/predictions__rank_0.pt\")\n", - "\n", - "for key, val in results.items():\n", - " if val is not None:\n", - " print(f'{key}\\t{val.shape}')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can use the label tokenizer to convert the classification output to class names. Note that for demonstration purposes we are using a small dataset of artificial sequences in this example. You may experience over-fitting and observe no change in the validation metrics. This amount of data and the short training run does not result in accurate predictions." - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [], - "source": [ - "from bionemo.esm2.data.tokenizer import get_tokenizer\n", - "\n", - "\n", - "tokenizer = get_tokenizer()\n", - "tokens = tokenizer.all_tokens\n", - "aa_tokens = ['L', 'A', 'G', 'V', 'S', 'E', 'R', 'T', 'I', 'D', 'P', 'K', 'Q', 'N', 'F', 'Y', 'M', 'H', 'W', 'C']\n", - "aa_indices = [i for i, token in enumerate(tokens) if token in aa_tokens]\n", - "extra_indices = [i for i, token in enumerate(tokens) if token not in aa_tokens]\n", - "\n", - "input_ids = results['input_ids'] # b, s\n", - "# mask where non-amino acid tokens are True\n", - "mask = ~torch.isin(input_ids, torch.tensor(extra_indices))" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [], - "source": [ - "from bionemo.llm.data.label2id_tokenizer import Label2IDTokenizer\n", - "\n", - "label_tokenizer = Label2IDTokenizer()\n", - "label_tokenizer = label_tokenizer.build_vocab(secondary_structure_labels)" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Predicted Secondary Structures:\n", - "HHHHEEEEECCCCCCCCCCCCCCCEEEHHHHHHEEECCCCCCCCCEEEEEEEEEHHH\n", - "EEEEECCCCCCCCCCCCCCEEEEECCCCCCEE\n", - "CCCCCEEEEECCCCCCCCCCCCCEEEECCCCCCCCCC\n", - "CCCCCCCCCCEEECCCCCEEEEEEEECCCCCCCCCCCCCCEEEEECCCCCCEE\n", - "ECCCCCCCCCCCCCCCEEEHHHHHHEEECCCCCCCCCEEEEEEEEEHHH\n", - "CCCCCCCCCCCCCECCCCCCCCCCCCEEEHHEEEHHHHEEHHHHHEE\n", - "CCCCCEEECCCCCEEEEEEEECCCCCCCCCCCCCCEEEEECCCCCCEE\n", - "EEEEECCCCCCCCCCCCCCEEEEECCCCCCEE\n", - "CCCCCECCCCCCCCCCCCEEEHHEEEHHHHEEHHHHHEE\n", - "EEEEEEEEEEEEEEEEEEEEEEEEEEHHHEEEEHHHECCCCCCCCCEEEEEEEEHHHEEEEEE\n" - ] - } - ], - "source": [ - "output_ids = torch.argmax(results[\"classification_output\"], dim=-1)\n", - "\n", - "print(\"Predicted Secondary Structures:\")\n", - "for i in range(output_ids.shape[0]):\n", - " ss_ids = output_ids[i][mask[i]]\n", - " 
print(label_tokenizer.ids_to_text(ss_ids.tolist()))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.12.3" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -}