Amgen's AMPLIFY Port #442

Draft — wants to merge 116 commits into base: main
Changes shown from 1 commit (of 116 commits)
9755977
adding bionemo-scclip to sub-packages
ynashed Sep 11, 2024
843ce40
Initial model change
ynashed Sep 30, 2024
9fe70d6
Add bionemo-amplify sub-package and its requirements
ynashed Oct 3, 2024
e21c692
Added AMPLIFY tokenizer
ynashed Oct 3, 2024
2e9e299
Added AMPLIFY dataset and dataloader
ynashed Oct 4, 2024
8b41f03
Finalizing amplify model and train script
ynashed Oct 4, 2024
82928a4
Add bionemo-esm2 to local requirements
ynashed Oct 4, 2024
2d0181e
Update biobert_spec_option to use esm2_bert_layer_local_spec
ynashed Oct 4, 2024
87b259f
Fix syntax error
ynashed Oct 4, 2024
fa8daad
Update lr_scheduler import in amplify_pretrain.py
ynashed Oct 4, 2024
794a40b
Update amplify_pretrain.py to use hf_dataset_name instead of individu…
ynashed Oct 4, 2024
c1a03d2
Fixing errors related to hf_dataset_name
ynashed Oct 5, 2024
94b9649
Update optimizer and lr_scheduler in amplify_pretrain.py
ynashed Oct 5, 2024
d8e4d1a
Bugfixes
ynashed Oct 5, 2024
552a952
Merge pull request #1 from NVIDIA/main
ynashed Oct 8, 2024
0cbc475
adding bionemo-scclip to sub-packages
ynashed Sep 11, 2024
a1e323f
Initial model change
ynashed Sep 30, 2024
bd7041c
Add bionemo-amplify sub-package and its requirements
ynashed Oct 3, 2024
897601e
Added AMPLIFY tokenizer
ynashed Oct 3, 2024
1dc780c
Added AMPLIFY dataset and dataloader
ynashed Oct 4, 2024
88ebe21
Finalizing amplify model and train script
ynashed Oct 4, 2024
ee6e91b
Add bionemo-esm2 to local requirements
ynashed Oct 4, 2024
a7f8ff4
Update biobert_spec_option to use esm2_bert_layer_local_spec
ynashed Oct 4, 2024
1a1c4e8
Fix syntax error
ynashed Oct 4, 2024
3c9ffc7
Update lr_scheduler import in amplify_pretrain.py
ynashed Oct 4, 2024
52809c7
Update amplify_pretrain.py to use hf_dataset_name instead of individu…
ynashed Oct 4, 2024
9a4aec8
Fixing errors related to hf_dataset_name
ynashed Oct 5, 2024
d6eada9
Update optimizer and lr_scheduler in amplify_pretrain.py
ynashed Oct 5, 2024
0901c5d
Bugfixes
ynashed Oct 5, 2024
d261500
Update tach.toml to include bionemo.amplify module and its dependencies
ynashed Oct 8, 2024
579f16a
Updated bionemo-amplify to sync with upstream esm2 changes
ynashed Oct 10, 2024
0c9de08
Ignore test_experiment directory in git
ynashed Oct 10, 2024
abfa9da
solving merge conflicts
ynashed Oct 10, 2024
3820690
Update BioNeMoAMPLIFYTokenizer to use EsmTokenizer
ynashed Oct 10, 2024
059cc8c
Update BioNeMoAMPLIFYTokenizer to use chandar-lab/AMPLIFY_350M
ynashed Oct 10, 2024
b8fd917
Update BioNeMoAMPLIFYTokenizer to fix serialization issue
ynashed Oct 10, 2024
80947b0
Fix range for random_tokens in AMPLIFYMaskedResidueDataset
ynashed Oct 10, 2024
f8eac7c
Refactor index variable in AMPLIFYMaskedResidueDataset's __getitem__ …
ynashed Oct 10, 2024
121ff5f
Adding AMPLIFY specific config parameters
ynashed Oct 14, 2024
c758656
Amplify doesn't inherit from ESM2Model anymore
ynashed Oct 17, 2024
1e36af1
removed extra layernorm, added gradient clipping, configure cosine lr…
ynashed Oct 19, 2024
273cf6a
reducing attention block ffn_hidden_size to match the paper
ynashed Oct 22, 2024
26b12a0
Dataset resampling with MultiEpochDatasetResampler
ynashed Oct 22, 2024
21083c0
Merge branch 'NVIDIA:main' into v2-main
ynashed Oct 22, 2024
e47def5
cast np.int64 to int in dataset __getitem__
ynashed Oct 22, 2024
522e028
Merge branch 'v2-main' into ynashed/v2-main/amplify
ynashed Oct 22, 2024
eb2d658
Update amplify to match latest esm2 code
ynashed Oct 22, 2024
b822461
Revert to PRNGResampleDataset
ynashed Oct 22, 2024
7f40015
optimize multi_epoch_dataset for constant memory and space usage
pstjohn Oct 23, 2024
e09db4e
Merge pull request #2 from NVIDIA/pstjohn/main/optimize-multi-epoch-d…
ynashed Oct 23, 2024
70284e0
Trying out upstream MultiEpochDatasetResampler optimization
ynashed Oct 23, 2024
89166e4
Added dataset_subset to AMPLIFYMaskedResidueDataset
ynashed Oct 23, 2024
f699da2
Changed defaults to 120M Model. Added final_step to lr scheduler (cos…
ynashed Nov 2, 2024
97b4bf6
enabled bf16 in the optimizer
ynashed Nov 4, 2024
77040d5
LR scheduler warmup starts from min_lr 0 by default
ynashed Nov 4, 2024
af44848
Fixing cosine lr scheduler to match HF implementation
ynashed Nov 4, 2024
c52d5ba
Turning off CosineAnnealingScheduler constant_steps
ynashed Nov 5, 2024
2305b17
RandomMaskStrategy defaults to AMINO_ACIDS_ONLY
ynashed Nov 6, 2024
429bbdc
Make sure <mask> token exists in masked sequence
ynashed Nov 6, 2024
c5ef9d4
Forgot self. (doh)
ynashed Nov 6, 2024
c65f86e
revert the masking check
ynashed Nov 7, 2024
2f1b149
Turning off weight decay
ynashed Nov 7, 2024
16d0a30
[WIP] fix tests
ynashed Nov 8, 2024
3260d37
Merge remote-tracking branch 'upstream/main'
ynashed Nov 8, 2024
ece8de4
Merge branch 'main' into ynashed/v2-main/amplify
ynashed Nov 8, 2024
d64740e
esm2 updates
ynashed Nov 8, 2024
82de2ab
Fixes after testing
ynashed Nov 8, 2024
cdba113
adding log-every-n-steps argument
ynashed Nov 8, 2024
6c52266
Merge remote-tracking branch 'upstream/main'
ynashed Nov 12, 2024
9b7d5ff
Merge branch 'main' into ynashed/v2-main/amplify
ynashed Nov 12, 2024
efc53eb
added nsys profiling arguments
ynashed Nov 13, 2024
e803606
removed slowdown in dataset getitem
ynashed Nov 13, 2024
3f649ab
Trying different optimizer and model configs
ynashed Nov 14, 2024
40566ec
roll back config changes
ynashed Nov 14, 2024
2a2cbac
Removed abandoned bionemo-scclip
ynashed Nov 15, 2024
2f051f8
Patching Megatron-LM to include pytorch optimizers as default
ynashed Dec 10, 2024
d10ab06
Adding run:ai submit scripts [WIP]
ynashed Dec 10, 2024
3e7f959
Added Megatron optimizer patch file
ynashed Dec 10, 2024
50025eb
changes to match https://github.com/NVIDIA/NeMo/pull/11252
ynashed Dec 12, 2024
4280418
Adding WANDB_API_KEY run:ai secret
ynashed Dec 12, 2024
5d03f8d
Switch back to pytorch_lightning.callbacks import
ynashed Dec 12, 2024
3339571
trying to get rid of loss spikes
ynashed Dec 13, 2024
faed442
pip installing megatron-lm just for good measure
ynashed Dec 13, 2024
85e1d78
faster weight decay
ynashed Dec 13, 2024
f98f02d
Changing core attention to default
ynashed Dec 13, 2024
c161652
Trying training in fp32
ynashed Dec 13, 2024
192e121
OOM, trying fp32-mixed
ynashed Dec 13, 2024
5d37fea
OOM, fp16-mixed
ynashed Dec 13, 2024
32aa173
lower initial loss_scaling
ynashed Dec 14, 2024
5927369
fp16-mixed precision with constant loss scaling
ynashed Dec 14, 2024
ce2298c
loss_scale passed to the right class
ynashed Dec 14, 2024
77d3239
MegatronMixedPrecision class argument name fix
ynashed Dec 14, 2024
432acf0
reverting back to bf16-mixed
ynashed Dec 14, 2024
572511b
Increasing adam_eps to try to counter grad norm explosion
ynashed Dec 14, 2024
7dcd4e1
reverting adam_eps to 1e-8
ynashed Dec 14, 2024
d7e1cc0
Turning off fusions, turning on attention_softmax_in_fp32
ynashed Dec 14, 2024
a02b4ee
Use esm2 LM_Head and layernorm instead of RMSNorm
ynashed Dec 16, 2024
98a62d7
Trying esm2_bert_layer_with_transformer_engine_spec
ynashed Dec 16, 2024
a027e2b
Allowing AmplifyConfig to accept esm2 bert spec
ynashed Dec 16, 2024
e7db7d3
Removing LM_head, using esm2_bert_layer_with_transformer_engine_spec,…
ynashed Dec 16, 2024
0f7ef3f
kubectl script changes to add random-mask-strategy argument
ynashed Dec 23, 2024
d60e76d
Merge remote-tracking branch 'upstream/main' into ynashed/v2-main/amp…
ynashed Dec 23, 2024
62caa27
Merge branch 'ynashed/amplify/runai' into ynashed/v2-main/amplify
ynashed Dec 24, 2024
b1836eb
Updates after sync with upstream
ynashed Dec 25, 2024
169e5bb
updating the pytorch_lightning import
ynashed Dec 30, 2024
ec10b36
Merge remote-tracking branch 'upstream/main'
ynashed Dec 30, 2024
b7e35ac
Merge branch 'main' into ynashed/v2-main/amplify
ynashed Dec 30, 2024
d0b837d
Merge remote-tracking branch 'upstream/main' into ynashed/v2-main/amp…
ynashed Dec 30, 2024
f6e0f37
bump NeMo version to match upstream
ynashed Dec 30, 2024
8f15b80
Add handling for special token IDs in BioNeMoAMPLIFYTokenizer
ynashed Dec 31, 2024
0107bc6
Add DDP configuration options for gradient reduction and parameter ga…
ynashed Dec 31, 2024
1a3ffa9
Adding train script entrypoint
ynashed Dec 31, 2024
7e8cddf
Conforming with upstream changes
ynashed Dec 31, 2024
5942afa
Removing runai scripts from git
ynashed Dec 31, 2024
cbf3535
AMPLIFYConfig::__post_init__ was called twice leading to decreased mo…
ynashed Jan 1, 2025
7fe9c46
Disable distributed optimizer in training script
ynashed Jan 1, 2025
Conforming with upstream changes
ynashed committed Dec 31, 2024
commit 7e8cddf2c91bf533b97959b66bb712430dbbd3c5
1 change: 0 additions & 1 deletion sub-packages/bionemo-core/pyproject.toml
@@ -14,7 +14,6 @@ dependencies = [
     # bionemo sub-packages
     # bionemo-core **MUST NOT** depend on any other sub-packages !!!!!
     # external
-    "numba",
     "numpy",
     "platformdirs",
     "torch>=2.2.1",
@@ -25,7 +25,6 @@
 from megatron.core.transformer.attention import SelfAttention, SelfAttentionSubmodules
 from megatron.core.transformer.custom_layers.transformer_engine import (
     TEDotProductAttention,
-    TEColumnParallelLinear,
     TELayerNormColumnParallelLinear,
     TERowParallelLinear,
 )
21 changes: 12 additions & 9 deletions tach.toml
@@ -8,15 +8,18 @@ exclude = [
     "build",
 ]
 source_roots = [
-    "sub-packages/bionemo-amplify/src",
-    "sub-packages/bionemo-core/src",
-    "sub-packages/bionemo-esm2/src",
-    "sub-packages/bionemo-example_model/src",
-    "sub-packages/bionemo-fw/src",
-    "sub-packages/bionemo-geneformer/src",
-    "sub-packages/bionemo-llm/src",
-    "sub-packages/bionemo-testing/src",
-    "sub-packages/bionemo-webdatamodule/src",
+    'sub-packages/bionemo-amplify/src',
+    'sub-packages/bionemo-core/src',
+    'sub-packages/bionemo-esm2/src',
+    'sub-packages/bionemo-example_model/src',
+    'sub-packages/bionemo-fw/src',
+    'sub-packages/bionemo-geneformer/src',
+    'sub-packages/bionemo-geometric/src',
+    'sub-packages/bionemo-llm/src',
+    'sub-packages/bionemo-scdl/src',
+    'sub-packages/bionemo-size-aware-batching/src',
+    'sub-packages/bionemo-testing/src',
+    'sub-packages/bionemo-webdatamodule/src',
 ]

[[modules]]
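For context on the tach.toml hunk above: tach declares each package as a `[[modules]]` table with an explicit dependency list, which is what the commit "Update tach.toml to include bionemo.amplify module and its dependencies" refers to. The actual entries are truncated from this view; a hedged sketch of what the bionemo.amplify entry might look like (module path and dependency names are inferred from the commit messages, not taken from the real file, and the exact `depends_on` syntax varies across tach versions):

```toml
# Hypothetical tach.toml module entry — names assumed, not from the PR diff.
[[modules]]
path = "bionemo.amplify"
depends_on = [
    "bionemo.core",   # base utilities; bionemo-core itself must not depend on other sub-packages
    "bionemo.esm2",   # AMPLIFY reuses the ESM-2 BERT layer specs per the commit history
    "bionemo.llm",    # shared BioNeMo language-model infrastructure
]
```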