[DRAFT] Sagemaker integration #151

sedrick-keh-tri · 2023-12-12T01:32:15Z

This still needs some fixing because there is some TRI-specific code in launch_sagemaker_train.py.

It works if you run python sagemaker_train/launch_sagemaker_train.py --user sedrick.keh --cfg-path sagemaker_train/cfg_sample.yaml --build. For easy testing, I edited params.py in my local code (not pushed) to accept openlm_mix_tri_s3, but aside from that it works without any other changes.

sedrick-keh-tri · 2023-12-12T13:44:33Z

Edited the file to remove TRI mentions. Now the region, ARN, and s3 path can be supplied through command line or env variable.
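For readers who want to see the pattern, here is a minimal sketch of settings that can come from either the command line or the environment. The flag names (--region, --arn, --s3-path) and the environment variable names are illustrative assumptions, not necessarily the ones used in launch_sagemaker_train.py.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults fall back to environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Launch a SageMaker training job (sketch).")
    # Flag and environment variable names here are hypothetical.
    parser.add_argument("--region", default=env.get("AWS_REGION"))
    parser.add_argument("--arn", default=env.get("SAGEMAKER_ARN"))
    parser.add_argument("--s3-path", default=env.get("S3_BASE_PATH"))
    return parser


def parse_launch_args(argv=None, env=None):
    """Parse arguments; a command-line value overrides the environment."""
    parser = build_parser(env)
    args = parser.parse_args(argv)
    # Fail early if a setting was supplied by neither source.
    missing = [name for name in ("region", "arn", "s3_path") if getattr(args, name) is None]
    if missing:
        parser.error("missing required settings: " + ", ".join(missing))
    return args
```

Resolving the environment lookup when the parser is built keeps the precedence simple: an explicit flag wins, and the environment only supplies the default.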

Still not sure how to handle the train-data parameter. That might be a separate issue altogether, and we can just ignore it for this PR, but I rather like being able to easily supply something simple like "openlm_mix_tri_s3" to the train-data parameter.

Other than that, everything else here should work without issue

achalddave

small nits, otherwise looks good to me!

sagemaker_train/cfg_sample.yaml

achalddave · 2023-12-12T20:38:43Z

sagemaker_train/launch_sagemaker_train.py

+    checkpoint_local_path = "/opt/ml/checkpoints"
+
+    with open(args.cfg_path, "r") as f:
+        train_args = yaml.safe_load(f)


Now that openlm supports a --config, let's just pass the config to openlm directly, with --config args.cfg_path?

Tried this but it seems that it gives some typing errors when passed through sagemaker:
Type mismatch (config: <class 'str'> vs. argparse: <class 'bool'>) with values (config: vs. argparse: False) for config. key: dataset_resampled

Leaving it as is for now

achalddave · 2023-12-13T07:40:23Z

sagemaker_train/cfg_sample.yaml

+beta1: 0.9
+beta2: 0.95
+data-key: "json"
+dataset-resampled: ""


If you set this to True instead of empty string, the error you mentioned should go away when passing the config via path (and similarly for all other keys which have a "" value - set them to True instead)

I tried that and it still gives the same error. It seems to read everything as a string.

Just to double-check: The way to pass a config is to just do train_args = {"config": args.cfg_path} instead of the yaml.safe_load(f), right?

Hmm yeah that should be all you need. And you rebuilt the docker container after that change right? Will try it out later today, maybe there's something wrong with the parsing logic.
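To illustrate the kind of mismatch discussed in this thread, here is a rough sketch of coercing config values that arrive as strings to the type of the corresponding argparse default. The helper names (coerce_value, coerce_config) are hypothetical, and treating an empty string as True for boolean flags is an assumption based on how cfg_sample.yaml writes flags; this is not the fix in the PR linked below, nor open_lm's actual parsing logic.

```python
def coerce_value(value, default):
    """Coerce a string config value to the type of the argparse default."""
    # Leave non-strings alone, and leave strings alone when a string is expected.
    if not isinstance(value, str) or isinstance(default, str) or default is None:
        return value
    # Check bool before int/float: bool is a subclass of int in Python.
    if isinstance(default, bool):
        lowered = value.strip().lower()
        # Assumption: an empty string means "flag is set", as in the sample config.
        if lowered in ("", "true", "1", "yes"):
            return True
        if lowered in ("false", "0", "no"):
            return False
        raise ValueError(f"cannot interpret {value!r} as a boolean")
    # For int and float defaults, reuse the default's type as the converter.
    return type(default)(value)


def coerce_config(config, defaults):
    """Normalize hyphenated keys and coerce each value against its default."""
    coerced = {}
    for key, value in config.items():
        name = key.replace("-", "_")
        coerced[name] = coerce_value(value, defaults[name]) if name in defaults else value
    return coerced
```

For example, with defaults of {"dataset_resampled": False, "beta1": 0.9}, a config of {"dataset-resampled": "", "beta1": "0.95"} would come out with a boolean and a float rather than two strings.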

achalddave · 2023-12-13T19:05:36Z

made some changes that should fix the issues with config here: https://github.com/sedrick-keh-tri/open_lm/pull/1. I think if you merge that PR into your branch, it should update this PR, and we should be good to merge.

Updates for sagemaker integration

sedrick-keh-tri added 4 commits December 12, 2023 01:22

remove relative imports

c364305

sm-integration

8da7d11

folder rename + args for region, arn, s3

acec8b3

setup.py fixes

a9f9d5c

achalddave reviewed Dec 12, 2023

cleanup tri mentions + address nits

8ce69cf

achalddave reviewed Dec 13, 2023

Updates for sagemaker integration

2162ea8

sedrick-keh-tri and others added 5 commits December 14, 2023 03:45

Merge pull request #1 from mlfoundations/sagemaker-integration

6da4439

Updates for sagemaker integration

sm fixes

f6e7778

format + conflicts

9e25570

Merge branch 'main' into sm-integration

541ddb7

Merge branch 'main' into sm-integration

4f2df2d

achalddave approved these changes Dec 16, 2023

Merge branch 'main' into sm-integration

6ac73d7

achalddave merged commit 0da1e0c into mlfoundations:main Dec 20, 2023
2 checks passed
