Should I configure FP16, optimizers, batch_size in DeepSpeed config of Pytorch-Lightning? #12465
Answered by rohitgr7
ShaneTian asked this question in DDP / multi-GPU / multi-node
My DeepSpeed config is:

{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 2000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}

I have some questions about how to configure DeepSpeed in PyTorch Lightning: should FP16, the optimizer, and the batch size be set in this DeepSpeed config, or on the Trainer and in the LightningModule?
Thanks a lot! 😊
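For context, a config like the one above is typically passed to Lightning through DeepSpeedStrategy(config=...). A minimal sketch, assuming the config is saved as ds_config.json and the job runs on 4 GPUs (both assumptions, not from the original post):

```python
# Minimal sketch: pass an existing DeepSpeed JSON config to Lightning.
# "ds_config.json" and the GPU count are illustrative assumptions.
import pytorch_lightning as pl
from pytorch_lightning.strategies import DeepSpeedStrategy

trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,
    strategy=DeepSpeedStrategy(config="ds_config.json"),
    precision=16,
)
```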
Answered by rohitgr7 on Mar 26, 2022
Yes, you don't need to set them inside the DeepSpeed config, since Lightning already handles this when you set them on the Trainer and in the LightningModule: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/strategies/deepspeed.py
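To illustrate what this looks like in practice, here is a minimal sketch of configuring these settings on the Lightning side and leaving the "auto" entries in the JSON for Lightning to fill in. The model class, hyperparameter values, and the ds_config.json file name are illustrative assumptions, not taken from the thread; the mapping comments reflect how the linked strategy code appears to behave.

```python
# Minimal sketch: set precision, optimizer, gradient accumulation, gradient
# clipping, and batch size via Lightning instead of the DeepSpeed config.
# Names and values below are illustrative assumptions.
import torch
import pytorch_lightning as pl
from pytorch_lightning.strategies import DeepSpeedStrategy


class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def configure_optimizers(self):
        # The optimizer is defined here rather than in an "optimizer" block
        # of the DeepSpeed config.
        return torch.optim.AdamW(self.parameters(), lr=3e-4)


trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,
    precision=16,               # fills the DeepSpeed fp16 section
    accumulate_grad_batches=2,  # -> "gradient_accumulation_steps"
    gradient_clip_val=1.0,      # -> "gradient_clipping"
    strategy=DeepSpeedStrategy(config="ds_config.json"),
)
# "train_micro_batch_size_per_gpu" is inferred from the batch_size of the
# train DataLoader returned by train_dataloader() or passed to trainer.fit().
```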
Answer selected by ShaneTian