How to save a DeepSpeed stage 3 model with pickle or torch #8910

After some debugging with a user, I put together a final script that shows how to use convert_zero_checkpoint_to_fp32_state_dict to consolidate a DeepSpeed stage 3 checkpoint into a single file that can be loaded using pickle or Lightning.

import os

import torch
from torch.utils.data import DataLoader, Dataset

from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.plugins import DeepSpeedPlugin
from pytorch_lightning.utilities.deepspeed import convert_zero_checkpoint_to_fp32_state_dict


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __g…
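
Since the script above is cut off, here is a minimal sketch of just the conversion and loading step it is building toward. It is not the original script. The helper names extract_state_dict and consolidate_and_load are hypothetical, and the sketch assumes the consolidated file is a Lightning-style checkpoint that keeps the weights under a "state_dict" key; if that key is missing, the loaded mapping is treated as the state dict itself.

```python
from typing import Any, Mapping


def extract_state_dict(checkpoint: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the weights from an already-loaded checkpoint mapping.

    Assumption: a Lightning-style checkpoint stores the weights under
    "state_dict". If the key is absent, the mapping is returned unchanged.
    """
    return checkpoint.get("state_dict", checkpoint)


def consolidate_and_load(checkpoint_dir: str, output_file: str) -> Mapping[str, Any]:
    """Merge a sharded ZeRO stage 3 checkpoint into one file and load its weights.

    Hypothetical helper: checkpoint_dir is the directory written during
    training, output_file is where the single fp32 file should be saved.
    """
    # Deferred imports: only needed when the conversion actually runs.
    import torch
    from pytorch_lightning.utilities.deepspeed import (
        convert_zero_checkpoint_to_fp32_state_dict,
    )

    # Gather the sharded fp32 weights and write them to a single file.
    convert_zero_checkpoint_to_fp32_state_dict(checkpoint_dir, output_file)

    # torch.load unpickles the file; map to CPU so no GPU is required.
    checkpoint = torch.load(output_file, map_location="cpu")
    return extract_state_dict(checkpoint)
```

With placeholder paths, the result could then be passed to a plain PyTorch module, for example model.load_state_dict(consolidate_and_load("path/to/checkpoint_dir", "single_model.pt")).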

Answer selected by ViktorThink