Export a model that can't fit into a single worker pod to SavedModel #1407

Open
@brightcoder01

Description

The exported model file can reach 10 GB or more. In the current implementation, the entire model is constructed in the worker's memory and then tf.saved_model.save is called to export it, which can cause the worker pod to run out of memory (OOM). We need a solution in which the parameter servers (PS) and the worker cooperate to export the model:
Each PS instance saves the variable shard assigned to it.
The worker saves the model definition graph.
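The proposed split can be sketched in plain Python (no TensorFlow) to show how the pieces fit together, so that no single process ever has to hold every variable value while saving. Everything here is a hypothetical illustration, not the project's implementation or the real SavedModel format: the CRC32-based assignment of variables to PS instances, the file names variables-XXXXX-of-YYYYY.pkl and graph.json, pickle as the shard format, plain lists standing in for tensors, and the helpers ps_index, ps_save_shard, worker_save_graph and load_export are all assumptions.

```python
import json
import os
import pickle
import tempfile
import zlib

# Hypothetical sketch: plain Python lists stand in for tensors, and the file
# layout below is an illustrative assumption, not the SavedModel format.


def ps_index(var_name, num_ps):
    """Deterministically map a variable name to a PS instance (assumed CRC32 scheme)."""
    return zlib.crc32(var_name.encode("utf-8")) % num_ps


def shard_path(export_dir, ps_id, num_ps):
    return os.path.join(export_dir, f"variables-{ps_id:05d}-of-{num_ps:05d}.pkl")


def ps_save_shard(export_dir, ps_id, num_ps, local_vars):
    """Run on each PS: write only the variables held by this instance."""
    path = shard_path(export_dir, ps_id, num_ps)
    with open(path, "wb") as f:
        pickle.dump(local_vars, f)
    return path


def worker_save_graph(export_dir, var_specs, num_ps):
    """Run on the worker: write the model definition (names, shapes, owning PS), no values."""
    graph = {
        "num_ps": num_ps,
        "variables": {
            name: {"shape": shape, "ps": ps_index(name, num_ps)}
            for name, shape in var_specs.items()
        },
    }
    path = os.path.join(export_dir, "graph.json")
    with open(path, "w") as f:
        json.dump(graph, f)
    return path


def load_export(export_dir):
    """Reassemble the graph and all variable shards, e.g. to verify an export."""
    with open(os.path.join(export_dir, "graph.json")) as f:
        graph = json.load(f)
    values = {}
    num_ps = graph["num_ps"]
    for ps_id in range(num_ps):
        with open(shard_path(export_dir, ps_id, num_ps), "rb") as f:
            values.update(pickle.load(f))
    return graph, values


if __name__ == "__main__":
    num_ps = 2
    all_vars = {
        "embedding": [[0.1, 0.2], [0.3, 0.4]],
        "dense/kernel": [[1.0], [2.0]],
        "dense/bias": [0.5],
    }
    # In a real job each PS would already hold only its own shard; the shards
    # are built in one process here purely for the demo.
    shards = {i: {} for i in range(num_ps)}
    for name, value in all_vars.items():
        shards[ps_index(name, num_ps)][name] = value

    export_dir = tempfile.mkdtemp()
    for ps_id, local_vars in shards.items():
        ps_save_shard(export_dir, ps_id, num_ps, local_vars)
    worker_save_graph(
        export_dir,
        {"embedding": [2, 2], "dense/kernel": [2, 1], "dense/bias": [1]},
        num_ps,
    )
    graph, values = load_export(export_dir)
    print(sorted(values))
```

The key design point in this sketch is that the PS assignment function is deterministic and shared by the PS side and the worker side, so the graph file written by the worker can record which shard file holds each variable without the worker ever loading the values.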
