The exported model file can reach 10 GB or more. In the current implementation, the entire model is constructed in the worker's memory, and the worker then calls tf.saved_model.save to export it. For large models this can exhaust the worker's memory (OOM). We need a solution in which the parameter servers (PS) and the worker cooperate to export the model:
- Each PS saves the variable shard assigned to that instance.
- The worker saves the model definition graph.
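The division of work above can be illustrated with a small pure-Python sketch. It is not TensorFlow's SavedModel format and not an existing API: `ParameterServer`, `ps_index`, `worker_save_definition`, `load_exported`, the JSON file layout, and the CRC32-based placement rule are all hypothetical stand-ins. In a real export each PS would write its variables in TensorFlow's checkpoint format rather than JSON. The point shown is only that each PS writes its own shard and the worker writes just the model structure, so no single process has to hold every weight.

```python
import json
import os
import tempfile
import zlib


def ps_index(var_name, num_ps):
    # Hypothetical placement rule: stable hash of the variable name modulo the PS count.
    return zlib.crc32(var_name.encode("utf-8")) % num_ps


class ParameterServer:
    """Hypothetical PS that holds only the variables assigned to it."""

    def __init__(self, ps_id):
        self.ps_id = ps_id
        self.variables = {}

    def save_shard(self, export_dir):
        # Each PS writes only its own shard; no process ever holds the full model.
        path = os.path.join(export_dir, "variables", f"ps-{self.ps_id}.json")
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            json.dump(self.variables, f)
        return path


def worker_save_definition(export_dir, model_def, num_ps):
    # The worker writes the model structure plus the shard count, but no weights.
    os.makedirs(export_dir, exist_ok=True)
    path = os.path.join(export_dir, "model_def.json")
    with open(path, "w") as f:
        json.dump({"model": model_def, "num_ps": num_ps}, f)
    return path


def load_exported(export_dir):
    # Reassemble the model: read the definition, then merge every PS shard.
    with open(os.path.join(export_dir, "model_def.json")) as f:
        meta = json.load(f)
    variables = {}
    for i in range(meta["num_ps"]):
        with open(os.path.join(export_dir, "variables", f"ps-{i}.json")) as f:
            variables.update(json.load(f))
    return meta["model"], variables


def demo(num_ps=2):
    # Toy weights standing in for a large model (illustrative values only).
    weights = {"dense/kernel": [0.1, 0.2], "dense/bias": [0.0], "emb/table": [1.0, 2.0, 3.0]}
    servers = [ParameterServer(i) for i in range(num_ps)]
    for name, value in weights.items():
        servers[ps_index(name, num_ps)].variables[name] = value

    export_dir = tempfile.mkdtemp()
    for ps in servers:  # in a cluster these saves would run in parallel on each PS
        ps.save_shard(export_dir)
    worker_save_definition(export_dir, {"layers": ["emb", "dense"]}, num_ps)

    model_def, restored = load_exported(export_dir)
    return model_def, restored, weights, servers
```

With this split, the peak memory on any one process is bounded by its own shard (for a PS) or by the size of the model definition (for the worker), rather than by the full model size.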