Memory leaks when using tf.keras.metrics update_states in the master grpc server. #1568

Open
@workingloong

Description

In ElasticDL, the master creates evaluation tasks and dispatches them to workers. After finishing an evaluation task, each worker reports the model outputs and labels to the master. The master then updates tf.keras.metrics with the outputs and labels received over gRPC. However, the gRPC server is multithreaded, so it can receive outputs from multiple workers in parallel and call update_state on the tf.keras.metrics in each handler thread.

The memory leak is very obvious when we set futures.ThreadPoolExecutor(max_workers=64) and disappears with futures.ThreadPoolExecutor(max_workers=1). We suspect the leak occurs when tf.keras.metric.update_state is executed from multiple threads. So we wrote a unit test that reproduces the leak under multithreading and submitted it to TensorFlow issue 35044.

for metric_inst in metrics.values():
    metric_inst.update_state(labels, outputs)
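For context, here is a minimal, self-contained sketch of the master-side pattern described above, with a lock added so that update_state is only ever executed from one gRPC handler thread at a time (one possible workaround for the leak). The names metrics, labels, outputs, and report_evaluation mirror the snippet above; the Mean class is a hypothetical stand-in for a tf.keras.metrics object so the sketch runs without TensorFlow installed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Mean:
    """Hypothetical stand-in for a tf.keras.metrics object.

    Accumulates the mean absolute error over all reported batches;
    the real code would use tf.keras.metrics classes instead.
    """

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update_state(self, labels, outputs):
        self.total += sum(abs(l - o) for l, o in zip(labels, outputs))
        self.count += len(labels)

    def result(self):
        return self.total / self.count if self.count else 0.0


metrics = {"mae": Mean()}
metrics_lock = threading.Lock()  # serializes metric updates across threads


def report_evaluation(labels, outputs):
    # Called from each gRPC handler thread when a worker reports results.
    # Holding the lock means update_state never runs concurrently, which
    # mimics the max_workers=1 case where the leak disappears.
    with metrics_lock:
        for metric_inst in metrics.values():
            metric_inst.update_state(labels, outputs)


# Simulate 64 handler threads receiving 100 worker reports in parallel.
with ThreadPoolExecutor(max_workers=64) as pool:
    futures = [
        pool.submit(report_evaluation, [1.0, 2.0], [1.5, 2.5])
        for _ in range(100)
    ]
    for f in futures:
        f.result()  # surface any exception raised in a worker thread

print(metrics["mae"].result())  # 0.5
```

Serializing updates this way trades throughput for correctness; it sidesteps the concurrent update_state calls rather than fixing the underlying leak in TensorFlow.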

Labels: TF_BUG, bug