Skip to content

ImageNet running in Yarn, nodeManager memory keep on increasing #123

@nhe150

Description

@nhe150

I have run the imagenet in yarn cluster mode. Noticed nodemanager memory keep on increasing. Seems to be some memory leak in c++/jni code since coarsedGrainedbackend memory is very stable.

See the two process: (1127 keep on growing, while 1130 very stable)

****0 S yarn 1127 1125 0 80 0 - 2910 wait 13:15 ? 00:00:00 /bin/bash -c LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop/../../../CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop/lib/native:/opt/gpu/cuda/lib64:/data02/nhe/SparkNet/lib:/data02/nhe/cuda-7.0::/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop/lib/native /usr/lib/jvm/java-7-oracle-cloudera/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms22528m -Xmx22528m -Djava.io.tmpdir=/data02/yarn/nm/usercache/hdfs/appcache/application_1461609406099_0001/container_1461609406099_0001_02_000002/tmp '-Dspark.authenticate=false' '-Dspark.driver.port=56487' '-Dspark.shuffle.service.port=7337' '-Dspark.ui.port=0' -Dspark.yarn.app.container.log.dir=/data02/yarn/container-logs/application_1461609406099_0001/container_1461609406099_0001_02_000002 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://[email protected]:56487 --executor-id 1 --hostname bdalab12.samsungsdsra.com --cores 16 --app-id application_1461609406099_0001 --user-class-path file:/data02/yarn/nm/usercache/hdfs/appcache/application_1461609406099_0001/container_1461609406099_0001_02_000002/app.jar 1> /data02/yarn/container-logs/application_1461609406099_0001/container_1461609406099_0001_02_000002/stdout 2> /data02/yarn/container-logs/application_1461609406099_0001/container_1461609406099_0001_02_000002/stderr


0 S yarn 1130 1127 99 80 0 - 56878287 futex_ 13:15 ? 01:25:40 /usr/lib/jvm/java-7-oracle-cloudera/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms22528m -Xmx22528m -Djava.io.tmpdir=/data02/yarn/nm/usercache/hdfs/appcache/application_1461609406099_0001/container_1461609406099_0001_02_000002/tmp -Dspark.authenticate=false -Dspark.driver.port=56487 -Dspark.shuffle.service.port=7337 -Dspark.ui.port=0 -Dspark.yarn.app.container.log.dir=/data02/yarn/container-logs/application_1461609406099_0001/container_1461609406099_0001_02_000002 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://[email protected]:56487 --executor-id 1 --hostname bdalab12.samsungsdsra.com --cores 16 --app-id application_1461609406099_0001 --user-class-path file:/data02/yarn/nm/usercache/hdfs/appcache/application_1461609406099_0001/container_1461609406099_0001_02_000002/app.jar

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions