Description
I ran the ImageNet app in YARN cluster mode and noticed that NodeManager memory keeps increasing. This looks like a memory leak in the C++/JNI code, since the CoarseGrainedExecutorBackend memory is very stable.
See the two processes (1127 keeps growing, while 1130 is very stable):
0 S yarn 1127 1125 0 80 0 - 2910 wait 13:15 ? 00:00:00 /bin/bash -c LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop/../../../CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop/lib/native:/opt/gpu/cuda/lib64:/data02/nhe/SparkNet/lib:/data02/nhe/cuda-7.0::/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop/lib/native /usr/lib/jvm/java-7-oracle-cloudera/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms22528m -Xmx22528m -Djava.io.tmpdir=/data02/yarn/nm/usercache/hdfs/appcache/application_1461609406099_0001/container_1461609406099_0001_02_000002/tmp '-Dspark.authenticate=false' '-Dspark.driver.port=56487' '-Dspark.shuffle.service.port=7337' '-Dspark.ui.port=0' -Dspark.yarn.app.container.log.dir=/data02/yarn/container-logs/application_1461609406099_0001/container_1461609406099_0001_02_000002 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://[email protected]:56487 --executor-id 1 --hostname bdalab12.samsungsdsra.com --cores 16 --app-id application_1461609406099_0001 --user-class-path file:/data02/yarn/nm/usercache/hdfs/appcache/application_1461609406099_0001/container_1461609406099_0001_02_000002/app.jar 1> /data02/yarn/container-logs/application_1461609406099_0001/container_1461609406099_0001_02_000002/stdout 2> /data02/yarn/container-logs/application_1461609406099_0001/container_1461609406099_0001_02_000002/stderr
0 S yarn 1130 1127 99 80 0 - 56878287 futex_ 13:15 ? 01:25:40 /usr/lib/jvm/java-7-oracle-cloudera/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms22528m -Xmx22528m -Djava.io.tmpdir=/data02/yarn/nm/usercache/hdfs/appcache/application_1461609406099_0001/container_1461609406099_0001_02_000002/tmp -Dspark.authenticate=false -Dspark.driver.port=56487 -Dspark.shuffle.service.port=7337 -Dspark.ui.port=0 -Dspark.yarn.app.container.log.dir=/data02/yarn/container-logs/application_1461609406099_0001/container_1461609406099_0001_02_000002 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://[email protected]:56487 --executor-id 1 --hostname bdalab12.samsungsdsra.com --cores 16 --app-id application_1461609406099_0001 --user-class-path file:/data02/yarn/nm/usercache/hdfs/appcache/application_1461609406099_0001/container_1461609406099_0001_02_000002/app.jar
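For anyone trying to reproduce this, here is a rough sketch of how the resident memory of the two PIDs could be tracked over time, so the growth can be compared against the fixed JVM heap (`-Xmx22528m`). It is not part of SparkNet. The helper names (`parse_vmrss_kb`, `read_rss_kb`, `sample`) and the 60-second interval are my own assumptions, and it only works on Linux, where `/proc/<pid>/status` exposes a `VmRSS` line.

```python
import time
from pathlib import Path


def parse_vmrss_kb(status_text):
    """Return the VmRSS value (kB) from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    123456 kB"; the second field is the number.
            return int(line.split()[1])
    return None


def read_rss_kb(pid):
    """Read the resident set size (kB) of a live process on Linux."""
    return parse_vmrss_kb(Path(f"/proc/{pid}/status").read_text())


def sample(pids, interval_s=60, count=10):
    """Collect `count` RSS samples per PID, sleeping `interval_s` seconds between rounds."""
    history = {pid: [] for pid in pids}
    for i in range(count):
        for pid in pids:
            history[pid].append(read_rss_kb(pid))
        if i < count - 1:
            time.sleep(interval_s)
    return history
```

Calling `sample([1127, 1130])` on the affected node returns one list of RSS readings per PID. If the readings for the executor process keep climbing well past the configured heap size while the heap itself stays flat, that would be consistent with a leak in native (C++/JNI) allocations rather than in Java objects.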