Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark applications using the Loki appender hang at shutdown #99

Closed
eugenwintersberger opened this issue Dec 7, 2022 · 8 comments
Closed

Comments

@eugenwintersberger
Copy link

eugenwintersberger commented Dec 7, 2022

I am using the Loki log appender in an Apache Spark application deployed with spark-submit. The Spark application hangs indefinitely at shutdown. When running the application from the command line I have to explicitly kill the task using CTRL-C in order to terminate the program.

This is the Scala code of my application:

object TestApplication {

  def main(args: Array[String]) = {
    val session = SparkSession.builder.getOrCreate()
    import session.implicits._
    val logger = LogManager.getLogger("TestApplicationLogger")
    logger.info("starting application ")
    val data = Seq.fill[Int](100)(1)
    val count = data
      .toDS()
      .map(x => {
        val message = String.format(
          "process on node %s %s",
          java.net.InetAddress.getLocalHost().getHostName(),
          TaskContext.getPartitionId().toString())
        val l = LogManager.getLogger("TestApplicationLogger")
        l.info(message)
        x
      })
      .count
    logger.info(String.format("Processed %s elements", count.toString()))
  }
}

And here are the last lines of the command line output:

22/12/07 15:57:52 INFO TaskSchedulerImpl: Killing all running tasks in stage 2: Stage finished
22/12/07 15:57:52 INFO DAGScheduler: Job 1 finished: count at TestApplication.scala:19, took 0.368057 s
2022-12-07 15:57:52,364 main TRACE Log4jLoggerFactory.getContext() found anchor interface org.apache.spark.internal.Logging
22/12/07 15:57:52 INFO TestApplicationLogger: Processed 100 elements

- HERE THE PROGRAM STOPS AND CONTINUES ONLY AFTER MANUALLY TERMINATING IT WITH CTRL-C -

22/12/07 15:59:09 INFO SparkContext: Invoking stop() from shutdown hook
2022-12-07 15:59:09,128 SparkUI-71 TRACE Log4jLoggerFactory.getContext() found anchor class org.sparkproject.jetty.util.log.Slf4jLog
2022-12-07 15:59:09,131 shutdown-hook-0 TRACE Log4jLoggerFactory.getContext() found anchor interface org.apache.spark.internal.Logging
22/12/07 15:59:09 INFO SparkUI: Stopped Spark web UI at http://192.168.88.194:4040
22/12/07 15:59:09 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/12/07 15:59:09 INFO MemoryStore: MemoryStore cleared
22/12/07 15:59:09 INFO BlockManager: BlockManager stopped
22/12/07 15:59:09 INFO BlockManagerMaster: BlockManagerMaster stopped
22/12/07 15:59:09 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/12/07 15:59:09 INFO SparkContext: Successfully stopped SparkContext
22/12/07 15:59:09 INFO ShutdownHookManager: Shutdown hook called
22/12/07 15:59:09 INFO ShutdownHookManager: Deleting directory /tmp/spark-ae372578-3098-4051-b54b-64444a536497
22/12/07 15:59:09 INFO ShutdownHookManager: Deleting directory /tmp/spark-37b54d5b-12bd-42e5-a828-daa55179b647

As you can see, the program stops after the last line in the Scala code and waits there indefinitely until I press CTRL-C.
Is this a known problem?

@tkowalcz
Copy link
Owner

Hi, sorry for late reply. This should not happen. Can you do thread dump (jstack) of the app so we could see which threads are keeping it from closing?

@theaob
Copy link

theaob commented Jan 20, 2023

Hello, I think the same issue is happening on Spring Boot as well. It seems that thread hangs at ReadingLogBufferAndSendingHttp in SleepingMillisIdleStrategy.

@eugenwintersberger
Copy link
Author

Sorry for my very, very, very late reply. I did not follow this issue since it seems that this is an issue specific to spark-submit. When I run the Spark job on a platform like Databricks everything works as expected. However, since this seems to show up at Spring Boot too, I guess some further investigation is a good idea. Do you still need a thread dump? In this case I somehow have to resurrect the old code in order to reproduce the error.

@tkowalcz
Copy link
Owner

Hi, yes please. If you can provide any more data I would greatly appreciate that. I was recently looking into this. @theaob mentioned ReadingLogBufferAndSendingHttp class which is from older version of Tjahzi. Now it's LogShipper so please also include the version of Tjahzi you are using. Thanks!

@lanyuanxiaoyao
Copy link

Hi, yes please. If you can provide any more data I would greatly appreciate that. I was recently looking into this. @theaob mentioned ReadingLogBufferAndSendingHttp class which is from older version of Tjahzi. Now it's LogShipper so please also include the version of Tjahzi you are using. Thanks!

I think i meet the same problem, it hang my spring boot application at shutdown.
image

The test project has only 3 files.

  • pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.lanyuanxiaoyao.micro.service</groupId>
        <artifactId>center</artifactId>
        <version>1.0.0</version>
    </parent>

    <artifactId>test</artifactId>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
            <version>2.6.13</version>
        </dependency>
        <dependency>
            <groupId>pl.tkowalcz.tjahzi</groupId>
            <artifactId>logback-appender</artifactId>
            <version>0.9.26</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <version>2.6.13</version>
            </plugin>
        </plugins>
    </build>

</project>

TestApplication.java

package com.lanyuanxiaoyao.micro.service.test;

import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

/**
 * @author lanyuanxiaoyao
 * @date 2023-02-02
 */
@SpringBootApplication
public class TestApplication implements ApplicationRunner {
    public static void main(String[] args) {
        SpringApplication.run(TestApplication.class, args);
    }

    @Override
    public void run(ApplicationArguments args) {
        System.out.println("hello world");
    }
}

logback-spring.xml

<configuration>
    <conversionRule conversionWord="clr" converterClass="org.springframework.boot.logging.logback.ColorConverter" />
    <conversionRule conversionWord="wex" converterClass="org.springframework.boot.logging.logback.WhitespaceThrowableProxyConverter" />
    <conversionRule conversionWord="wEx" converterClass="org.springframework.boot.logging.logback.ExtendedWhitespaceThrowableProxyConverter" />

    <appender name="Loki" class="pl.tkowalcz.tjahzi.logback.LokiAppender">
        <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
            <level>WARN</level>
        </filter>
        <url>http://localhost/loki/api/v1/push</url>
        <encoder>
            <pattern>${FILE_LOG_PATTERN:-%d{${LOG_DATEFORMAT_PATTERN:-yyyy-MM-dd HH:mm:ss.SSS}} [${HOSTNAME}] ${LOG_LEVEL_PATTERN:-%5p} ${PID:- } --- [%t] %-40.40logger{39} : %m%n${LOG_EXCEPTION_CONVERSION_WORD:-%wEx}}</pattern>
        </encoder>
        <label>
            <name>app</name>
            <value>test</value>
        </label>
        <label>
            <name>host</name>
            <value>${HOSTNAME}</value>
        </label>
        <logLevelLabel>level</logLevelLabel>
    </appender>

    <appender name="Console" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>${CONSOLE_LOG_PATTERN:-%clr(%d{${LOG_DATEFORMAT_PATTERN:-yyyy-MM-dd HH:mm:ss.SSS}}){faint} %clr(${LOG_LEVEL_PATTERN:-%5p}) %clr(${PID:- }){magenta} %clr(---){faint} %clr([%15.15t]){faint} %clr(%-40.40logger{39}){cyan} %clr(:){faint} %m%n${LOG_EXCEPTION_CONVERSION_WORD:-%wEx}}</pattern>
        </encoder>
    </appender>

    <root level="INFO">
        <appender-ref ref="Console"/>
    </root>

</configuration>

If i remove the LokiAppender from logback-spring.xml, it will exit without hang.

@tkowalcz
Copy link
Owner

tkowalcz commented Feb 2, 2023

Thank you for providing reproducer.

@tkowalcz
Copy link
Owner

tkowalcz commented Feb 2, 2023

I have reproduced this issue. I am not very familiar with Spring app lifecycle but looks like logging system (LogbackLoggingSystem) provides a shutdown handler (getShutdownHandler) but these are only called by SpringApplicationShutdownHook which is called only when JVM exits. But JVM does not exit because Tjahzi has some non daemon threads. I am surprised that there is no explicit lifecycle phase here and Spring relies on JVM to invoke the cleanup.

I have intentionally used non-daemon threads so that exiting application must stop the logging system so all buffers are flushed and all messages sent.

Should I provide a configuration option to use no daemon threads? I don't see any other option currently and are open for ideas.

@tkowalcz
Copy link
Owner

tkowalcz commented Feb 26, 2023

Hi @lanyuanxiaoyao, @eugenwintersberger, @theaob!

I have released a new version of Tjahzi for Logback and Log4j2 that has option to use daemon threads. This should fix your problem but may cause messages to be lost during shutdown. Alternatively you can manyally shutdown the Spring logging system.

See: https://github.com/tkowalcz/tjahzi/tree/master/log4j2-appender#usedaemonthreads-optional-default--false.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants