Kafka Producer stuck in a loop with SQLException #2224

@rpeternella-jimdo

Hi Maxwell team,

We encountered an issue while trying to run bootstrapping for our MySQL databases. It doesn't give us enough information to debug properly, so it would be really helpful to get your insights on this.

Our setup uses Maxwell to stream data from our MySQL DBs into AWS MSK, and we trigger bootstrapping by inserting a record into the maxwell.bootstrap table. As soon as we add a new record, we can see the count of records going up, then eventually resetting to zero, after which Maxwell starts producing events all over again. The process only stops when we set is_complete = 1.

Checking the logs, I could find only two relevant things:

  1. The only ERROR is the following: BootstrapController: got SQLException trying to bootstrap; per my understanding, this is a generic error thrown by the main thread HERE.

  2. Checking our logs I can see that for every record, there are two lines that are requests/responses from Kafka:

[DEBUG] NetworkClient: [Producer clientId=producer-1] Sending PRODUCE request with header RequestHeader(apiKey=PRODUCE, apiVersion=8, clientId=producer-1, correlationId=4856781) and timeout 30000 to node 2: {acks=1,timeout=30000,partitionSizes=[maxwell-1=16341]}
[DEBUG] NetworkClient: [Producer clientId=producer-1] Received PRODUCE response from node 2 for request with header RequestHeader(apiKey=PRODUCE, apiVersion=8, clientId=producer-1, correlationId=4856777): org.apache.kafka.common.requests.ProduceResponse@59d17678

This is consistent until the last record, where we get the SQLException:

[DEBUG] NetworkClient: [Producer clientId=producer-1] Sending PRODUCE request with header RequestHeader(apiKey=PRODUCE, apiVersion=8, clientId=producer-1, correlationId=4856782) and timeout 30000 to node 2: {acks=1,timeout=30000,partitionSizes=[maxwell-creator-cms-events-1=14961]}
[ERROR] BootstrapController: got SQLException trying to bootstrap
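
In case it helps with triage, here is a minimal Python sketch for pairing the PRODUCE requests and responses in a log excerpt by correlationId, so it is easier to see which requests were still unacknowledged when the SQLException appeared. The function name and regexes are my own and only assume the NetworkClient line format quoted above. Note that the producer normally keeps several requests in flight, so an unanswered ID in a short excerpt does not by itself prove that a request was lost.

```python
import re

# Patterns for the two NetworkClient DEBUG lines quoted above.
# The lazy match stops at the first "correlationId=" on the line.
SEND_RE = re.compile(r"Sending PRODUCE request .*?correlationId=(\d+)")
RECV_RE = re.compile(r"Received PRODUCE response .*?correlationId=(\d+)")


def unanswered_produce_requests(lines):
    """Return correlation IDs that were sent but have no matching
    response in the given lines, in the order they were sent."""
    pending = {}  # dicts keep insertion order, so send order is preserved
    for line in lines:
        sent = SEND_RE.search(line)
        if sent:
            pending[int(sent.group(1))] = line
            continue
        received = RECV_RE.search(line)
        if received:
            # Responses for requests sent before the excerpt starts are ignored.
            pending.pop(int(received.group(1)), None)
    return list(pending)
```

Feeding it the lines of a Maxwell log file (for example `unanswered_produce_requests(open(path))`, with `path` being wherever the log is stored) returns the IDs that never got a response within that file.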

I double-checked the Kafka configuration for any issues and tried setting acks=all and increasing the timeout, but nothing seems to change. To add to this, the issue happens only in our production Kafka (the bootstrapping worked in Stage, because why not? 😆). As we already have Maxwell streaming the CDC events in both environments, I would rule out any networking/connectivity issues (only bootstrapping is affected).

Would you have any ideas of what could be the issue here? Happy to provide any further info if that helps.

Thanks a lot in advance!
