HiveMQ CE Subscribers Become Idle Under Heavy Load with Shared Subscriptions and QoS=AtLeastOnce #558

Open
VladimirMakarevich opened this issue Jan 2, 2025 · 6 comments
Labels
bug Something isn't working

Comments

@VladimirMakarevich

VladimirMakarevich commented Jan 2, 2025

Expected behavior

Subscribers should consistently receive messages as long as the connection is active and the broker has messages on the subscribed topics, regardless of load conditions.

Under scenarios involving high message throughput and frequent topic publishing/subscribing, we observe that subscriber clients stop receiving messages from the HiveMQ CE broker at some point. Notably, the connection remains established and there are no errors or warnings in the logs, but message flow ceases. When the consumer client is restarted, message delivery resumes temporarily, but after some time it stops again. This behavior is specifically observed when using MqttQualityOfServiceLevel = AtLeastOnce in combination with shared subscriptions.

Actual behavior

After an initial period of normal message flow, subscribers become idle and do not receive any further messages until they are restarted.

To Reproduce

Steps

  1. Set up a HiveMQ broker (either default or custom configuration).
  • docker run -d --name hivemq_ce_latest_8g -p 8080:8080 -p 1883:1883 -e JAVA_OPTS="-Xmx8g" hivemq/hivemq-ce:latest
  2. Launch 20 publisher clients with cleanStart = false and protocolVersion = 5, publishing messages to 5 shared topics at a high rate.
  3. Launch a single subscriber client (in .NET, JS, or Python) with cleanStart = false and protocolVersion = 5, subscribing to the same 5 shared topics using QoS=AtLeastOnce.
  4. Observe that after some time the subscriber stops receiving messages (even though the connection remains active and there are no errors).
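The topic names and share group used by the reproducer are not listed in this thread, so the following Python sketch uses hypothetical ones (test/topic1 to test/topic5, group `consumers`). It only shows how the subscriber's five shared-subscription filters are formed under MQTT 5 (`$share/{ShareName}/{TopicFilter}`) and requested at QoS 1, which client libraries such as MQTTnet expose as AtLeastOnce. Publishers send to the plain topic names; the `$share/` prefix exists only on the subscribing side.

```python
# Hypothetical topic and share-group names; the real ones are not given in the issue.
TOPICS = [f"test/topic{i}" for i in range(1, 6)]
SHARE_GROUP = "consumers"


def shared_filter(group: str, topic_filter: str) -> str:
    """Build an MQTT 5 shared-subscription filter: $share/{ShareName}/{TopicFilter}."""
    # The MQTT 5 spec forbids "/", "+" and "#" inside the ShareName.
    if not group or any(c in group for c in "/+#"):
        raise ValueError(f"invalid share name: {group!r}")
    if not topic_filter:
        raise ValueError("topic filter must not be empty")
    return f"$share/{group}/{topic_filter}"


def subscription_plan(group: str, topics: list[str], qos: int = 1) -> list[tuple[str, int]]:
    """Return (filter, qos) pairs for one SUBSCRIBE; QoS 1 = AtLeastOnce."""
    return [(shared_filter(group, t), qos) for t in topics]


if __name__ == "__main__":
    for topic_filter, qos in subscription_plan(SHARE_GROUP, TOPICS):
        print(topic_filter, qos)
```

With only one member in the share group, the broker routes every matching message to that single subscriber, so it receives the full output of all 20 publishers.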

Reproducer code

How to run the .NET Publishers and Consumer


What We Have Observed & Collected:

  1. Reproduction Code & Setup:
  • We have shared code samples and a repository link demonstrating how to reproduce this issue.
    • We have three consumer clients written with three different technologies, and the issue is reproducible with all of them.
      • NOTE: Our main stack is C# with .NET Core.
  • We have a screen-recording video illustrating the steps to trigger the behavior.
  2. Broker Configurations:
  • We have tested with both a default HiveMQ CE broker configuration and a custom configuration. The issue occurs in both scenarios.
  3. Logs & Diagnostics:
  4. Testing Environment & Components:
  • Publishers:

    • Implemented in .NET
    • Connection settings: cleanStart = false, protocolVersion = 5
    • Number of client publishers: 20
    • Topics: 5 (the topics the subscriber consumes via shared subscriptions)
  • Subscribers:

    • Implemented using .NET, JavaScript, and Python clients
    • Connection settings: cleanStart = false, protocolVersion = 5
    • Number of subscribers: 1
    • Subscriptions: 5 shared subscriptions
  5. When Does It Occur?
  • When publishing and reading messages at high rates.
  • Under heavy load, for example when the broker’s host is resource-constrained (VM memory and CPU nearing their limits).
  6. Additional Testing:
  • This issue does not occur in HiveMQ Enterprise Broker.
  • This issue does not occur with the default Mosquitto Broker.
@sauroter
Member

Hi @VladimirMakarevich,

Thank you for bringing this to our attention and for providing such comprehensive materials alongside the issue - it’s greatly appreciated.

We will attempt to reproduce this issue internally. If successful, we’ll prioritize and schedule a bug fix. Should we encounter any difficulties reproducing the issue, we may reach out to you for additional clarification or details.

Please note that high-load and demanding use cases are best suited for the HiveMQ Enterprise Edition, which offers advanced features designed to ensure enterprise-grade reliability, even under the most challenging circumstances.

Thank you again, and have a great day!

Best regards,
Georg

@sauroter sauroter added the bug Something isn't working label Jan 14, 2025
@pesetskyps

I am working with Vladimir. Our client, who runs this deployment and where the issue was identified, has also raised it with HiveMQ representatives, and they agreed to help with it. Please feel free to ping us; we can set up a call and show the issue live.

@FinnHMQ

FinnHMQ commented Jan 15, 2025

Hi @VladimirMakarevich & @pesetskyps,

I had the reproducer running for over an hour at a time, as well as in multiple shorter sessions, but was unable to observe the behaviour you captured in the video you shared.

It would be interesting to see what the broker reports from its side. Can I ask you to inspect event.log for mentions of your subscribing client? In my own tests, I see dropped messages, as the 20 to 1 ratio is overwhelming the single subscriber, but generally message flow never comes to a halt.
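To illustrate why a 20-to-1 fan-in produces dropped messages without halting delivery, here is a toy Python model. It assumes the broker keeps a bounded queue per subscriber and discards new messages once that queue is full; the rates, the queue limit and the "tick" time unit are all hypothetical and are not taken from the reproducer or from HiveMQ CE's actual queueing implementation.

```python
from collections import deque


def simulate_fan_in(publishers: int, rate_per_publisher: int, consume_rate: int,
                    queue_limit: int, ticks: int) -> tuple[int, int, int]:
    """Toy discrete-time model of many publishers feeding one subscriber.

    Each tick, every publisher emits `rate_per_publisher` messages into a single
    bounded per-subscriber queue; messages arriving while the queue is full are
    discarded. The subscriber then drains up to `consume_rate` messages.
    Returns (delivered, dropped, still_queued).
    """
    queue: deque[int] = deque()
    delivered = dropped = 0
    msg_id = 0
    for _ in range(ticks):
        for _ in range(publishers * rate_per_publisher):
            if len(queue) < queue_limit:
                queue.append(msg_id)
            else:
                dropped += 1  # queue full: discard the newest message
            msg_id += 1
        for _ in range(min(consume_rate, len(queue))):
            queue.popleft()
            delivered += 1
    return delivered, dropped, len(queue)


if __name__ == "__main__":
    # 20 publishers x 10 msgs/tick = 200 arrivals/tick vs. 50 deliveries/tick
    print(simulate_fan_in(20, 10, 50, 1000, 100))  # -> (5000, 14050, 950)
```

In this model the subscriber keeps receiving messages at its own rate on every tick while the overflow is dropped; it never reaches a state where messages are queued but nothing is delivered, which is the symptom described in the original report.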

As we have observed similar behaviour in the past with some libraries, can I ask you to replace the subscribing client with an instance of HiveMQ's MQTT CLI?

Kind regards,
Finn

@pesetskyps

Sure @FinnHMQ, we will do that. For the event logs, please check the logs at the URL Vladimir attached; it contains all the logs we could capture from HiveMQ, including event.log:
https://github.com/VladimirMakarevich/hivemq-ce-idle-issue/tree/main/diagnostics%20and%20logs

If you can't reproduce the behaviour on your side, I recommend we have a call so we can show you the issue live. But let us check what you've asked first.

@VladimirMakarevich
Author

VladimirMakarevich commented Jan 15, 2025

Hi @FinnHMQ. Thank you for your suggestions.

I tested using HiveMQ's MQTT CLI as you recommended. During my attempts to reproduce the issue, I periodically encountered the following error:

QoS 1 PUBLISH must not be resent during the same connection

After this error, the consumer switched to idle behavior, although most likely the consumer simply disconnects from the broker in this case.
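The MQTT CLI error reflects a rule in the MQTT 5 specification: unacknowledged PUBLISH packets may only be resent when a session is resumed on a new connection, never during the same connection. A receiver can detect a violation for QoS 1 because a sender must not reuse a packet identifier until it has received the PUBACK. The class below is a hypothetical Python sketch of such a per-connection check (class and method names are illustrative); it is not the MQTT CLI's implementation, and its error text merely mirrors the message quoted above.

```python
class ProtocolError(Exception):
    """Raised when the peer violates the MQTT 5 delivery rules (illustrative)."""


class Qos1ReceiveTracker:
    """Hypothetical per-connection tracker for incoming QoS 1 PUBLISH packets."""

    def __init__(self) -> None:
        # Packet identifiers received on this connection but not yet PUBACKed.
        self._unacked: set[int] = set()

    def on_publish(self, packet_id: int) -> None:
        if not 1 <= packet_id <= 65535:
            raise ProtocolError(f"invalid packet identifier {packet_id}")
        if packet_id in self._unacked:
            # Same identifier while the first copy is still unacknowledged:
            # the sender resent the message within the same connection.
            raise ProtocolError(
                "QoS 1 PUBLISH must not be resent during the same connection")
        self._unacked.add(packet_id)

    def on_puback_sent(self, packet_id: int) -> None:
        # Once PUBACK is sent, a later PUBLISH with this identifier is a new message.
        self._unacked.discard(packet_id)

    def on_disconnect(self) -> None:
        # State is per connection; a new connection starts empty.
        self._unacked.clear()


if __name__ == "__main__":
    tracker = Qos1ReceiveTracker()
    tracker.on_publish(7)
    tracker.on_puback_sent(7)
    tracker.on_publish(7)  # allowed: identifier reused after PUBACK
    try:
        tracker.on_publish(7)  # duplicate while still unacknowledged
    except ProtocolError as err:
        print(err)
```

A client that treats this condition as fatal closes the connection, which would be consistent with the consumer disconnecting rather than staying connected and idle.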

I have created a screencast demonstrating this error and collected all relevant broker logs. You can access them here: hivemq-ce-idle-issue/diagnostics and logs/v4.

I'm unsure whether this error is related to the issues we're experiencing with the Python aiomqtt library, the JS mqtt library, or the .NET MQTTnet library, or whether it indicates a new issue.

Please let me know if you need any additional information.

Best regards,
Vladimir Makarevich

@FinnHMQ

FinnHMQ commented Jan 17, 2025

Hi @VladimirMakarevich,

While I still wasn't able to reproduce the behaviour directly with your shared code, the data you shared (especially the run with the MQTT CLI) gave us valuable insights. The consumer's disconnect hints at a protocol error, namely the broker erroneously duplicating a message.

Close to the end of day yesterday, we made progress on this behaviour with our test suite. I will keep you updated on our findings. Thank you for bringing this to our attention.

Kind regards,
Finn
