feat: gateway timeout #369

Open
UlyssesZh wants to merge 1 commit into shardlab:main from UlyssesZh:timeout

@UlyssesZh

Summary

Discordrb used to work flawlessly on my machine, but since I moved to another place it has frequently been hitting errors like this:

[WARN : heartbeat @ 2025-10-15 20:45:43.756] Last heartbeat was not acked, so this is a zombie connection! Reconnecting
[ERROR : heartbeat @ 2025-10-15 20:45:43.756] The websocket connection has closed: (no information)
[ERROR : websocket @ 2025-10-15 21:00:53.264] An error occurred in the main websocket loop!
[ERROR : websocket @ 2025-10-15 21:00:53.264] Exception: #<Errno::ETIMEDOUT: Connection timed out>
[ERROR : websocket @ 2025-10-15 21:00:53.264] /usr/local/lib/ruby/3.4.0/openssl/buffering.rb:160:in 'OpenSSL::SSL::SSLSocket#sysread'
[ERROR : websocket @ 2025-10-15 21:00:53.264] /usr/local/lib/ruby/3.4.0/openssl/buffering.rb:160:in 'OpenSSL::Buffering#readpartial'
[ERROR : websocket @ 2025-10-15 21:00:53.264] /usr/local/bundle/gems/discordrb-3.5.0/lib/discordrb/gateway.rb:602:in 'Discordrb::Gateway#websocket_loop'
[ERROR : websocket @ 2025-10-15 21:00:53.264] /usr/local/bundle/gems/discordrb-3.5.0/lib/discordrb/gateway.rb:579:in 'Discordrb::Gateway#connect'
[ERROR : websocket @ 2025-10-15 21:00:53.264] /usr/local/bundle/gems/discordrb-3.5.0/lib/discordrb/gateway.rb:473:in 'block in Discordrb::Gateway#connect_loop'
[ERROR : websocket @ 2025-10-15 21:00:53.264] <internal:kernel>:168:in 'Kernel#loop'
[ERROR : websocket @ 2025-10-15 21:00:53.265] /usr/local/bundle/gems/discordrb-3.5.0/lib/discordrb/gateway.rb:472:in 'Discordrb::Gateway#connect_loop'
[ERROR : websocket @ 2025-10-15 21:00:53.265] /usr/local/bundle/gems/discordrb-3.5.0/lib/discordrb/gateway.rb:168:in 'block in Discordrb::Gateway#run_async'
[INFO : websocket @ 2025-10-15 21:00:53.265] Instant reconnection flag was set - reconnecting right away
[INFO : websocket @ 2025-10-15 21:00:54.008] Discord using gateway protocol version: 9, requested: 9

Notice that the time between the two errors is 15 min 10 s (and it always is whenever this happens), which is a pretty long time for a Discord bot to be down. This timeout threshold is probably set somewhere in /proc/sys/net/ipv4/tcp_* (nothing else sets a timeout, so the read would otherwise hang indefinitely), but I am not sure. Anyway, the workaround I have come up with requires some modifications to Discordrb, which I have made in this PR. It should not affect any existing users, but it will help in this particular case, where the socket becomes a zombie before the first hello message.

I am not sure whether the 2s initial timeout I set here is too short, but I think it will not normally be exceeded even in very poor network conditions. If it is too short, I can make it configurable through an environment variable so that people can set it according to their network conditions.
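For illustration only, here is a rough sketch of how the environment-variable idea could look. The variable name `DISCORDRB_GATEWAY_TIMEOUT` and the helper `initial_gateway_timeout` are hypothetical; neither is part of this PR or of Discordrb.

```ruby
# Hypothetical helper: neither the method nor the variable name exists in Discordrb.
DEFAULT_INITIAL_TIMEOUT = 2.0 # seconds, matching the value proposed in this PR

# Returns the initial socket timeout in seconds, read from the environment
# when set and falling back to the 2s default otherwise.
def initial_gateway_timeout(env = ENV)
  raw = env['DISCORDRB_GATEWAY_TIMEOUT']
  return DEFAULT_INITIAL_TIMEOUT if raw.nil? || raw.strip.empty?

  seconds = Float(raw) # raises ArgumentError on non-numeric input
  raise ArgumentError, 'timeout must be positive' unless seconds.positive?

  seconds
end
```

Passing the environment in as an argument keeps the helper easy to exercise with a plain Hash instead of the real `ENV`.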

Added

Sets a timeout on the socket connected to the Discord gateway.

Changed

Deprecated

Removed

Fixed

@swarley swarley requested a review from Copilot October 16, 2025 12:12

Copilot AI left a comment

Pull Request Overview

Adds configurable socket timeout logic to detect and recover more quickly from zombie gateway connections prior to receiving the initial Discord hello, reducing long blocking periods.

  • Introduces set_socket_timeout helper and applies an initial 2s timeout before handshake.
  • Adjusts timeout after hello to heartbeat interval + 1 and adds rescue for IO::TimeoutError in websocket loop.
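The two-phase timeout described in the points above can be sketched roughly as follows, assuming Ruby 3.2 or newer (where `IO#timeout=` and `IO::TimeoutError` are available). The name `set_socket_timeout` is taken from the overview, but its body here is a guess; `read_line_or_timeout` and the pipe standing in for the gateway socket are illustrative assumptions, not the PR's code.

```ruby
# Sketch only; requires Ruby >= 3.2 for IO#timeout= and IO::TimeoutError.

# Applies a timeout (in seconds) to blocking operations on an IO object.
# Passing nil removes the timeout again.
def set_socket_timeout(io, seconds)
  io.timeout = seconds
end

# Hypothetical helper: returns the next line, or :timed_out if nothing
# arrives before the IO's timeout expires. gets is used for the demo;
# the real gateway loop reads websocket frames instead of lines.
def read_line_or_timeout(io)
  io.gets
rescue IO::TimeoutError
  :timed_out
end

# A pipe stands in for the gateway socket so the sketch runs offline.
reader, writer = IO.pipe

# Phase 1: short timeout while waiting for the hello message.
set_socket_timeout(reader, 0.2) # the PR uses 2s; shortened for the demo
first = read_line_or_timeout(reader) # nothing has been written yet

# Phase 2: after hello, widen the timeout to heartbeat interval + 1.
heartbeat_interval = 41.25 # seconds; example value, normally taken from hello
set_socket_timeout(reader, heartbeat_interval + 1)
writer.write("hello\n")
writer.flush
second = read_line_or_timeout(reader)

puts first.inspect
puts second.inspect
```

Rescuing the timeout close to the read lets the caller treat a silent connection like any other dropped connection and reconnect, instead of waiting for the kernel's TCP timeout.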

@swarley
Member

swarley commented Oct 16, 2025

Sorry please ignore the clanker it was a misclick on mobile

@UlyssesZh UlyssesZh force-pushed the timeout branch 2 times, most recently from 676c866 to 3a039c9 Compare October 17, 2025 18:54
@UlyssesZh UlyssesZh changed the title fix: gateway timeout feat: gateway timeout Oct 17, 2025
@UlyssesZh UlyssesZh force-pushed the timeout branch 2 times, most recently from bcf8bc9 to 684c113 Compare October 17, 2025 19:01
@UlyssesZh
Author

Sorry I should've run rubocop myself.

@Droid00000 Droid00000 added the enhancement New feature or request label Oct 23, 2025
@Droid00000 Droid00000 self-assigned this Oct 23, 2025
Collaborator

@Droid00000 Droid00000 left a comment

Personally, I am a fan of this, but we should make this something that's opt-in. Like the check_heartbeat_acks= setter

@UlyssesZh
Author

> Personally, I am a fan of this, but we should make this something that's opt-in. Like the check_heartbeat_acks= setter

Basically I rewrote the behavior of check_heartbeat_acks. The user opts out by setting it to false.

@Droid00000 Droid00000 assigned swarley and unassigned Droid00000 Nov 12, 2025