Incorrect Routing of Upstream Packets (Two clients and one game server) #988
Comments
Thank you for your issue! Would you mind testing one of the latest images to see if you can reproduce it? Just to eliminate the possibility that it has already been fixed. |
@XAMPPRocky Thanks for your reply. We would appreciate it if there is a latest image available. Edited: |
You can also grab from one of our PR builds, e.g #987 (comment) |
@XAMPPRocky We have tried using the latest image, but unfortunately, the issue persists. Additionally, the CPU and memory usage are much higher than in version v0.8.0. 🥲 |
@zezhehh That is odd, because we have used and tested this setup of one token per game server with multiple clients on a single proxy and haven't had an issue at all. Would you be able to check the load balancer setup that you're running in front of the proxy? My first guess is that the problem lies there: we don't put any load balancers in front of the proxies, so that's the one difference I see from the setup we have tested. |
@zezhehh can you share what kind of LB it is? (i.e. is it a Google Cloud / AWS LoadBalancer? How is it configured etc?) Maybe there is something in there. |
We have a GCP k8s LB setting as follows, which should preserve the original source
The issue has been confirmed by examining dumped UDP traffic in the pcap file, which can be viewed using Wireshark. Quilkin Proxy Capture.pcap.zip The file contains data from two ongoing games. The Quilkin proxy is identified as In the first game, involving clients However, in the second game, involving clients An example request can be identified with correlation_id:
For visual reference: |
To clarify the point a little further before I start going over unit tests and seeing if I can replicate this in one.
|
Note: We also observed that the problematic port is likely the same one ( |
Hmm, not 100% sure I followed that. I need to double check the code, because I know this got optimised a while back (not by me, so I'm not as familiar with it anymore) so we could handle way more endpoints per proxy, but I'm fairly sure it should be: I.e. for each client connecting, there should be a different proxy port that the game server sends packets back to. If it's the same port, I'm not sure how we differentiate which packet should go where 🤔 Are you saying there is only one Quilkin proxy port being used by the game server process? |
If it's one port, my guess is that the load balancer is not preserving the IP:port of the client when sending traffic to the proxy, so to the proxy it looks like a single client. |
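To make the expectation in the two comments above concrete, here is a minimal Python sketch of a session table that gives each client address its own upstream socket. It is an illustration only, not Quilkin's actual implementation: `SessionTable`, `upstream_for`, and `upstream_ports` are hypothetical names, and the client addresses are made up.

```python
import socket


class SessionTable:
    """Hypothetical model: one upstream UDP socket per client (ip, port)."""

    def __init__(self):
        self._sessions = {}

    def upstream_for(self, client_addr):
        """Return the upstream socket for this client, creating it on first use."""
        sock = self._sessions.get(client_addr)
        if sock is None:
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            sock.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
            self._sessions[client_addr] = sock
        return sock

    def close(self):
        for sock in self._sessions.values():
            sock.close()
        self._sessions.clear()


def upstream_ports(client_addrs):
    """Proxy-side ports the game server would see for the given client addresses."""
    table = SessionTable()
    try:
        return [table.upstream_for(addr).getsockname()[1] for addr in client_addrs]
    finally:
        table.close()


# Two distinct client addresses get two distinct proxy-side ports.
distinct = upstream_ports([("10.0.0.1", 5000), ("10.0.0.2", 5000)])

# If something in front of the proxy rewrote both sources to one address,
# the proxy would see a single client and reuse a single port.
collapsed = upstream_ports([("10.0.0.9", 4000), ("10.0.0.9", 4000)])
```

In this model a shared port can only appear when two packets arrive with the same source address, which is why the load balancer was the first suspect.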
@markmandel @XAMPPRocky Yes, we understand that the issue lies in the fact that the proxy uses the same port for two clients communicating with the game server. However, it is a symptom and not something we intentionally did (we didn’t change the source code to remap the socket usage). What we can confirm is:
(Points 2 and 3 can be observed in the dumped UDP traffic.) |
@zezhehh to clarify, I mean that I'm not sure the load balancer is always providing a unique ip:port pair. Not that you've made a change, but that however the load balancer works or is configured, it is not always sending unique addresses. Would you be able to test this with a NodePort for proxy traffic instead? I think cutting out the load balancer will help us determine whether you can replicate it with direct traffic to the proxy. |
That wasn't what I was getting at. I was getting at the traffic from the proxy to the game server should be over 2 different ports at Is that what you are seeing?
It seems like you are... but that leaves me extremely confused, because then ALL traffic back to clients would only go to one client. Without a different port on the proxy for each connection to the game server and back -- there's no way to differentiate where the traffic should head back to. |
Hmm.. I don't think the Load Balancer is the issue here. We have
|
Sorry for any confusion. I'll try to make it clear.
Yes, it's what we're seeing. The game server records the client socket (which is actually the proxy socket) for each client, identified by user ID, and responds to this specific socket. In other words, the game server doesn’t check for conflicts with other sockets but simply sends the response to the originating socket. This situation only occurs when the error arises. In most cases, everything functions as expected: two clients from two sockets... |
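The failure mode described above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the reporter's game server or Quilkin code: `route_replies`, the client labels, user IDs, and port numbers are all hypothetical.

```python
def route_replies(upstream_port_by_client, user_by_client):
    """Return {client: [user ids whose replies end up at that client]}."""
    # Game server side: remember the proxy port each user's packets came from
    # and reply to exactly that port, with no conflict check.
    reply_port_by_user = {
        user: upstream_port_by_client[client]
        for client, user in user_by_client.items()
    }

    # Proxy side: a reply arriving on a given port can map to only one client.
    # When two clients share a port, the later entry overwrites the earlier one.
    client_by_port = {}
    for client, port in upstream_port_by_client.items():
        client_by_port[port] = client

    delivered = {client: [] for client in upstream_port_by_client}
    for user, port in reply_port_by_user.items():
        delivered[client_by_port[port]].append(user)
    return delivered


users = {"client_a": "user_1", "client_b": "user_2"}

# Normal case: one proxy port per client, so each client gets its own replies.
healthy = route_replies({"client_a": 40001, "client_b": 40002}, users)

# Reported symptom: both clients share one proxy port, so one client gets nothing.
shared = route_replies({"client_a": 40001, "client_b": 40001}, users)
```

Which client "wins" in the shared case is an artifact of this model (the last writer); the point is only that one port cannot carry replies back to two different clients.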
Got it - thanks. Also, I assume there is more than one endpoint in play at this point as well? (Just so we can get as close as possible to your setup when trying to replicate this in a unit test.) |
@markmandel Yes, those automated testing matches occur within the same cluster as the real matches. |
This is an integration test to ensure that concurrent clients to the same proxy and endpoint didn't mix packets. Could not replicate the reported issue below, but it felt like a good test to have for concurrency testing. Work on googleforgames#988
I finally got some time to look into this - check out the test I wrote in #1010 -- unfortunately I could not replicate any of your reported issues. Would love for you to look at the test though, to see if there is something else in the scenario that I didn't manage to capture in the integration test. Let me know if you see anything. |
hmm.. Could you try to allocate the clients with the same ip and different ports? |
Unless you mean something else, the unit test has two sockets on the same IP (localhost) but different ports -- so I believe this tests this scenario, unless I am misunderstanding? |
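For readers who want to try the "same IP, different ports" scenario locally, the following Python sketch runs two localhost clients through a toy proxy to an echo server. It is not the test from #1010 and does not use Quilkin: `udp_socket`, `echo_server`, `toy_proxy`, and `two_client_round_trip` are hypothetical helpers, and the toy proxy handles packets one at a time rather than concurrently.

```python
import socket
import threading


def udp_socket():
    """UDP socket on localhost with an OS-assigned port and a 2 second timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    sock.settimeout(2.0)
    return sock


def echo_server(sock, count):
    """Echo `count` datagrams back to the address they came from."""
    for _ in range(count):
        data, peer = sock.recvfrom(2048)
        sock.sendto(data, peer)


def toy_proxy(listen, server_addr, count):
    """Forward `count` datagrams using one upstream socket per client address."""
    upstreams = {}
    try:
        for _ in range(count):
            data, client = listen.recvfrom(2048)
            if client not in upstreams:
                upstreams[client] = udp_socket()
            upstream = upstreams[client]
            upstream.sendto(data, server_addr)
            reply, _ = upstream.recvfrom(2048)
            listen.sendto(reply, client)
    finally:
        for upstream in upstreams.values():
            upstream.close()


def two_client_round_trip():
    """Send one packet from each of two localhost clients and collect the replies."""
    server, proxy = udp_socket(), udp_socket()
    clients = [udp_socket(), udp_socket()]  # same IP, different ports
    workers = [
        threading.Thread(target=echo_server, args=(server, len(clients))),
        threading.Thread(
            target=toy_proxy, args=(proxy, server.getsockname(), len(clients))
        ),
    ]
    for worker in workers:
        worker.start()
    try:
        sent = [f"hello from client {i}".encode() for i in range(len(clients))]
        for client, payload in zip(clients, sent):
            client.sendto(payload, proxy.getsockname())
        received = [client.recvfrom(2048)[0] for client in clients]
    finally:
        for worker in workers:
            worker.join(timeout=5)
        for sock in [server, proxy, *clients]:
            sock.close()
    return sent, received


sent, received = two_client_round_trip()
```

If packets were mixed between the two clients, `received` would differ from `sent`. A faithful reproduction would also need whatever sits in front of the proxy in the real deployment, which this sketch leaves out.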
Okay then all good. Thanks! Our setup differs in some other ways (same tokens from clients, etc.), but let's talk tomorrow! :) |
Just for easy discovery, assuming there's an issue in Quilkin, it's likely one of these spots:
So weird. |
What happened:
Hi all,
We've observed an issue where a proxy pod uses the same socket for traffic from different clients to the same game server, resulting in one of the clients not receiving responses from the game server. Have any specific edge cases been identified as causing this issue?
[Our architecture setup]
In the game server kubernetes cluster, we have a Load Balancer that routes to multiple proxy pods (not as sidecars) and control planes with the Agones provider. We’re using the same token for both clients.
What you expected to happen:
We expect both clients to receive their corresponding responses.
How to reproduce it (as minimally and precisely as possible):
Unknown. Once it began to occur, it kept happening intermittently throughout the day. We suspect there may be a buggy state in a specific pod instance.
Anything else we need to know?:
Environment:
Quilkin version:
v0.8.0
Execution environment (binary, container, etc):
kubernetes, container
Operating system:
Custom filters? (Yes/No - if so, what do they do?):