ws4py remarks #20
Do you have any idea why you are seeing all those connection timeouts? I wonder why that happens - perhaps it is the TCP accept() backlog default of 128 which is causing trouble here. If the server falls behind on the backlog even once in a while, connection timeouts increase wildly.
I'll admit, I've never loaded ws4py that much so these are only guesses, especially since I don't usually run the gevent implementation but rather the CherryPy/good ol' threads server. However, you are probably right, the backlog likely fills up quickly and I would definitely increase it. Skimming through gevent's code, the backlog seems to default to 50 on a stream server. I would really need to profile ws4py to understand where it spends most of its time. I know that the (un)masking is actually heavy on the process all things considered, but here the data sent is so tiny it shouldn't hurt the results. Looking at the reports I've linked above, I'd also be very interested to see the benchmark executed with PyPy. I have no doubt ws4py could do better if I could find the time to work on it more.
For Python it can't be GC unless it is the cycle detector blocking the VM. So my guess is that the system can't keep up with the load, overflows the backlog queue, and then stuff begins timing out. Increasing the queue will stop timeouts, but it will then also make latencies worse all over the place.
I would really like to figure out why these servers drop connections like they do, if for little else than to determine what the optimal configuration for each platform is.
You might want to start increasing the socket backlog. In your ws4py runner, just add backlog=XYZ to the WebSocketServer(...) call:

    server = WebSocketServer(('', 8000), backlog=128, websocket_class=EchoServer)
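For reference, a minimal runnable sketch of that suggestion, assuming the gevent-based server API ws4py shipped at the time; EchoWebSocket is ws4py's stock echo handler, standing in for the benchmark's EchoServer class:

    from ws4py.server.geventserver import WebSocketServer
    from ws4py.websocket import EchoWebSocket

    if __name__ == '__main__':
        # backlog= is handed to the underlying listen socket; note the kernel
        # still caps it at net.core.somaxconn (see the next comment).
        server = WebSocketServer(('', 8000), backlog=128,
                                 websocket_class=EchoWebSocket)
        server.serve_forever()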
I really think you should increase the net.core.somaxconn parameter of your setup; this could be the cause of the timeouts. It would also be nice to check your syslog to verify it isn't sending TCP SYN cookies; syncookies can cause timeouts and disconnections in benchmarks like this.
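A quick sanity check on a Linux host (the procfs path below is Linux-specific): the kernel silently truncates any listen() backlog above net.core.somaxconn, so it is worth reading the cap before benchmarking:

    # Read the kernel's cap on listen() backlogs from procfs.
    with open('/proc/sys/net/core/somaxconn') as f:
        print('net.core.somaxconn =', int(f.read()))
    # To raise it for the benchmark (as root): sysctl -w net.core.somaxconn=1024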
@perone Excellent, I'll try that soon. It's not hard to get the timeouts on the other platforms; they occur very early in the test. I'll create a gist of the syslog for you to look at as well.
I get this in the syslog on the server:
@jlouis What puzzles me about the timeouts is that Erlang seems to be immune to them while the others are not. In fact, in the most recent benchmark, Go hit 10,000 clients as well. I haven't had a chance to summarize the event data yet, but the meminfo files have the connection counts for each server: https://github.com/ericmoritz/wsdemo/tree/eleveldb-logging/results Wouldn't an untuned TCP stack affect all the servers equally?
You can set the backlog when you open the listen socket, which is one thing to bear in mind. Another point is that if your Erlang code has plenty of available processes waiting in the accept state, then there is no backlog introduced at all, since there is always an accepting process to pair the incoming connection with. I bet your code spawns a new accepting process and that this process calls gen_tcp:accept(LSock) fairly quickly, thus establishing a 0-backlog scenario. Say you start with 1000 of these processes; then your backlog is, practically, nonexistent until all of those acceptors are busy. This also gives a plausible explanation as to why the behaviour is different. But you should really check my hypothesis by reading the code :)
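For illustration only, a rough Python analogue of that acceptor-pool idea (hypothetical code, not taken from the benchmark): several threads block in accept() on the same listening socket, so incoming connections are paired off immediately instead of queuing in the kernel backlog.

    import socket
    import threading

    def acceptor(lsock):
        while True:
            conn, addr = lsock.accept()  # paired off immediately, no queueing
            conn.close()                 # a real server would hand off here

    lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    lsock.bind(('', 8000))
    lsock.listen(128)

    # A pool of pre-spawned acceptors mimics Erlang's pool of processes
    # sitting in gen_tcp:accept/1.
    for _ in range(50):
        threading.Thread(target=acceptor, args=(lsock,), daemon=True).start()

    threading.Event().wait()  # keep the main thread alive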
Keep in mind that the kernel is not triggering this timeout; it is the TCP connect in my Erlang client. I set the connection timeout to 2 seconds to determine whether the server was unavailable. I had no way to tell if a small number of successful clients was due to an error on my part or the server becoming unavailable. There is no timeout once the TCP connection has been accepted.
Ah! So it is a question of semantics then. The problem, to the best of my knowledge, is that the server can't keep up within the 2-second timeframe. This means it answers at some value above 2000 ms, but at that point the client has already registered the connection as lost. If we graph the kernel density of response times, we can see whether that is the case.
Do you have enough data to graph the kernel density?
More than enough! The problem is that I have too much :)
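A minimal sketch of the kernel-density plot proposed above, assuming the response times have been exported as one value in milliseconds per line in a text file (the file name is a placeholder):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import gaussian_kde

    times = np.loadtxt('response_times_ms.txt')  # hypothetical export
    kde = gaussian_kde(times)
    xs = np.linspace(times.min(), times.max(), 500)

    plt.plot(xs, kde(xs))
    plt.axvline(2000, linestyle='--')  # the client's 2 s connect timeout
    plt.xlabel('response time (ms)')
    plt.ylabel('density')
    plt.savefig('response_kde.png')

A visible mass at or beyond the 2000 ms line would support the hypothesis that connections were answered, just too late.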
@Lawouach I'm trying to run ws4py using PyPy. How did you run it? Did you use CherryPy or gevent?
Yes I did. Gevent doesn't run on PyPy IIRC. I used CherryPy 3.2.2 and PyPy 1.8.
Someone just submitted code to run Tornado under PyPy. If you're still curious I'll write an implementation using ws4py and CherryPy. I also wonder if anyone has written a WebSocket server or an HTTP server using PyPy's native greenlets module. Perhaps gunicorn?
Not that I'm aware of, but that'd be interesting indeed. Regarding CherryPy and ws4py, you may simply use this code: https://github.com/Lawouach/WebSocket-for-Python/blob/master/test/autobahn_test_servers.py#L4 That worked just fine with CP 3.2.2 and PyPy 1.8 (I didn't try with more recent releases). You may want to remove the two lines about logging (l28/29), which are not relevant to the test. You may also want to add the following setting to cherrypy.config.update(...):

    'server.thread_pool': 128
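For convenience, a minimal sketch of that setup, following ws4py's documented CherryPy integration; the host, port, mount path, and thread-pool size are assumptions for illustration:

    import cherrypy
    from ws4py.server.cherrypyserver import WebSocketPlugin, WebSocketTool
    from ws4py.websocket import EchoWebSocket

    cherrypy.config.update({'server.socket_host': '0.0.0.0',
                            'server.socket_port': 8000,
                            'server.thread_pool': 128})
    WebSocketPlugin(cherrypy.engine).subscribe()
    cherrypy.tools.websocket = WebSocketTool()

    class Root(object):
        @cherrypy.expose
        def index(self):
            pass  # the tool upgrades the request to a WebSocket before this runs

    cherrypy.quickstart(Root(), '/', config={
        '/': {'tools.websocket.on': True,
              'tools.websocket.handler_cls': EchoWebSocket}})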
I guess it'd be better if I could submit a pull request for it, but I won't have the time before tomorrow or even this weekend unfortunately :/
I'll write up a simple server and submit a pull request that you can take a glance at. I wrote one yesterday based on your echo server but I think I deleted it. I know it didn't take very long.
Hi there, ws4py's author here.
Thanks for the benchmarks; though there can always be some concerns over their design, environment and execution, I find them useful and interesting nonetheless.
Just a couple of remarks for posterity:
Thanks,