
ws4py remarks #20

Open
Lawouach opened this issue Jun 17, 2012 · 20 comments

@Lawouach

Hi there, ws4py's author here.

Thanks for the benchmark. Although there are always some concerns about a benchmark's design, environment and execution, I find these results useful and interesting nonetheless.

Just a couple of remarks for posterity:

  • ws4py was initially designed as a playground for implementing WebSocket in a specific way (using generators in Python). It wasn't implemented with a high number of connections in mind; I thought that could be added gradually afterwards. I'm not surprised it didn't do that well yet.
  • ws4py runs much faster on PyPy. Do you think it would be possible for you to test that configuration as well, or instead?
  • Here are a few test results comparing Tornado and Autobahn alongside ws4py on my box: http://www.defuze.org/oss/ws4py/testreports/servers/0.2.1/
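ws4py's actual parser isn't shown in this thread, but the general idea of a generator-driven WebSocket parser can be sketched. The following is a minimal, hypothetical example assuming RFC 6455 base framing: the generator yields how many bytes it needs next and receives exactly that many via send(), which keeps the parsing logic independent of the I/O layer (threads, gevent, CherryPy). The names `frame_parser` and `parse` are illustrative only, not ws4py's API.

```python
def frame_parser():
    """Parse one RFC 6455 frame (hypothetical sketch, not ws4py's code).

    Yields the number of bytes needed next; the caller sends exactly
    that many bytes back in. Returns (fin, opcode, payload).
    """
    header = yield 2
    fin = bool(header[0] & 0x80)
    opcode = header[0] & 0x0F
    masked = bool(header[1] & 0x80)
    length = header[1] & 0x7F
    if length == 126:                      # 16-bit extended payload length
        length = int.from_bytes((yield 2), "big")
    elif length == 127:                    # 64-bit extended payload length
        length = int.from_bytes((yield 8), "big")
    mask = (yield 4) if masked else None
    payload = yield length
    if mask is not None:
        # Client-to-server frames are masked: XOR byte i with mask[i % 4].
        payload = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    return fin, opcode, payload


def parse(data):
    """Drive frame_parser over an in-memory buffer holding one whole frame."""
    parser = frame_parser()
    needed = next(parser)
    pos = 0
    while True:
        chunk = data[pos:pos + needed]
        pos += needed
        try:
            needed = parser.send(chunk)
        except StopIteration as done:
            return done.value
```

For example, the masked "Hello" frame from RFC 6455 section 5.7 (`81 85 37 fa 21 3d 7f 9f 4d 51 58`) parses to `(True, 1, b"Hello")`. A real server would feed the generator from socket reads instead of a buffer.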

Thanks,

@jlouis
Contributor

jlouis commented Jun 17, 2012

Do you have any idea why you are seeing all those connection timeouts? Perhaps the default TCP accept() backlog of 128 is causing trouble here: if the server falls behind on the backlog even once in a while, your connection timeouts increase wildly.

@Lawouach
Author

I'll admit I've never loaded ws4py that heavily, so these are only guesses, especially since I don't usually run the gevent implementation but rather the CherryPy/good ol' threaded server. However, you are probably right: the backlog likely fills up quickly, and I would definitely increase it. Skimming through gevent's code, the backlog on a stream server seems to default to 50.

I would really need to profile ws4py to understand where it spends most of its time. I know that (un)masking is relatively expensive, all things considered, but the data sent here is so tiny it shouldn't hurt the results.
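To get a feel for the masking cost, here is a small, hypothetical timing sketch (not ws4py's code) comparing a naive per-byte XOR loop with an alternative that XORs the whole buffer as one big integer. Both apply the RFC 6455 rule that payload byte i is XORed with mask byte i % 4. Timings depend entirely on the machine and interpreter (CPython vs PyPy in particular), so no numbers are claimed here.

```python
import functools
import os
import timeit


def mask_naive(payload: bytes, key: bytes) -> bytes:
    # One interpreter-level step per byte: fine for tiny frames, slow for big ones.
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def mask_bulk(payload: bytes, key: bytes) -> bytes:
    # Repeat the 4-byte key to the payload length and XOR everything at once.
    n = len(payload)
    repeated = (key * (n // 4 + 1))[:n]
    x = int.from_bytes(payload, "big") ^ int.from_bytes(repeated, "big")
    return x.to_bytes(n, "big")


if __name__ == "__main__":
    key = os.urandom(4)
    for size in (16, 64 * 1024):
        data = os.urandom(size)
        assert mask_naive(data, key) == mask_bulk(data, key)
        for fn in (mask_naive, mask_bulk):
            t = timeit.timeit(functools.partial(fn, data, key), number=20)
            print(f"{fn.__name__:10s} {size:6d} bytes: {t / 20 * 1e6:.1f} us/call")
```

Masking is its own inverse, so the same function unmasks. A profiler such as `cProfile` run against the real server would show whether this step matters for the benchmark's tiny messages.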

Looking at the reports I've linked above, I'd also be very interested to see the benchmark run with PyPy. I have no doubt ws4py could do better if I could find the time to work on it more.

@jlouis
Contributor

jlouis commented Jun 17, 2012

For Python it can't be GC, unless it is the cycle detector blocking the VM. So my guess is that the system can't keep up with the load, overflows the backlog queue, and then things begin timing out. Increasing the queue size will stop the timeouts, but it will also make latencies worse across the board.
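The trade-off described here (a deeper queue means fewer dropped attempts but longer waits) can be illustrated with a toy, deterministic model. This is a hypothetical sketch, not a model of the real kernel: connection attempts arrive faster than the server accepts them, and a bounded FIFO stands in for the accept backlog. The function name and the arrival/accept rates are illustrative assumptions.

```python
from collections import deque


def simulate(capacity, arrivals_per_tick=3, accepts_per_tick=2, ticks=100):
    """Toy accept queue: return (dropped attempts, worst wait in ticks)."""
    queue = deque()          # arrival tick of each queued connection attempt
    dropped = 0
    waits = []
    for tick in range(ticks):
        for _ in range(arrivals_per_tick):
            if len(queue) < capacity:
                queue.append(tick)
            else:
                dropped += 1     # queue full: the attempt is lost
        for _ in range(min(accepts_per_tick, len(queue))):
            waits.append(tick - queue.popleft())
    return dropped, max(waits, default=0)


for cap in (10, 50):
    print(cap, simulate(cap))
```

With the server permanently overloaded, the smaller queue drops more attempts while the larger one makes every accepted connection wait longer; neither setting makes the overload go away.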

@ericmoritz
Owner

I would really like to figure out why these servers drop connections the way they do, if for nothing else than to determine the optimal configuration for each platform.

@Lawouach
Author

You might want to start by increasing the socket backlog. In your ws4py runner, just add backlog=XYZ to the WebSocketServer(...) call:

server = WebSocketServer(('', 8000), backlog=128, websocket_class=EchoServer)
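At the socket level, the backlog is simply the argument to listen(). Here is a minimal standard-library sketch, independent of ws4py and gevent, that opens a listening socket with an explicit backlog; `open_listener` is a hypothetical helper name. Note that on Linux the kernel silently caps the value at net.core.somaxconn, so asking for more has no effect unless that sysctl is raised too.

```python
import socket


def open_listener(host="127.0.0.1", port=0, backlog=128):
    """Create a TCP listening socket with an explicit accept backlog."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    # Completed-but-not-yet-accepted connections queue up to about `backlog`;
    # beyond that, new attempts are dropped or refused depending on the OS.
    sock.listen(backlog)
    return sock


if __name__ == "__main__":
    listener = open_listener(backlog=1024)   # port 0: let the OS pick a port
    print("listening on", listener.getsockname())
    listener.close()
```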

@perone

perone commented Jun 17, 2012

I really think you should increase the net.core.somaxconn parameter in your setup; it could be the cause of the timeouts. It would also be good to check your syslog to verify that the kernel isn't sending SYN cookies, since SYN cookies can cause timeouts and disconnections in benchmarks like this one.
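Both settings are visible under /proc/sys on Linux. Below is a small read-only sketch, assuming a Linux host (elsewhere the files don't exist and the helper returns None); the helper names are hypothetical. Raising a value is done as root with sysctl, e.g. `sysctl -w net.core.somaxconn=1024`.

```python
from pathlib import Path

# Linux exposes sysctls as files: dots in the name become path separators.
SYSCTL_ROOT = Path("/proc/sys")


def read_sysctl(name):
    """Return an integer sysctl such as 'net.core.somaxconn', or None if unavailable."""
    path = SYSCTL_ROOT / name.replace(".", "/")
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def effective_backlog(requested):
    # Linux silently truncates listen()'s backlog to net.core.somaxconn.
    cap = read_sysctl("net.core.somaxconn")
    return requested if cap is None else min(requested, cap)


if __name__ == "__main__":
    for name in ("net.core.somaxconn", "net.ipv4.tcp_max_syn_backlog",
                 "net.ipv4.tcp_syncookies"):
        print(name, "=", read_sysctl(name))
    print("listen(1024) is effectively", effective_backlog(1024))
```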

@ericmoritz
Copy link
Owner

@perone Excellent, I'll try that soon. It's not hard to get the timeouts on the other platforms; they occur very early in the test. I'll create a gist of the syslog for you to look at as well.

@ericmoritz
Owner

I get this in the syslog on the server:

Jun 17 21:03:39 ip-10-36-118-97 kernel: [1125015.358550] TCP: Possible SYN flooding on port 8000. Sending cookies.  Check SNMP counters.
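The "Check SNMP counters" hint refers to counters such as ListenOverflows, ListenDrops and SyncookiesSent, which Linux reports in /proc/net/netstat (also shown by `netstat -s` or `nstat`). That file alternates a line of counter names and a line of values for each prefix. A minimal parser, assuming that layout, might look like this; the function names are hypothetical.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {'TcpExt': {'ListenDrops': 0, ...}, ...}."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    result = {}
    # Lines come in pairs: "Prefix: name1 name2 ..." then "Prefix: val1 val2 ...".
    for names, values in zip(lines[0::2], lines[1::2]):
        prefix = names[0].rstrip(":")
        result[prefix] = {n: int(v) for n, v in zip(names[1:], values[1:])}
    return result


def listen_counters(path="/proc/net/netstat"):
    """Return the TcpExt counters related to accept-queue overflows (Linux only)."""
    with open(path) as f:
        tcp = parse_netstat(f.read()).get("TcpExt", {})
    wanted = ("ListenOverflows", "ListenDrops", "SyncookiesSent")
    return {name: tcp.get(name) for name in wanted}
```

Sampling these counters before and after a run shows whether the accept queue overflowed during the benchmark.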

@ericmoritz
Owner

@jlouis What puzzles me about the timeouts is that Erlang seems to be immune to them while the others are not. In fact, in the most recent benchmark Go hit 10,000 clients as well. I haven't had a chance to summarize the event data yet, but the meminfo files have the connection counts for each server:

https://github.com/ericmoritz/wsdemo/tree/eleveldb-logging/results

Wouldn't an untuned TCP stack affect all the servers equally?

@jlouis
Contributor

jlouis commented Jun 17, 2012

You can set the backlog when you open the listen socket, which is one thing to bear in mind. Another point is that if your Erlang code has plenty of processes waiting in the accept state, then no backlog builds up at all, since there is always an accepting process to pair the incoming connection with. I bet your code spawns a new accepting process and that this process calls gen_tcp:accept(LSock) fairly quickly, thus establishing a zero-backlog scenario.

Say you start with 1000 of these processes. Then your backlog is, in practice, 1000+D, where D is the default. If the Python system runs with, say, a single accept loop, then your backlog is at most 1+D. In effect, Erlang can tolerate a far higher number of quick connections, since there is a process to absorb each one, whereas in Python you will quickly see timeouts, because the only thing the kernel can do is drop connection attempts on the assumption that the Python process is under heavy pressure.

This also gives a plausible explanation as to why the behaviour is different. But you should really check my hypothesis by reading code :)
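The Erlang pattern described above (many processes, each blocked in gen_tcp:accept on the same listen socket) has a rough Python analogue: several threads blocking in accept() on one shared listening socket. This is a hypothetical sketch for illustration only; it is not how ws4py or the benchmark servers are written, and OS threads are far heavier than Erlang processes. `acceptor` and `start_pool` are illustrative names.

```python
import socket
import threading


def acceptor(listener, handled, lock):
    """One acceptor: block in accept(), handle the connection, repeat."""
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:          # accept failed (e.g. listener closed): stop
            return
        with conn:
            conn.sendall(conn.recv(1024))   # trivial echo of one read
        with lock:
            handled.append(threading.current_thread().name)


def start_pool(n_acceptors=4, backlog=128):
    """Start n_acceptors threads sharing one listening socket (Python 3.8+)."""
    listener = socket.create_server(("127.0.0.1", 0), backlog=backlog)
    handled, lock = [], threading.Lock()
    for i in range(n_acceptors):
        threading.Thread(target=acceptor, args=(listener, handled, lock),
                         name=f"acceptor-{i}", daemon=True).start()
    return listener, handled
```

With N acceptors blocked in accept(), up to N completed connections can be taken off the accept queue at once instead of one at a time, which is the effect attributed above to the Erlang server's pool of accepting processes.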

@ericmoritz
Owner

Keep in mind that the kernel is not triggering this timeout; it comes from the TCP client in my Erlang benchmark client. I set the connection timeout to 2 seconds to determine whether the server was unavailable; otherwise I had no way to tell whether a small number of successful clients was due to an error on my part or to the server becoming unavailable. There is no timeout once the TCP connection has been accepted.
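A client-side connect timeout like this one can be sketched with the standard library. This is not the benchmark's Erlang client, only a hypothetical Python equivalent (`try_connect` is an illustrative name) that classifies each attempt; the 2-second value mirrors the setting described above.

```python
import socket

CONNECT_TIMEOUT = 2.0  # seconds, mirroring the benchmark client's setting


def try_connect(host, port, timeout=CONNECT_TIMEOUT):
    """Return 'ok', 'timeout' or 'refused' for one TCP connection attempt."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "timeout"       # no completed handshake within the deadline
    except ConnectionRefusedError:
        return "refused"       # the server host actively rejected the attempt
    sock.close()               # the handshake completed, which is all this checks
    return "ok"
```

Note that "ok" only means the kernel completed the handshake (the connection may still be sitting in the accept queue), while "timeout" lumps together a server that is slow and one that is gone.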

@jlouis
Contributor

jlouis commented Jun 18, 2012

Ah! So it is a question of semantics, then. The problem, to the best of my knowledge, is that the server can't keep up within the 2-second window: it answers after more than 2000 ms, but by then the client has already registered the connection as lost.

If we graph the kernel density of the response times, we can see whether that is the case.
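A Gaussian kernel density estimate is straightforward to compute directly. Below is a minimal pure-Python sketch, assuming response times in milliseconds and a hand-picked bandwidth (Silverman's rule, or a library such as `scipy.stats.gaussian_kde`, would choose it more carefully). The helper `fraction_over` is a hypothetical addition that reports the share of samples past the client's 2000 ms cutoff.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function computing f(x) = 1/(n*h) * sum_i phi((x - x_i) / h)."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of standard normal kernels centred on each sample.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                          for xi in samples)

    return density


def fraction_over(samples, cutoff_ms=2000.0):
    """Share of responses slower than the client's connect timeout."""
    return sum(1 for s in samples if s > cutoff_ms) / len(samples)
```

Plotting `density(x)` over, say, 0 to 5000 ms would show whether a second mode sits just beyond 2000 ms, which would support the hypothesis that the server does answer, only too late.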

@ericmoritz
Owner

Do you have enough data to graph the kernel density?

@jlouis
Contributor

jlouis commented Jun 18, 2012

More than! The problem is that I have too much :)

@ericmoritz
Owner

@Lawouach I'm trying to run ws4py using PyPy. How did you run it? Did you use CherryPy or gevent?

@Lawouach
Author

I used CherryPy; gevent doesn't run on PyPy, IIRC. The versions were CherryPy 3.2.2 and PyPy 1.8.

@ericmoritz
Owner

Someone just submitted code to run Tornado under PyPy. If you're still curious, I'll write an implementation using ws4py and CherryPy.

I also wonder if anyone has written a WebSocket server or an HTTP server using PyPy's native greenlet module. Perhaps gunicorn?

@Lawouach
Author

Not that I'm aware of, but that would be interesting indeed. Regarding CherryPy and ws4py, you can simply use this code:

https://github.com/Lawouach/WebSocket-for-Python/blob/master/test/autobahn_test_servers.py#L4

That worked just fine with CP 3.2.2 and PyPy 1.8 (I didn't try more recent releases).

You may want to remove the two logging lines (lines 28/29), which are not relevant to the test.

Also, you may want to add the following settings to the cherrypy.config.update(...) call:

cherrypy.config.update({
    'server.thread_pool': 128,
    'server.socket_queue_size': 128,
})

@Lawouach
Author

It would be better if I could submit a pull request for it, but unfortunately I won't have time before tomorrow, or even this weekend :/

@ericmoritz
Owner

I'll write up a simple server and submit a pull request that you can take a glance at. I wrote one yesterday based on your echo server, but I think I deleted it. I know it didn't take very long.

@ghost assigned ericmoritz Jun 22, 2012

4 participants