Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stress testing send_receive.c fails #40

Closed
pthreat opened this issue Jan 1, 2022 · 10 comments
Closed

Stress testing send_receive.c fails #40

pthreat opened this issue Jan 1, 2022 · 10 comments
Labels

Comments

@pthreat
Copy link

pthreat commented Jan 1, 2022

I stress tested the send_receive example with artillery.io

--------------------------------------
Metrics for period to: 10:55:20(-0300) (width: 9.004s)
--------------------------------------

vusers.created_by_name.0: ................................... 80
vusers.created.total: ....................................... 80
vusers.failed: .............................................. 24
vusers.completed: ........................................... 56
vusers.session_length:
  min: ...................................................... 1.2
  max: ...................................................... 12.4
  median: ................................................... 5.9
  p95: ...................................................... 10.3
  p99: ...................................................... 10.9
websocket.send_rate: ........................................ 29/sec
websocket.messages_sent: .................................... 224
errors.Parse Error: Expected HTTP/: ......................... 24

I have tried to increase the amount of clients and ports and recompiling the example, however it still fails.

Do I need to augment file descriptors on my system in order to allow many clients sending and receiving at the same time?

PS: I like the interface of this library it's very intuitive, great work!

@pthreat
Copy link
Author

pthreat commented Jan 3, 2022

Will enable debugging today to see if I can get more information about what's causing the errors

@pthreat
Copy link
Author

pthreat commented Jan 3, 2022

Got nothing useful out of enabling debug, I still see the output addr being set to null for some reason, I have increased clients to 200000 and also ports, no idea what's going on

@Theldus
Copy link
Owner

Theldus commented Jan 4, 2022

Hi @pthreat,
This is curious, could you tell me how you performed these tests? I'm not familiar with artillery.io.

AFAIR, wsServer manages to pass in Autobahn's massconnect mode (for something like 1k clients), so I can't say exactly what's going on.

Perhaps the "vusers.failed" occurs due to some kind of timeout, because wsServer takes a while to respond or something like that (since wsServer is very simple in connection management, etc).

Got nothing useful out of enabling debug, I still see the output addr being set to null for some reason, I have increased clients to 200000 and also ports, no idea what's going on

200k clients is indeed a massive amount of clients, and I would be very surprised if wsServer could manage this, since wsServer is not async and uses one thread per client.

If you need a server for such a large number of clients, maybe wsServer is not the right server for you =/.

@pthreat
Copy link
Author

pthreat commented Jan 5, 2022

@Theldus Hey man, glad to see you around, I'll paste the test script:

Put this in stress.yml

config:
  target: "ws://127.0.0.1:8080"
  phases:
    - duration: 100
      arrivalRate: 8
  ws:
    # Ignore SSL certificate errors
    # - useful in *development* with self-signed certs
    rejectUnauthorized: false
scenarios:
  - engine: "ws"
    flow:
      - send: '{"event":"hello", "token":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "application":"frontend"}'
      - send: '{"event":"hello", "token":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "application":"frontend"}'
      - send: '{"event":"hello", "token":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "application":"frontend"}'
      - send: '{"event":"hello", "token":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "application":"frontend"}'

Install artillery (through node's npm)

npm -g install artillery

Then:

artillery run stress.yml

@pthreat
Copy link
Author

pthreat commented Jan 5, 2022

FWIW I have tried node https://github.com/websockets/ws and tests showed up no problems.

@Theldus
Copy link
Owner

Theldus commented Jan 7, 2022

Hi @pthreat,
Thanks for the detailed instructions. I was able to configure Artillery and see results similar to yours. I didn't get such a high error rate but I constantly get "errors.Parse Error: Expected HTTP/:" in the output.

I ran artillery against Autobahn's echoserver and also against websocat, and both without any errors, which really implies something is happening with wsServer.

Running artillery in debug mode (DEBUG=ws* artillery run stress.yml), sometimes I get the following output:

2022-01-07T02:14:51.636Z ws Error: Parse Error: Expected HTTP/
    at Socket.socketOnData (_http_client.js:515:22)
    at Socket.emit (events.js:376:20)
    at addChunk (internal/streams/readable.js:309:12)
    at readableAddChunk (internal/streams/readable.js:284:9)
    at Socket.Readable.push (internal/streams/readable.js:223:10)
    at TCP.onStreamRead (internal/stream_base_commons.js:188:23)

I would like to take a closer look at this stacktrace, but my knowledge of Node is pretty minimal.

Reading here, it seems to me to be a problem that occurs in the HTTP parsing, probably in the handshake response, since this exact message can occur in other contexts, outside of Artillery.

I have the feeling that if I analyze via Wireshark I could find out what the 'bad response' of wsServer is, the problem is locating, as it is quite random and occurs in small quantity.

@pthreat
Copy link
Author

pthreat commented Jan 13, 2022

@Theldus Sorry, got lost for a few days, would be awesome if a high load would be manageable, I mean if node can manage it so can wsServer ;) . I do know artillery plays dirty doesn't even sends pong or whatever, it plays dirty and that's why I think it's a good stress testing tool. Hope you can crush the issue :)

@Theldus
Copy link
Owner

Theldus commented Jan 14, 2022

@pthreat No worries =),
Yes, I care a lot about wsServer being 'correct' (which is why there are automated fuzzying tests with Autobahn and AFL ;-)), so another tool like Artillery looks pretty interesting to me.

I do know artillery plays dirty doesn't even sends pong or whatever [...]

PING/PONG frames shouldn't exactly be a problem, wsServer works fine without them. What wsServer expects to correctly receive is a valid handshake, as well as valid message frames (TXT, BIN, CONT...). Abnormal closure (no close handshake) also works.

What intrigues me the most is that the error is random: sometimes it happens, sometimes it doesn't, which makes debugging very difficult.

But I will continue investigating here and I will keep you informed about any progress ;-).

Theldus added a commit that referenced this issue Jan 24, 2022
This commit fixes issue #40 where the Artillery tool was able to
force a race condition that occurred on the wsServer (a) and also
the premature sending of broadcast messages (b).

In detail:
a) There was a race condition during the disconnect process, where
the 'client_socks' structure was not fully secured with locks, and
wsServer tried to send messages to already disconnected clients.

b) During the broadcast of messages, the wsServer did not check if
the connected clients were already in the 'OPEN' state, that is,
if they had already performed the handshake, which made the server
send messages prematurely to the clients.

Fixed these two issues, now wsServer is able to pass Artillery tests
without problems as described in the issue.

Signed-off-by: Davidson Francis <[email protected]>
Theldus added a commit that referenced this issue Jan 24, 2022
This commit fixes issue #40 where the Artillery tool was able to
force a race condition that occurred on the wsServer (a) and also
the premature sending of broadcast messages (b).

In detail:
a) There was a race condition during the disconnect process, where
the 'client_socks' structure was not fully secured with locks, and
wsServer tried to send messages to already disconnected clients.

b) During the broadcast of messages, the wsServer did not check if
the connected clients were already in the 'OPEN' state, that is,
if they had already performed the handshake, which made the server
send messages prematurely to the clients.

Fixed these two issues, now wsServer is able to pass Artillery tests
without problems as described in the issue.

Signed-off-by: Davidson Francis <[email protected]>
@Theldus Theldus added the bug label Jan 24, 2022
@Theldus
Copy link
Owner

Theldus commented Jan 24, 2022

Hi @pthreat,
Commit d66b613 fixes this issue.

There were two things happening:
a) There was a race condition during the disconnect process, where the 'client_socks' structure was not fully protected with locks, and wsServer tried to send messages to already disconnected clients.

b) During the message broadcast (send_receive.c example does), wsServer did not check if the connected clients were already in the 'OPEN' state, that is, if they had already performed the handshake, which made the server send messages prematurely to the clients (hence the Parse Error: Expected HTTP/ error).

Thanks for noticing this, race conditions are generally very difficult to spot and fix, without Artillery this error would remain unnoticed for a long time.

Now I get the following output with Artillery:

--------------------------------
Summary report @ 04:22:23(-0300)
--------------------------------

vusers.created_by_name.0: ................................... 800
vusers.created.total: ....................................... 800
vusers.completed: ........................................... 800
vusers.session_length:
  min: ...................................................... 0.9
  max: ...................................................... 11.1
  median: ................................................... 1.3
  p95: ...................................................... 1.9
  p99: ...................................................... 3.4
websocket.send_rate: ........................................ 32/sec
websocket.messages_sent: .................................... 3200

Please let me know if everything is ok so I can close the issue, and again, thank you very much for spotting this =).

@Theldus Theldus closed this as completed Apr 19, 2022
@pthreat
Copy link
Author

pthreat commented Apr 25, 2022

Hey Theldus sorry I didn't reply earlier. Will check it out when I have sometime trying to stress it! Results look promising. Thanks for fixing this <3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants