Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: experiment with callers providing buffers #23

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

mccutchen
Copy link
Owner

Not particularly happy with the specific implementation here, but want to figure out what this kind of change does to allocations. Ultimately, would like to enable using a sync.Pool of buffers when reading frames and messages from the wire, but I'm not sure what the API should look like.

Copy link

github-actions bot commented Jan 6, 2025

benchstats: 9f05ba1...65c8e2e

View full benchmark output on the workflow summary.

goos: linux
goarch: amd64
pkg: github.com/mccutchen/websocket
cpu: AMD EPYC 7763 64-Core Processor                
                       │ ./baseline/bench-results.txt │       ./head/bench-results.txt       │
                       │            sec/op            │    sec/op     vs base                │
ReadFrame/1KiB-4                         1023.5n ± 0%   830.2n ±  1%  -18.89% (p=0.000 n=10)
ReadFrame/1MiB-4                          941.7µ ± 2%   796.7µ ±  1%  -15.40% (p=0.000 n=10)
ReadFrame/8MiB-4                          7.007m ± 1%   6.367m ±  0%   -9.14% (p=0.000 n=10)
ReadFrame/16MiB-4                         14.19m ± 1%   12.82m ±  0%   -9.65% (p=0.000 n=10)
ReadMessage/1MiB/1-4                      1.079m ± 1%   1.094m ±  1%   +1.41% (p=0.000 n=10)
ReadMessage/8MiB/1-4                      7.754m ± 1%   7.932m ±  1%   +2.30% (p=0.000 n=10)
ReadMessage/16MiB/1-4                     15.55m ± 1%   15.67m ±  1%   +0.75% (p=0.003 n=10)
ReadMessage/1MiB/4-4                      1.431m ± 2%   1.869m ± 12%  +30.62% (p=0.000 n=10)
ReadMessage/8MiB/4-4                     10.477m ± 2%   9.052m ±  1%  -13.59% (p=0.000 n=10)
ReadMessage/16MiB/4-4                     19.90m ± 4%   18.04m ±  1%   -9.35% (p=0.000 n=10)
ReadMessage/1MiB/16-4                     1.981m ± 3%   2.020m ±  7%        ~ (p=0.971 n=10)
ReadMessage/8MiB/16-4                     16.14m ± 5%   13.12m ±  5%  -18.68% (p=0.000 n=10)
ReadMessage/16MiB/16-4                    23.34m ± 3%   21.50m ±  3%   -7.86% (p=0.000 n=10)
geomean                                   3.131m        2.949m         -5.80%

                       │ ./baseline/bench-results.txt │       ./head/bench-results.txt        │
                       │             B/op             │     B/op      vs base                 │
ReadFrame/1KiB-4                         1080.00 ± 0%     48.00 ± 0%   -95.56% (p=0.000 n=10)
ReadFrame/1MiB-4                      1048649.00 ± 0%     48.00 ± 0%  -100.00% (p=0.000 n=10)
ReadFrame/8MiB-4                      8388685.00 ± 0%     48.00 ± 0%  -100.00% (p=0.000 n=10)
ReadFrame/16MiB-4                    16777291.50 ± 0%     48.00 ± 0%  -100.00% (p=0.000 n=10)
ReadMessage/1MiB/1-4                     1.000Mi ± 0%   1.008Mi ± 0%    +0.78% (p=0.000 n=10)
ReadMessage/8MiB/1-4                     8.000Mi ± 0%   8.008Mi ± 0%    +0.10% (p=0.000 n=10)
ReadMessage/16MiB/1-4                    16.00Mi ± 0%   16.01Mi ± 0%    +0.05% (p=0.000 n=10)
ReadMessage/1MiB/4-4                     3.602Mi ± 0%   2.860Mi ± 0%   -20.61% (p=0.000 n=10)
ReadMessage/8MiB/4-4                     28.57Mi ± 0%   22.58Mi ± 0%   -20.97% (p=0.000 n=10)
ReadMessage/16MiB/4-4                    57.09Mi ± 0%   45.09Mi ± 0%   -21.01% (p=0.000 n=10)
ReadMessage/1MiB/16-4                    5.407Mi ± 0%   4.477Mi ± 0%   -17.20% (p=0.000 n=10)
ReadMessage/8MiB/16-4                    47.22Mi ± 0%   39.73Mi ± 0%   -15.87% (p=0.000 n=10)
ReadMessage/16MiB/16-4                   93.81Mi ± 0%   78.81Mi ± 0%   -15.98% (p=0.000 n=10)
geomean                                  5.263Mi        264.8Ki        -95.09%

                       │ ./baseline/bench-results.txt │      ./head/bench-results.txt      │
                       │          allocs/op           │ allocs/op   vs base                │
ReadFrame/1KiB-4                           5.000 ± 0%   1.000 ± 0%  -80.00% (p=0.000 n=10)
ReadFrame/1MiB-4                           5.000 ± 0%   1.000 ± 0%  -80.00% (p=0.000 n=10)
ReadFrame/8MiB-4                           5.000 ± 0%   1.000 ± 0%  -80.00% (p=0.000 n=10)
ReadFrame/16MiB-4                          5.000 ± 0%   1.000 ± 0%  -80.00% (p=0.000 n=10)
ReadMessage/1MiB/1-4                       6.000 ± 0%   3.000 ± 0%  -50.00% (p=0.000 n=10)
ReadMessage/8MiB/1-4                       6.000 ± 0%   3.000 ± 0%  -50.00% (p=0.000 n=10)
ReadMessage/16MiB/1-4                      6.000 ± 0%   3.000 ± 0%  -50.00% (p=0.000 n=10)
ReadMessage/1MiB/4-4                      24.000 ± 0%   9.000 ± 0%  -62.50% (p=0.000 n=10)
ReadMessage/8MiB/4-4                      24.000 ± 0%   9.000 ± 0%  -62.50% (p=0.000 n=10)
ReadMessage/16MiB/4-4                     24.000 ± 0%   9.000 ± 0%  -62.50% (p=0.000 n=10)
ReadMessage/1MiB/16-4                      90.00 ± 0%   27.00 ± 0%  -70.00% (p=0.000 n=10)
ReadMessage/8MiB/16-4                      90.00 ± 0%   27.00 ± 0%  -70.00% (p=0.000 n=10)
ReadMessage/16MiB/16-4                     90.00 ± 0%   27.00 ± 0%  -70.00% (p=0.000 n=10)
geomean                                    14.59        4.578       -68.63%

@mccutchen mccutchen force-pushed the callers-provide-buffers branch from ae16ac8 to e737601 Compare January 7, 2025 03:20
mccutchen added a commit that referenced this pull request Jan 22, 2025
The idea here is that there are some test cases (e.g. in the 9.* suite
that send extremely large frames/messages) where our implementation
appears to be painfully slow, so being able to run specific autobahn
tests against an external server like

    make testautobahn AUTOBAHN_CASES=9.2.6 AUTOBAHN_TARGET=localhost:8080

will hopefully make it easier to gain visibility into performance
issues. (See also #23 for one potential path towards optimization.)
@mccutchen mccutchen force-pushed the callers-provide-buffers branch 2 times, most recently from 65c8e2e to 8129de5 Compare January 22, 2025 06:31
Copy link

github-actions bot commented Jan 22, 2025

🔥 Run benchmarks comparing a275f8e against main:

gh workflow run bench.yaml -f pr_number=23

Note: this comment will update with each new commit.

Copy link

github-actions bot commented Jan 22, 2025

benchstats: 4e63a4d...a275f8e

View full benchmark output on the workflow summary.

goos: linux
goarch: amd64
pkg: github.com/mccutchen/websocket
cpu: AMD EPYC 7763 64-Core Processor                
                       │ ./baseline/bench-results.txt │
                       │            sec/op            │
ReadFrame/1KiB-4                          1.027µ ± 2%
ReadFrame/1MiB-4                          927.3µ ± 1%
ReadFrame/8MiB-4                          6.957m ± 1%
ReadFrame/16MiB-4                         14.01m ± 1%
ReadMessage/1MiB/1-4                      1.071m ± 1%
ReadMessage/8MiB/1-4                      7.711m ± 2%
ReadMessage/16MiB/1-4                     15.34m ± 2%
ReadMessage/1MiB/4-4                      1.374m ± 2%
ReadMessage/8MiB/4-4                      10.78m ± 5%
ReadMessage/16MiB/4-4                     18.72m ± 4%
ReadMessage/1MiB/16-4                     1.917m ± 2%
ReadMessage/8MiB/16-4                     17.34m ± 6%
ReadMessage/16MiB/16-4                    23.97m ± 6%
geomean                                   3.115m

                       │ ./baseline/bench-results.txt │
                       │             B/op             │
ReadFrame/1KiB-4                         1.055Ki ± 0%
ReadFrame/1MiB-4                         1.000Mi ± 0%
ReadFrame/8MiB-4                         8.000Mi ± 0%
ReadFrame/16MiB-4                        16.00Mi ± 0%
ReadMessage/1MiB/1-4                     1.000Mi ± 0%
ReadMessage/8MiB/1-4                     8.000Mi ± 0%
ReadMessage/16MiB/1-4                    16.00Mi ± 0%
ReadMessage/1MiB/4-4                     3.602Mi ± 0%
ReadMessage/8MiB/4-4                     28.57Mi ± 0%
ReadMessage/16MiB/4-4                    57.09Mi ± 0%
ReadMessage/1MiB/16-4                    5.407Mi ± 0%
ReadMessage/8MiB/16-4                    47.22Mi ± 0%
ReadMessage/16MiB/16-4                   93.81Mi ± 0%
geomean                                  5.263Mi

                       │ ./baseline/bench-results.txt │
                       │          allocs/op           │
ReadFrame/1KiB-4                           5.000 ± 0%
ReadFrame/1MiB-4                           5.000 ± 0%
ReadFrame/8MiB-4                           5.000 ± 0%
ReadFrame/16MiB-4                          5.000 ± 0%
ReadMessage/1MiB/1-4                       6.000 ± 0%
ReadMessage/8MiB/1-4                       6.000 ± 0%
ReadMessage/16MiB/1-4                      6.000 ± 0%
ReadMessage/1MiB/4-4                       24.00 ± 0%
ReadMessage/8MiB/4-4                       24.00 ± 0%
ReadMessage/16MiB/4-4                      24.00 ± 0%
ReadMessage/1MiB/16-4                      90.00 ± 0%
ReadMessage/8MiB/16-4                      90.00 ± 0%
ReadMessage/16MiB/16-4                     90.00 ± 0%
geomean                                    14.59

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant