Important
Last updated in December 2021!
Bottom line (even on my 6-year-old laptop), using only 1 core with 2 hyper-threads for the server (the load generator getting its own separate core), and with HTTP keep-alive capped at 256k requests per connection:
- under normal conditions (16 concurrent connections), I get around 105k requests / second, at about 0.4ms latency for 99% of the requests;
- under normal conditions (64 concurrent connections), I get around 107k requests / second, at about 1.5ms latency for 99% of the requests;
- under light stress conditions (128 concurrent connections), I get around 110k requests / second, at about 3.0ms latency for 99% of the requests;
- under medium stress conditions (512 concurrent connections), I get around 104k requests / second, at about 9.3ms latency for 99% of the requests (meanwhile the average is under 5.0ms);
- under high stress conditions (2048 concurrent connections), I get around 103k requests / second, at about 240ms latency for 99% of the requests (meanwhile the average is under 20ms);
- under extreme stress conditions (16384 concurrent connections) (i.e. someone tries to DDOS the server), I get around 90k requests / second, at about 3.1s latency for 99% of the requests (meanwhile the average is under 200ms);
- the performance is at least on par with NGinx; however, especially for real-world scenarios (i.e. thousands of small files accessed in random patterns), I believe `kawipiko` fares much better; (not to mention how simple it is to configure and deploy `kawipiko` as compared to NGinx, which took a lot of time, fiddling, and trial and error to get right;)
Regarding HTTPS, my initial benchmarks (only covering plain HTTPS with HTTP/1) seem to indicate that `kawipiko` is at least on par with NGinx.

Regarding HTTP/2, my initial benchmarks seem to indicate that `kawipiko`'s performance is about 6 times lower than with plain HTTPS over HTTP/1 (mainly due to the unoptimized Go `net/http` implementation). In this regard NGinx fares much better, its HTTP/2 performance being similar to that of plain HTTPS with HTTP/1.

Regarding HTTP/3, given that the QUIC library is still experimental, my initial benchmarks seem to indicate that `kawipiko`'s performance is quite poor (at about 5k requests / second).
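For reference, the TLS runs can reuse the same `wrk` approach, simply pointed at an HTTPS endpoint (`wrk` only speaks HTTP/1 over TLS, which matches the scope above). A sketch, in which the `8443` port and the server-side TLS setup are assumptions, not taken from the measurements above:

```bash
# benchmark plain HTTPS with HTTP/1; assumes the server was separately
# configured to listen for TLS on 127.0.0.1:8443 (hypothetical endpoint);
wrk \
    --threads 2 \
    --connections 512 \
    --timeout 1s \
    --duration 30s \
    --latency \
    https://127.0.0.1:8443/index.txt \
#
```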
Important
Last updated in August 2018!
The results are based on an older version of `kawipiko`; the current version is at least 10% more efficient.
The methodology used is described in a dedicated section.
Note
Please note that the values under `Thread Stats` are reported per thread. Therefore it is best to look at the first two values of each report, i.e. `Requests/sec` and `Transfer/sec`.
`kawipiko`, 16 connections / 2 server threads / 2 `wrk` threads:

```
Requests/sec: 111720.73
Transfer/sec:     18.01MB

Running 30s test @ http://127.0.0.1:8080/
  2 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   139.36us   60.27us   1.88ms   64.91%
    Req/Sec    56.14k   713.04    57.60k    91.36%
  Latency Distribution
     50%  143.00us
     75%  184.00us
     90%  212.00us
     99%  261.00us
  3362742 requests in 30.10s, 541.98MB read
```
`kawipiko`, 128 connections / 2 server threads / 2 `wrk` threads:

```
Requests/sec: 118811.41
Transfer/sec:     19.15MB

Running 30s test @ http://127.0.0.1:8080/
  2 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.03ms  705.69us  19.53ms   63.54%
    Req/Sec    59.71k     1.69k   61.70k    96.67%
  Latency Distribution
     50%    0.99ms
     75%    1.58ms
     90%    1.89ms
     99%    2.42ms
  3564527 requests in 30.00s, 574.50MB read
```
`kawipiko`, 512 connections / 2 server threads / 2 `wrk` threads:

```
Requests/sec: 106698.89
Transfer/sec:     17.20MB

Running 30s test @ http://127.0.0.1:8080/
  2 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.73ms    3.89ms  39.32ms   39.74%
    Req/Sec    53.71k     1.73k   69.18k    84.33%
  Latency Distribution
     50%    4.96ms
     75%    8.63ms
     90%    9.19ms
     99%   10.30ms
  3206540 requests in 30.05s, 516.80MB read
  Socket errors: connect 0, read 105, write 0, timeout 0
```
`kawipiko`, 2048 connections / 2 server threads / 2 `wrk` threads:

```
Requests/sec: 100296.65
Transfer/sec:     16.16MB

Running 30s test @ http://127.0.0.1:8080/
  2 threads and 2048 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    45.42ms   85.14ms  987.70ms  88.62%
    Req/Sec    50.61k     5.59k   70.14k    71.74%
  Latency Distribution
     50%   16.30ms
     75%   28.44ms
     90%  147.60ms
     99%  417.40ms
  3015868 requests in 30.07s, 486.07MB read
  Socket errors: connect 0, read 128, write 0, timeout 86
```
`kawipiko`, 4096 connections / 2 server threads / 2 `wrk` threads:

```
Requests/sec:  95628.34
Transfer/sec:     15.41MB

Running 30s test @ http://127.0.0.1:8080/
  2 threads and 4096 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    90.50ms  146.08ms  999.65ms  88.49%
    Req/Sec    48.27k     6.09k   66.05k    76.34%
  Latency Distribution
     50%   23.31ms
     75%  112.06ms
     90%  249.41ms
     99%  745.94ms
  2871404 requests in 30.03s, 462.79MB read
  Socket errors: connect 0, read 27, write 0, timeout 4449
```
`kawipiko`, 16384 connections / 2 server threads / 2 `wrk` threads:

```
Requests/sec:  53548.52
Transfer/sec:      8.63MB

Running 30s test @ http://127.0.0.1:8080/
  2 threads and 16384 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   206.21ms  513.75ms    6.00s   92.56%
    Req/Sec    31.37k     5.68k   44.44k    76.13%
  Latency Distribution
     50%   35.38ms
     75%   62.78ms
     90%  551.33ms
     99%    2.82s
  1611294 requests in 30.09s, 259.69MB read
  Socket errors: connect 0, read 115, write 0, timeout 2288
```
- the machine was my personal laptop, with an Intel Core i7-3667U (2 physical cores times 2 hyper-threads each), and with 8 GiB of RAM;
- the `kawipiko-server` was started with `--processes 1 --threads 2`; (i.e. 2 threads handling the requests;)
- the `kawipiko-server` was started with `--archive-inmem`; (i.e. the CDB database file was preloaded into memory, thus no disk IO;)
- the `kawipiko-server` was started with `--security-headers-disable`; (because these headers are not set by default by other HTTP servers;)
- the `kawipiko-server` was started with `--timeout-disable`; (because, due to a known Go issue, using `net.Conn.SetDeadline` has an impact of about 20% on the raw performance; thus the values reported above might be about 10%-15% smaller when used with timeouts;)
- the benchmarking tool was `wrk`;
- both the `kawipiko-server` and `wrk` tools were run on the same machine;
- both the `kawipiko-server` and `wrk` tools were pinned on different physical cores;
- the benchmark was run over loopback networking (i.e. `127.0.0.1`);
- the served file contains `Hello World!`;
- the protocol was HTTP (i.e. no TLS), with keep-alive;
- both the CDB and the NGinx folder were put on `tmpfs` (which implies that the disk is not a limiting factor); (in fact `kawipiko` performs quite well even on spinning disks, due to careful storage management;)
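Putting those flags together, the server invocation would have looked roughly like this (a sketch assembled from the flags listed above; the archive path is a placeholder):

```bash
kawipiko-server \
    --bind 127.0.0.1:8080 \
    --archive ./hello-world.cdb \
    --archive-inmem \
    --security-headers-disable \
    --timeout-disable \
    --processes 1 \
    --threads 2 \
#
```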
Important
Last updated in August 2019!
The results are based on an older version of `kawipiko`; the current version is at least 10% more efficient.
The methodology used is described in a dedicated section.
NGinx, 512 connections / 2 worker processes / 2 `wrk` threads:

```
Requests/sec:  79816.08
Transfer/sec:     20.02MB

Running 30s test @ http://127.0.0.1:8080/index.txt
  2 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.07ms    1.90ms  19.83ms   71.67%
    Req/Sec    40.17k     1.16k   43.35k    69.83%
  Latency Distribution
     50%    6.13ms
     75%    6.99ms
     90%    8.51ms
     99%   11.10ms
  2399069 requests in 30.06s, 601.73MB read
```
NGinx, 2048 connections / 2 worker processes / 2 `wrk` threads:

```
Requests/sec:  78211.46
Transfer/sec:     19.62MB

Running 30s test @ http://127.0.0.1:8080/index.txt
  2 threads and 2048 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    27.11ms   20.27ms  490.12ms  97.76%
    Req/Sec    39.45k     2.45k   49.98k    70.74%
  Latency Distribution
     50%   24.80ms
     75%   29.67ms
     90%   34.99ms
     99%  126.97ms
  2351933 requests in 30.07s, 589.90MB read
  Socket errors: connect 0, read 0, write 0, timeout 11
```
NGinx, 4096 connections / 2 worker processes / 2 `wrk` threads:

```
Requests/sec:  75970.82
Transfer/sec:     19.05MB

Running 30s test @ http://127.0.0.1:8080/index.txt
  2 threads and 4096 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    70.25ms   73.68ms  943.82ms  87.21%
    Req/Sec    38.37k     3.79k   49.06k    70.30%
  Latency Distribution
     50%   46.37ms
     75%   58.28ms
     90%  179.08ms
     99%  339.05ms
  2282223 requests in 30.04s, 572.42MB read
  Socket errors: connect 0, read 0, write 0, timeout 187
```
NGinx, 16384 connections / 2 worker processes / 2 `wrk` threads:

```
Requests/sec:  43909.67
Transfer/sec:     11.01MB

Running 30s test @ http://127.0.0.1:8080/index.txt
  2 threads and 16384 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   223.87ms  551.14ms    5.94s   92.92%
    Req/Sec    32.95k    13.35k   51.56k    76.71%
  Latency Distribution
     50%   32.62ms
     75%  222.93ms
     90%  558.04ms
     99%    3.17s
  1320562 requests in 30.07s, 331.22MB read
  Socket errors: connect 0, read 12596, write 34, timeout 1121
```
the NGinx configuration file can be found in the examples folder; the configuration was obtained after many experiments to squeeze as much performance as possible out of NGinx, given the targeted use-case, namely many small files;

moreover, NGinx seems to be quite sensitive to the actual path requested (see the sketch after this list):

- if one requests `http://127.0.0.1:8080/`, and one has configured NGinx to look for `index.txt`, and that file actually exists, the performance is quite a bit lower than just asking for that file; (perhaps it issues more syscalls searching for the index file;)
- if one requests `http://127.0.0.1:8080/index.txt`, as mentioned above, it achieves the higher performance; (perhaps it issues fewer syscalls;)
- if one requests `http://127.0.0.1:8080/does-not-exist`, it seems to achieve the best performance; (perhaps it issues the least amount of syscalls;) (however this is not an actually useful corner-case;)
- it must be noted that `kawipiko` doesn't exhibit this behaviour; the same performance is achieved regardless of the path variant;
- therefore the benchmarks above use `/index.txt` as opposed to `/`, in order not to disfavour NGinx;
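To observe this behaviour yourself, the same `wrk` invocation can be pointed at each path variant in turn (a sketch reusing the command from the methodology section below):

```bash
# same load, three different paths; with NGinx the throughput differs noticeably;
wrk --threads 2 --connections 512 --timeout 1s --duration 30s --latency http://127.0.0.1:8080/
wrk --threads 2 --connections 512 --timeout 1s --duration 30s --latency http://127.0.0.1:8080/index.txt
wrk --threads 2 --connections 512 --timeout 1s --duration 30s --latency http://127.0.0.1:8080/does-not-exist
```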
`darkhttpd`, 512 connections / 1 server process / 2 `wrk` threads:

```
Requests/sec:  38191.65
Transfer/sec:      8.74MB

Running 30s test @ http://127.0.0.1:8080/index.txt
  2 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    17.51ms   17.30ms  223.22ms  78.55%
    Req/Sec     9.62k     1.94k   17.01k    72.98%
  Latency Distribution
     50%    7.51ms
     75%   32.51ms
     90%   45.69ms
     99%   53.00ms
  1148067 requests in 30.06s, 262.85MB read
```
Important
Last updated in August 2019!
The results are based on an older version of `kawipiko`; the current version is at least 10% more efficient.
The methodology used is described in a dedicated section.
As a benchmark much closer to the "real world" use-cases for `kawipiko`, I've done the following:

- downloaded from the OpenStreetMap servers all the tiles for my home town (from zoom level 0 to zoom level 19), which resulted in:
  - around ~250k PNG files totaling ~330 MiB;
  - with an average of 1.3 KiB and a median of 103 B; (i.e. lots of extremely small files;)
  - occupying actually around 1.1 GiB of storage (on Ext4) due to file-system overheads;
- created a CDB archive (see the sketch after this list), which resulted in:
  - a single file totaling ~376 MiB (both "apparent" and "occupied" storage); (i.e. no storage space wasted;)
  - which contains only ~100k PNG files, due to the elimination of duplicate PNG files; (i.e. at higher zoom levels the tiles start to repeat;)
- listed all the available tiles, and benchmarked both `kawipiko` and NGinx, with 16k concurrent connections;
- the methodology is the same one described above, with the following changes:
  - the machine used was my desktop, with an Intel Core i7-4770 (4 physical cores times 2 hyper-threads each), and with 32 GiB of RAM;
  - the files (both the CDB and the tiles folder) were put in `tmpfs`;
  - `kawipiko`, NGinx and `wrk` were each configured to use 8 threads, and were pinned on two separate physical cores each;
  - (the machine had almost nothing running on it except the minimal required services;)
Based on my benchmark, the following are my findings:

- `kawipiko` outperformed NGinx by ~25% in requests / second;
- `kawipiko` outperformed NGinx by ~29% in average response latency;
- `kawipiko` outperformed NGinx by ~40% in 90-percentile response latency;
- `kawipiko` used ~6% less CPU while serving requests for 2 minutes;
- `kawipiko` used ~25% less CPU per request;
- NGinx used the least amount of RAM, meanwhile `kawipiko` used around 1 GiB of RAM (due to either in-memory loading or `mmap` usage);
`kawipiko` with `--archive-inmem` and `--index-all` (1 process, 8 threads):

```
Requests/sec: 238499.86
Transfer/sec:    383.59MB

Running 2m test @ http://127.9.185.194:8080/
  8 threads and 16384 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   195.39ms  412.84ms    5.99s   92.33%
    Req/Sec    30.65k    10.20k  213.08k    79.41%
  Latency Distribution
     50%   28.02ms
     75%  221.17ms
     90%  472.41ms
     99%    2.19s
  28640139 requests in 2.00m, 44.98GB read
  Socket errors: connect 0, read 0, write 0, timeout 7032
```
`kawipiko` with `--archive-mmap` (1 process, 8 threads):

```
Requests/sec: 237239.35
Transfer/sec:    381.72MB

Running 2m test @ http://127.9.185.194:8080/
  8 threads and 16384 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   210.44ms  467.84ms    6.00s   92.57%
    Req/Sec    30.77k    12.29k  210.17k    86.67%
  Latency Distribution
     50%   26.51ms
     75%  221.63ms
     90%  494.93ms
     99%    2.67s
  28489533 requests in 2.00m, 44.77GB read
  Socket errors: connect 0, read 0, write 0, timeout 10730
```
`kawipiko` with `--archive-mmap` (8 processes, 1 thread):

```
Requests/sec: 248266.83
Transfer/sec:    399.29MB

Running 2m test @ http://127.9.185.194:8080/
  8 threads and 16384 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   209.30ms  469.05ms    5.98s   92.25%
    Req/Sec    31.86k     8.58k   83.99k    69.93%
  Latency Distribution
     50%   23.08ms
     75%  215.28ms
     90%  502.80ms
     99%    2.64s
  29816650 requests in 2.00m, 46.83GB read
  Socket errors: connect 0, read 0, write 0, timeout 15244
```
NGinx (8 worker processes):
```
Requests/sec: 188255.32
Transfer/sec:    302.88MB

Running 2m test @ http://127.9.185.194:8080/
  8 threads and 16384 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   266.18ms  538.72ms    5.93s   90.78%
    Req/Sec    24.15k     8.34k  106.48k    74.56%
  Latency Distribution
     50%   34.34ms
     75%  253.57ms
     90%  750.29ms
     99%    2.97s
  22607727 requests in 2.00m, 35.52GB read
  Socket errors: connect 0, read 109, write 0, timeout 16833
```
- get the `kawipiko` executables (either download or build them);
- get the `hello-world.cdb` archive (from the examples folder inside the repository);
- install NGinx and `wrk` from the distribution packages;
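For example (a sketch; the repository URL is the project's public one, and the package names are the usual distribution ones):

```bash
# fetch the repository, which contains the examples folder with hello-world.cdb;
git clone https://github.com/volution/kawipiko
cp ./kawipiko/examples/hello-world.cdb .

# e.g. on a Debian / Ubuntu system;
sudo apt-get install nginx wrk
```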
this scenario will yield a baseline performance per core;

execute the server (in-memory and indexed) (i.e. the best-case scenario):

```bash
kawipiko-server \
    --bind 127.0.0.1:8080 \
    --archive ./hello-world.cdb \
    --archive-inmem \
    --index-all \
    --processes 1 \
    --threads 1 \
#
```
execute the server (memory mapped) (i.e. the recommended scenario):

```bash
kawipiko-server \
    --bind 127.0.0.1:8080 \
    --archive ./hello-world.cdb \
    --archive-mmap \
    --processes 1 \
    --threads 1 \
#
```
this scenario is the usual setup; configure `--threads` to equal the number of logical cores (i.e. the number of physical cores multiplied by the number of hyper-threads per physical core);

execute the server (memory mapped):

```bash
kawipiko-server \
    --bind 127.0.0.1:8080 \
    --archive ./hello-world.cdb \
    --archive-mmap \
    --processes 1 \
    --threads 2 \
#
```
`wrk`, 512 concurrent connections, handled by 2 threads:

```bash
wrk \
    --threads 2 \
    --connections 512 \
    --timeout 1s \
    --duration 30s \
    --latency \
    http://127.0.0.1:8080/index.txt \
#
```
`wrk`, 4096 concurrent connections, handled by 2 threads:

```bash
wrk \
    --threads 2 \
    --connections 4096 \
    --timeout 1s \
    --duration 30s \
    --latency \
    http://127.0.0.1:8080/index.txt \
#
```
the number of threads for the server plus those for `wrk` shouldn't be larger than the number of available cores; (or use different machines for the server and the client;)

also take into account that, by default, the number of file descriptors on most UNIX / Linux systems is capped at 1024; therefore if you want to try with more than 1000 connections, you need to raise this limit; (see below;)
additionally, you can try to pin the server and `wrk` to specific cores, and increase various priorities (scheduling, IO, etc.); (given that Intel processors feature hyper-threading, where each hyper-thread appears to the OS as an individual core, you should make sure that you pin each process on cores belonging to the same physical processor / core;)

pinning the server (logical cores `0` and `1` are mapped on physical core `1`):

```bash
sudo -u root -n -E -P -- \
\
taskset -c 0,1 \
nice -n -19 -- \
ionice -c 2 -n 0 -- \
chrt -r 10 \
prlimit -n262144 -- \
\
sudo -u "${USER}" -n -E -P -- \
\
kawipiko-server \
    ... \
#
```
pinning the client (logical cores `2` and `3` are mapped on physical core `2`):

```bash
sudo -u root -n -E -P -- \
\
taskset -c 2,3 \
nice -n -19 -- \
ionice -c 2 -n 0 -- \
chrt -r 10 \
prlimit -n262144 -- \
\
sudo -u "${USER}" -n -E -P -- \
\
wrk \
    ... \
#
```