Add compact encoding (v2) #73
Hmm... looks like it breaks compatibility with Go <= 1.8 because of math/bits used in blake2s.
We can probably drop Go 1.8 and below at this point, and make a minor version bump as part of that change. Handling both encodings with a magic byte/prefix is otherwise OK by me.
(@ me on this thread when you are ready for a full review - my schedule is packed right now but I will make sure to respond on this)
@elithrar I think it is ready. I dropped Go <= 1.8 from CircleCI and added a short section to the README.
Use Blake2s exclusively as the MAC and ChaCha20 as the stream cipher. Blake2s is always faster than SHA2 and can safely be used as a MAC without the HMAC construction (therefore, it is much faster than HMAC-SHA256). ChaCha20 is a bit slower than AES-CTR, but it causes fewer allocations, and it is faster where AES hardware optimization is unavailable (App Engine, for example).
BenchmarkLegacyEncode-8     239144    4504 ns/op    3445 B/op    21 allocs/op
BenchmarkLegacyEncode-8     261038    4499 ns/op    3442 B/op    21 allocs/op
BenchmarkCompactEncode-8    933446    1304 ns/op     721 B/op     4 allocs/op
BenchmarkCompactEncode-8    927328    1331 ns/op     721 B/op     4 allocs/op
BenchmarkLegacyDecode-8     298273    3923 ns/op    2312 B/op    17 allocs/op
BenchmarkLegacyDecode-8     298401    3944 ns/op    2367 B/op    17 allocs/op
BenchmarkCompactDecode-8    806112    1359 ns/op     465 B/op     3 allocs/op
BenchmarkCompactDecode-8    905617    1348 ns/op     471 B/op     3 allocs/op
Force-pushed: ae38481 to f83c210
Looks like I've reimplemented the Synthetic Initialization Vector (SIV) mode of operation.
Having taken a look at this, it might make more sense to:
• Version the encoding, so we know which encoding scheme is in use.
It is already done with the first byte of the encoded message. Or do you mean a method on SecureCookie?
Blake2s is not less secure than AES-CMAC. ChaCha20 is not less secure than AES-CTR. Therefore I see no reason. The single "non-standard" thing I did is pass the tail bytes of the MAC to ChaCha20 (because ChaCha20 consumes a 96-bit nonce, but the MAC is 120 bits). I could use a 16-byte MAC instead of a 15-byte one to satisfy "128-bit MAC and not a single bit less". (By the way, should the time be encrypted or not? I prefer it to be encrypted, but possibly it could be useful in plaintext for debugging?) In fact, I could use truncated SHA256 for the MAC. Truncated SHA256 doesn't suffer from the "length-extension" attack, therefore there is no real need for HMAC (until SHA256 is broken in some different way). But Blake2s is just faster and doesn't suffer from
What do you mean by Do you mean "using time as nonce for MAC"? A long unique nonce matters for those constructions that are sensitive to nonce reuse, i.e. where using the same nonce for different messages leads to information disclosure. For example, both AES-CTR and ChaCha20 leak the XOR of the plaintexts. AES-GCM leaks the "authentication key" on nonce reuse, and I believe ChaCha20-Poly1305 does too. SIV is "nonce misuse resistant", i.e. using the same "nonce" for different messages doesn't leak any information (aside from "they are different").
I don't understand what you mean. Any data passed to Encode will be encoded as "compact" if
We could just support the old format on read, and when we write (save) a session, write it in the new format implicitly. Otherwise users need to make a choice that is hard to reason about - the performance difference even at 10000 QPS is hard to measure once you take the network into account.
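A rough sketch of that "read both, write new" approach follows. Everything here is hypothetical and not part of the securecookie API: `compactVersion`, `decodeAny`, `encodeCompact`, and the two stub decoders only stand in for real codecs, and the sketch assumes a legacy payload can never begin with the version byte.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// compactVersion is a hypothetical marker byte for the new format.
const compactVersion byte = 1

type decodeFunc func(payload []byte) (string, error)

// decodeAny dispatches on the first byte: compact if it carries the
// version marker, legacy otherwise.
func decodeAny(raw []byte, compact, legacy decodeFunc) (string, error) {
	if len(raw) == 0 {
		return "", errors.New("empty cookie value")
	}
	if raw[0] == compactVersion {
		return compact(raw[1:])
	}
	return legacy(raw)
}

// encodeCompact always writes the new format (stub: no crypto here).
func encodeCompact(value string) []byte {
	return append([]byte{compactVersion}, value...)
}

// stubCompactDecode stands in for a real compact decoder.
func stubCompactDecode(p []byte) (string, error) {
	return string(p), nil
}

// stubLegacyDecode stands in for a real legacy decoder; it assumes a
// made-up "date|value|mac" layout purely for the demo.
func stubLegacyDecode(p []byte) (string, error) {
	parts := strings.Split(string(p), "|")
	if len(parts) != 3 {
		return "", errors.New("malformed legacy value")
	}
	return parts[1], nil
}

func main() {
	old := []byte("1700000000|alice|mac")
	v, err := decodeAny(old, stubCompactDecode, stubLegacyDecode)
	if err != nil {
		panic(err)
	}
	upgraded := encodeCompact(v) // the next save implicitly migrates
	v2, err := decodeAny(upgraded, stubCompactDecode, stubLegacyDecode)
	fmt.Println(v, v2, err)
}
```

With this shape, callers never pick a format: old cookies keep decoding, and each one is rewritten in the new format the next time it is saved.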
Beyond that, my overall comment is that in the drive for optimizing this, we now have more code to maintain, and a custom scheme. I would strongly prefer to use a) it is easier to maintain for anyone in the future when you are not around b) it is simpler c) it doesn't lose us any functionality. I mean this in the nicest possible way: that you had to go into depth to justify MAC length, nonce length and the XOR scheme tells me that this solution may be optimizing for problems that don't exist for most users.
I see no need for both a nonce and a MAC, since the MAC is already a good nonce. If the concern is the additional code, then I could implement this scheme in a separate library and use that library here. If you are against a scheme "invented" by me, I could implement Daence: https://eprint.iacr.org/2020/067 . While it is rather new, it is suggested by an MIT professor. But my scheme mostly differs only in the MAC computation (and length), and Blake2 seems to be the safer option. Ok, I can use raw XChaCha with a 192-bit nonce combining a 16-byte MAC with a plaintext timestamp, and make use of a keyed Blake2 instance with a separate sync.Pool per SecureCookie instance. That will slow things down a bit (I did try), but there will be less code to think about. Will that be ok?
Force-pushed: 54074fe to f2f5ab0
Ok, I've simplified things:
I've changed the version to 1 so that it does not interfere with my private version.
simplify use of Blake2s and ChaCha:
- use keyed Blake2s instance
- use 192bit nonce with XChaCha20 (add 8 byte 'version+timestamp' header to mac)
- get rid of cookie name length restriction (yes, allocate []byte(name))
Force-pushed: f2f5ab0 to b5523ec
Ok, I even got rid of Blake2s and use HMAC-SHA256 instead.
The current encoding applies Base64 to the payload twice because of the intermediate text encoding.
It also adds a 16-byte nonce to the value (which is likewise expanded twice by Base64) and, with default settings, a 32-byte MAC.
The total overhead of the added data is (12+32+(16*4/3))*4/3 ≈ 87 bytes, and the value itself is expanded by a factor of (4/3)*(4/3) ≈ 1.78.
Summary of Changes
This PR adds compact encoding for securecookie:
The total overhead is (1+15+8)*4/3 = 32 bytes. The value is expanded only once, by a factor of 1.33.
Note: the MAC is 15 bytes (120 bits), which is certainly enough for this use case.
It saves a lot of CPU time and allocations: