Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add compact encoding (v2) #73

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

funny-falcon
Copy link

@funny-falcon funny-falcon commented Jul 9, 2020

Current encoding does Base64 twice against payload because of intermediate text encoding.
Also it adds 16byte nonce to value (and it is also expanded twice by Base64) and 32byte MAC with default settings.
Total overhead of added data is (12+32+(16*4/3))*4/3 == 87 bytes, and value is expanded in (4/3)*(4/3)=1.77 times.

Summary of Changes

This PR adds compact encoding for securecookie:

  • value is passed through Base64 only once together with MAC and time
  • there is no separate nonce. MAC is calculated on plaintext + time (with submillisecond granularity) and then used as nonce for stream cipher.
  • migration is easy because it may decode both versions guessing by first base64 byte
  • algorithms are fixed to Blake2s and Chacha20 cause they are proven to be secure and fast on all common architectures (32/64 bit, appengine).

Total overhead data is (1+15+8)*4/3 = 32 bytes. Value expanded only once in 1.33 times.

Note: MAC is 15byte (120bit) which is certainly enough for this use case.

It saves a lot of cpu time and allocations:

BenchmarkLegacyEncode-8     239144   4504 ns/op  3445 B/op   21 allocs/op
BenchmarkLegacyEncode-8     261038   4499 ns/op  3442 B/op   21 allocs/op
BenchmarkCompactEncode-8    933446   1304 ns/op   721 B/op    4 allocs/op
BenchmarkCompactEncode-8    927328   1331 ns/op   721 B/op    4 allocs/op
BenchmarkLegacyDecode-8     298273   3923 ns/op  2312 B/op   17 allocs/op
BenchmarkLegacyDecode-8     298401   3944 ns/op  2367 B/op   17 allocs/op
BenchmarkCompactDecode-8    806112   1359 ns/op   465 B/op    3 allocs/op
BenchmarkCompactDecode-8    905617   1348 ns/op   471 B/op    3 allocs/op

@funny-falcon funny-falcon changed the title Compact encoding2 Add compact encoding (v2) Jul 9, 2020
@funny-falcon
Copy link
Author

Hmm... looks like it breaks compatibility with go <=1.8 because of math/bits used in blake2s.

@elithrar
Copy link
Contributor

elithrar commented Jul 9, 2020

We can probably drop Go 1.8 and below at this point, and make a minor version bump as part of that change.

Handling both encodings with a magic byte/prefix is otherwise OK by me.

@elithrar elithrar self-assigned this Jul 9, 2020
@elithrar elithrar self-requested a review July 9, 2020 12:43
@elithrar
Copy link
Contributor

elithrar commented Jul 9, 2020

(@ me on this thread when you are ready for a full review - my schedule is packed right now but I will make sure to respond on this)

@funny-falcon
Copy link
Author

funny-falcon commented Jul 9, 2020

@elithrar I think it is ready. I dropped go <= 1.8 from circleci and add short section to README.

Use exclusively Blake2s as a MAC and ChaCha20 as a stream cipher.
Blake2s is always faster than SHA2 and could be safely used as a MAC
without HMAC construction (therefore, it is much faster than HMAC-SHA256).
ChaCha20 is a bit slower than AES-CTR, but its usage causes less
allocations. And it is faster without AES hardware optimization
(appengine for example).
BenchmarkLegacyEncode-8     239144   4504 ns/op  3445 B/op   21 allocs/op
BenchmarkLegacyEncode-8     261038   4499 ns/op  3442 B/op   21 allocs/op
BenchmarkCompactEncode-8    933446   1304 ns/op   721 B/op    4 allocs/op
BenchmarkCompactEncode-8    927328   1331 ns/op   721 B/op    4 allocs/op
BenchmarkLegacyDecode-8     298273   3923 ns/op  2312 B/op   17 allocs/op
BenchmarkLegacyDecode-8     298401   3944 ns/op  2367 B/op   17 allocs/op
BenchmarkCompactDecode-8    806112   1359 ns/op   465 B/op    3 allocs/op
BenchmarkCompactDecode-8    905617   1348 ns/op   471 B/op    3 allocs/op
@funny-falcon
Copy link
Author

Looks like I've reimplemented Synthetic Initialization Vector mode of operation.
For example with AES: https://tools.ietf.org/html/rfc5297
Some CAESAR competitors also used SIV contruction, for example HS1SIV.

@elithrar
Copy link
Contributor

Having taken a look at this, it might make more sense to:

• Version the encoding, so we know which encoding scheme is in use.
• Use AES-GCM (-SIV) as you touch on) rather than some custom scheme.
• (or) use ChaCha/Poly / XSalsa / ideally /x/crypto/secretbox to encrypt & MAC the contents, due to longer (96-bit) nonces.
• Progressively re-encode any existing sessions when they are written back out again.

@funny-falcon
Copy link
Author

funny-falcon commented Jul 10, 2020

Version the encoding, so we know which encoding scheme is in use.

It is already done with the first byte of encoded message. Or do you mean method on SecureCookie?

Use AES-GCM (-SIV) as you touch on) rather than some custom scheme.

Blake2s is not less secure than AES_CMAC. ChaCha20 is not less secure than AES-CTR. Therefore I see no reason.
I use Blake2s and ChaCha20 because they looks to work optimally on any platform, has no known issues, and implemented in golang.org/x/crypto.

Single "non-standard" think I did, is passing tail bytes of MAC to ChaCha20 (because ChaCha20 consume 96bit nonce, but MAC is 120bit).
I could use of XChaCha20, or rehash BlockKey together with this bytes to derive new key (that is what XChaCha20 does in fact).
But sacrifying 24bit of 256bit key doesn't really mean much, since ChaCha20 itself doen't claims more than 128bit safety.
Thefore I strongly believe XOR-ing them to last key bytes is just enough.

I could use 16byte MAC instead of 15byte to satisfy "128bit MAC and no single bit lesser".
I just believe 120bit is enough (and I can describe why), and it makes whole header 24byte long before Base64, and they expand to raw 32bytes of Base64.
But I could encode time into 7 bytes, therefore togegher with 16byte MAC header length will remain same.

(btw, should time be encrypted or not? I prefer it to be encrypted, but possibly it could be useful for debugging in plaintext?
I just hate same 'AAAAF8' in the head of Base64).

(In fact, I could use truncated SHA256 for MAC. Truncated SHA256 doesn't suffer from "length-extention" attack, therefore, there is no real need in HMAC (until SHA256 is broken in some different way). But Blake2s is just faster and doesn't suffer from
length extension even without truncation.
And I can use AES-CTR instead of ChaCha20. But it's usage leads to more allocations, and it could be slow on platforms without
hardware acceleration, while ChaCha20 is always fast.)

(or) use ChaCha/Poly / XSalsa / ideally /x/crypto/secretbox to encrypt & MAC the contents, due to longer (96-bit) nonces.

What do you mean by longer 96bit nonce ? My scheme uses 120bit MAC as a nonce for encryption, which is much longer than 96bit nonce.

Do you mean "using time as nonce for MAC"?
This doesn't lead to problem with encryption, because encryption uses MAC as a nonce, and it is 120bit (could be increased to 128bit easily).

Long unique nonce has meaning for those constructions that are sensible to Nonce reuse. Ie if same nonce used for different messages leads to information disclosure. For example, both AES-CTR and ChaCha20 leaks XOR of plaintext. AES-GCM leaks "authentication key" with nonce reuse, and I believe ChaCha20-Poly1305 too.

SIV is "Nonce misuse resistant" ie use of same "nonce" for different messages doesn't leak any information (aside of "they are different").
That is because:
a) strong cryptographic hash function is used (instead of weak GHASH and Poly1305), and it guarantees no information leak about key or plaintext when same nonce is used (aside "message same or differend"),
b) cipher doesn't use that nonce, but uses MAC produced by strong function, which is certainly different for different messages.
Therefore, only information my scheme leaks during 1/16536 of second is "same or different message were encoded"

x/nacl/secretbox is just XChaCha20-Poly1305. That means it requires nonce and produces MAC.
But we doesn't really need both huge random nonce and huge MAC with SIV construction. Single big-enough MAC is enough.
Whole point of "compact" encoding to be compact, that is why I "occasinally reimplemented" SIV.

Progressively re-encode any existing sessions when they are written back out again.

I don't understand what do you mean. Any data passed to Encode will be encoded as "compact" if seccookie.Compact(true) were called.

@elithrar
Copy link
Contributor

I don't understand what do you mean. Any data passed to Encode will be encoded as "compact" if seccookie.Compact(true) were called.

We could just support the old format on read, and when we write (save) a session, write it in the new format implicitly. Otherwise users need to make a choice that is hard to reason about - the performance difference even at 10000 QPS is hard to measure once you take the network into account.

@elithrar
Copy link
Contributor

Beyond that, my overall comment is that in the drive for optimizing this, we now have more code to maintain, and a custom scheme.

I would strongly prefer to use /x/crypto/nacl/secretbox because:

a) it is easier to maintain for anyone in the future when you are not around

b) it is simpler

c) it doesn't lose us any functionality.

I mean this in the nicest possible way: that you had to go into depth to justify MAC length, nonce length and the XOR scheme tells me that this solution may be optimizing for problems that don't exist for most users.

@funny-falcon
Copy link
Author

I see no need in both Nonce and Mac, since Mac is a good nonce already.

If deal is in additional code, then I could implement this scheme in separate library and use that library here.

If you against scheme "invented" by me, I could implement Daence: https://eprint.iacr.org/2020/067 . While it is rather new, it is suggested by MIT professor. But my scheme mostly differs only in MAC computation (and length), and Blake2 seems to be safer option.

Ok, I can use raw XChaCha with 192bit nonce combining 16byte MAC with plaintext timestamp timestamp. And make use of keyed Blake2 instance with separate sync.Pool per SecureCookie instance. That will slow thing a bit (I did try), but there will be less code to think about. Will it be ok?

@funny-falcon
Copy link
Author

Ok, I've simplified things:

  • version+7byte plaintext (but binary) time compose header now. Time has submicrosecond precision.
  • encryption uses XChaCha with 24byte nonce: header + 16byte MAC . (XChaCha is choosen internally by nonce length)
  • MAC is produced by keyed Blake2s instance.

I've changed version to 1 to not interfer with my private version.

simplify use of Blake2s and ChaCha:
- use keyed Blake2s instance
- use 192bit nonce with XChaCha20 (add 8 byte 'version+timestamp' header to mac)
- get rid of cookie name length restriction (yes, allocate []byte(name))
@funny-falcon
Copy link
Author

Ok, I event got rid of Blake2s and use HMAC-SHA256 instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

2 participants