-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Are they any bindings for Golang? #10
Comments
Actually, after code review and experiments, I realised we would need to modify the implementation quite a bit: Firstly, we cannot assume that apply non-cryptographic hash function to the keys and effectively taking low 64-bit (this is what I discovered the implementation really does) as a fingerprint, would work. Because it would be easy to break such algorithm by creating collision in such fingeprints (as we are operating in adversarial environment). Secondly, we need to handle the case where all keys do not fit in RAM. But in principle the approach should work. |
So, I'm reading this just now. It should be easy to replace the hash function with a crypto-level one. For keys not fitting in RAM do you mean the hashes or the keys themselves? There's no need for the keys to be in RAM. One can even sort the hashes offline, but that's more work (stxxl?). |
@vigna thanks for replying. I have started re-implementing it in golang. We have what we call ETL (Extract Transform Load) framework in our library to be able to sort keys by bucket numbers "offline" (potentially using temporary files). My main issue was to make sure we do not generate duplicates when applying initial hash ( |
But wait, first_hash is made of 128 bits. The probability of collision is entirely negligible. I don't understand your point about 64-bit fingerprints. |
I meant that when the bucket of keys arrive into the |
But the bucket is maximum ≈2000 keys. The probability of collision of 2000 64-bit keys is still entirely negligible. If the first function is crypto, the keys are randomly chosen. |
I would like to use this library (specifically RecSplit for creating MPHFs) for some experiments in our Erigon project (if you are interested I can describe the possible use case): https://github.com/ledgerwatch/erigon
But I do need Golang binding to do so. I assume there are no bindings yet, and I was thinking about creating a bounty for someone to develop it.
The text was updated successfully, but these errors were encountered: