Feature request: Support for CDB64? #11

Open
lemondevxyz opened this issue Aug 30, 2024 · 2 comments

Comments

@lemondevxyz

CDB has a file size limit of 4 GB, which is part of the file format specification. There is a library that turns some of the 32-bit integers (the ones responsible for the 4 GB limit) into 64-bit integers, thereby removing the file size limit.

Here's a link to the cdb64 library.

The use case for this feature request is multiple hosts. Imagine you have several hosts serving high-quality images (about 4 MB per image): combined across all hosts, the 4 GB limit is reached quickly.
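A quick back-of-the-envelope check of that use case, as a Go sketch: 2^32 bytes divided by 4 MiB per image gives an upper bound of 1024 images for the whole database. The split across 4 hosts is a hypothetical example, and the real capacity is lower because the bound ignores the header, hash tables, and per-record overhead.

```go
package main

import "fmt"

// maxCDBBytes is the most a 32-bit-offset CDB file can address: 2^32 bytes (4 GiB).
const maxCDBBytes uint64 = 1 << 32

// imagesThatFit is an upper bound: it ignores the 2048-byte header, the hash
// tables, and the per-record key/length overhead, all of which lower the real number.
func imagesThatFit(limit, imageSize uint64) uint64 {
	return limit / imageSize
}

func main() {
	const imageSize = 4 << 20 // 4 MiB per image, as in the issue
	total := imagesThatFit(maxCDBBytes, imageSize)
	fmt.Println(total)     // 1024
	fmt.Println(total / 4) // 256 per host, for a hypothetical 4 hosts sharing one database
}
```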

I can implement this feature request myself.

@cipriancraciun
Member

I know about the various 64-bit CDB alternatives (including the one you've mentioned); however, at least at the time I did my initial assessment of them, none of them supported mmap-ed databases. In fact, neither does the "plain" CDB one (at least the Go port), which I had to patch to make it support mmap (the fork is here https://github.com/cipriancraciun/go-cdb-lib).

In fact, see this issue raised on the project that you've mentioned, which actually points to my initial issue on the "plain" CDB Go library:


Why is mmap important?

Kawipiko's main focus is raw performance. Go has a garbage-collected runtime. These two don't play nicely together.

Using read syscalls requires (new) buffers, which are then handed to the HTTP library as the response, then "dropped", and thus turned into "garbage" that the Go runtime has to "collect". This heavily impacts performance. (Buffer pools are a solution, but both the CDB library and the HTTP library would have to support them, and unfortunately neither does.)

Another, minor, observation is that using mmap eliminates the read syscalls otherwise needed on each lookup, thus improving performance a bit.

As a side note, for HTTP/1.1 Kawipiko doesn't use Go's HTTP implementation; instead it uses https://github.com/valyala/fasthttp, which is carefully implemented to minimize, as much as possible, the need for the Go runtime to run garbage collection. (I do use Go's HTTP implementation for HTTP/2, and that is why HTTP/1.1 performance is far better than HTTP/2 performance.)

Getting back to mmap: because the database is mmap-ed, there is no extra buffer allocation and no extra garbage collection, and thus performance improves.

(In fact, I perhaps spent more time profiling Kawipiko to make sure the Go garbage collector doesn't kick in than actually writing Kawipiko.)
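One simple way to check that a lookup path produces no garbage is Go's `testing.AllocsPerRun`, which reports the average number of heap allocations per call. The sketch below is a hypothetical comparison, not Kawipiko's actual profiling setup: `copyLookup` copies the value into a new buffer (read-style), while `sliceLookup` returns a view into an in-memory byte slice standing in for the mmap-ed database.

```go
package main

import (
	"fmt"
	"testing"
)

// db stands in for an mmap-ed database (hypothetical contents).
var db = make([]byte, 4096)

// sink is package-level so the compiler cannot optimize the results away.
var sink []byte

// copyLookup copies the value into a fresh heap buffer: one allocation per call.
func copyLookup(off, n int) []byte {
	out := make([]byte, n)
	copy(out, db[off:off+n])
	return out
}

// sliceLookup returns a view into db: no allocation at all.
func sliceLookup(off, n int) []byte { return db[off : off+n] }

func main() {
	fmt.Println("copy allocs/op: ", testing.AllocsPerRun(1000, func() { sink = copyLookup(0, 256) }))
	fmt.Println("slice allocs/op:", testing.AllocsPerRun(1000, func() { sink = sliceLookup(0, 256) }))
}
```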


So, to summarize: using a 64-bit variant of CDB is doable, but only if one forks one of the existing libraries and patches it to support mmap.

(I'll open a discussion thread about a different approach.)

@cipriancraciun
Member

@lemondevxyz also see discussion #14 which might be an alternative.
