Upload takes too long with duplicate_files = false. Maybe add hashes cache? #367

@vvrein

Hi! Thank you for the great tool ❤️
I recently started using it after migrating from another pastebin. As far as I can see, with duplicate_files = false rustypaste hashes every file in the upload folder to check whether the file being uploaded already exists.

Here is info about my existing uploads:

# ls -lha data/files/ | wc -l
727
# du -hs data/files/
3.9G    data/files/

Uploaded file:

$ ls -lha favicon-152.png
-rw-r--r-- 1 vrein vrein 7.7K Dec 20  2020 favicon-152.png

Uploading time with duplicate_files = true:

$ time curl http://127.0.0.1:8880/ -F "file=@favicon-152.png"
http://127.0.0.1:8880/favicon-152.AqlQ6JKAp9.png

real    0m0.010s
user    0m0.006s
sys     0m0.003s

Uploading time with duplicate_files = false:

$ time curl http://127.0.0.1:8880/ -F "file=@favicon-152.png"
http://127.0.0.1:8880/favicon-152.AqlQ6JKAp9.png

real    0m10.411s
user    0m0.007s
sys     0m0.003s

I added some large random files with dd if=/dev/urandom of=largefile bs=1M count=... and summarized the results in a table:

| Total file count | Total file size | Upload time |
|---|---|---|
| 727 | 3.9G | 10.411s |
| 728 (+1 1G) | 4.9G | 13.137s |
| 729 (+1x2G +1x1G) | 6.9G | 19.254s |
| 3730 (+3k 1M) | 6.9G | 18.403s |

Upload time depends mostly on the total size of the stored files. The file count should not have a drastic impact unless it reaches a few million.

I think this is a really great feature, but with the current implementation upload time grows as the number and size of stored files increase. Maybe a simple cache mechanism, such as storing file hashes in memory or in a file, is worth implementing.
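
To make the in-memory variant of that idea concrete, here is a rough sketch of what such a cache could look like. It is not rustypaste's actual code: `HashCache`, `digest`, `find_duplicate`, and `record` are hypothetical names, and the standard library's `DefaultHasher` is only a stand-in so the example has no dependencies (a real implementation would presumably use a cryptographic hash such as SHA-256).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::{Path, PathBuf};

/// Stand-in digest (assumption): DefaultHasher keeps the sketch std-only.
/// It is NOT collision-resistant; swap in a cryptographic hash for real use.
fn digest(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

/// Hypothetical in-memory index from content digest to stored file path.
#[derive(Default)]
struct HashCache {
    by_digest: HashMap<u64, PathBuf>,
}

impl HashCache {
    /// Hash every regular file in `dir` once, e.g. at server startup.
    /// Reads each file fully into memory, which is fine for a sketch only.
    fn build(dir: &Path) -> io::Result<Self> {
        let mut cache = Self::default();
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            if path.is_file() {
                cache.by_digest.insert(digest(&fs::read(&path)?), path);
            }
        }
        Ok(cache)
    }

    /// Look up an incoming upload: one hash of the new data, no directory scan.
    fn find_duplicate(&self, upload: &[u8]) -> Option<&Path> {
        self.by_digest.get(&digest(upload)).map(PathBuf::as_path)
    }

    /// Remember a newly stored upload so later requests can find it.
    fn record(&mut self, upload: &[u8], stored_at: PathBuf) {
        self.by_digest.insert(digest(upload), stored_at);
    }
}
```

With something like this, the cost of a duplicate check would depend on the size of the uploaded file rather than on the total size of the upload folder. The cache would still need to drop entries when files expire or are deleted, which the sketch does not handle.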

rustypaste version: built from source bf6dd31
os: arch linux
kernel: 6.10.10-arch1-1
processor: Intel(R) Xeon(R) CPU E3-1230 v6 @ 3.50GHz

Unfortunately I have no experience with Rust, so I can only help with testing and debugging :)
