Simultaneous read of same file is terribly slow #814

@tamastarjanyi

Description

Issue

Tested with
goofys version 0.24.0-45b8d78375af1b24604439d2e60c567654bcdf88

If you mount an S3 bucket (tested with CEPH and Minio, but not with AWS) and read the same "large" file from two processes, the performance is very bad, while reading from only one process, or reading different files in parallel, performs much better. This is not a network issue, because the speed drops to 1/10-1/20 of the single-process speed or even less. I suspect some locking issue, but this is just a guess.

How to reproduce.

Mount a bucket.
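For reference, the mount in my tests looked roughly like this (the endpoint URL, bucket name and mount point below are placeholders, not my exact setup):

goofys --endpoint https://s3.example.internal mybucket /mnt/mybucket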
In the mounted directory, create a big file:

dd if=/dev/random of=bigfile1g bs=1024000 count=1024 status=progress
 941056000 bytes (941 MB, 897 MiB) copied, 5 s, 188 MB/s
1024+0 records in
1024+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 6.81221 s, 154 MB/s

Then first read the file from a single process:

dd if=bigfile1g of=/dev/null bs=1024000 status=progress
691200000 bytes (691 MB, 659 MiB) copied, 2 s, 345 MB/s
1024+0 records in
1024+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 2.68968 s, 390 MB/s

Then repeat the same test, this time starting two reads in parallel. Empty the filesystem cache first if you have one, or delete and recreate bigfile1g.
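One minimal way to start the two reads at the same time (the drop_caches line assumes root and is just one way to empty the cache):

sync; echo 3 > /proc/sys/vm/drop_caches   # make sure dd really reads through goofys
dd if=bigfile1g of=/dev/null bs=1024000 status=progress &
dd if=bigfile1g of=/dev/null bs=1024000 status=progress &
wait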
These are my results from both processes:

dd if=bigfile1g of=/dev/null bs=1024000 status=progress
1039360000 bytes (1.0 GB, 991 MiB) copied, 137 s, 7.6 MB/s
1024+0 records in
1024+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 137.767 s, 7.6 MB/s

dd if=bigfile1g of=/dev/null bs=1024000 status=progress
1038336000 bytes (1.0 GB, 990 MiB) copied, 137 s, 7.6 MB/s
1024+0 records in
1024+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 137.868 s, 7.6 MB/s

If you create two random big files and read them in parallel, they are just fine. (I have very good bandwidth on this test server, but I also tested via VPN, and in that case running two dd processes on different files simply halves the bandwidth.)

dd if=/dev/random of=bigfile1g bs=1024000 count=1024 status=progress
1003520000 bytes (1.0 GB, 957 MiB) copied, 3 s, 334 MB/s
1024+0 records in
1024+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 3.0863 s, 340 MB/s

dd if=/dev/random of=bigfile1g2 bs=1024000 count=1024 status=progress
1035264000 bytes (1.0 GB, 987 MiB) copied, 3 s, 345 MB/s
1024+0 records in
1024+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 3.02763 s, 346 MB/s
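For completeness, the parallel read of the two different files can be started the same way as above, e.g.:

dd if=bigfile1g of=/dev/null bs=1024000 status=progress &
dd if=bigfile1g2 of=/dev/null bs=1024000 status=progress &
wait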

To make sure this is not a server-side issue, I also did a parallel download with s3cmd; that also just halves the speed because of the shared network, but there is no serious slowdown there.
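The s3cmd comparison was simply two downloads of the same object in parallel, roughly like this (bucket name is a placeholder):

s3cmd get s3://mybucket/bigfile1g copy1 &
s3cmd get s3://mybucket/bigfile1g copy2 &
wait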
