Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add multistream http download #1710

Closed
wants to merge 6 commits into from

Conversation

ischasny
Copy link
Collaborator

@ischasny ischasny commented Sep 22, 2023

  • Add a feature to split http download into multiple chunks. Chunks are downloaded into temporary files, which then get merged into one;
  • Downnloads over libp2p are always done in one chunk only;
  • Add NChunks configuration parameter;
  • Fix tests to handle not just open-ended range requests;
  • Don't allow private ip addresses and redirects.

Fixes #1593

@ischasny ischasny force-pushed the ivan/feat-multistream-http-download branch 2 times, most recently from 3425fbc to cc0d047 Compare September 25, 2023 08:34
@ischasny ischasny marked this pull request as ready for review September 25, 2023 08:35
Copy link
Contributor

@dirkmc dirkmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great so far 👍

If I'm reading the code correctly, it will split the file into chunks (5 by default), and make a request for the chunks in parallel. However it will only start reading the response stream for each chunk once the previous chunk has completed. So I'm not sure this will end up being faster than a single stream download.

transport/httptransport/http_transport.go Outdated Show resolved Hide resolved
transport/httptransport/http_transport.go Show resolved Hide resolved
transport/httptransport/http_transport.go Outdated Show resolved Hide resolved
@ischasny
Copy link
Collaborator Author

Looks great so far 👍

If I'm reading the code correctly, it will split the file into chunks (5 by default), and make a request for the chunks in parallel. However it will only start reading the response stream for each chunk once the previous chunk has completed. So I'm not sure this will end up being faster than a single stream download.

Great point. I missed the bit that it downloads as it proceeds through the stream. Let me revise the logic and reopen the PR.

@ischasny ischasny closed this Sep 25, 2023
@ischasny ischasny reopened this Sep 26, 2023
@ischasny ischasny force-pushed the ivan/feat-multistream-http-download branch 2 times, most recently from b7c6e69 to 93cdde0 Compare September 26, 2023 16:29
* Add a feature to split http download into multiple chunks. Chunks are downloaded into temporary files, which then get merged into one;
* Downnloads over libp2p are always done in one chunk only;
* Add NChunks configuration parameter;
* Fix tests to handle not just open-ended range requests.
@ischasny ischasny force-pushed the ivan/feat-multistream-http-download branch from 93cdde0 to 46fe02c Compare September 26, 2023 16:40
Copy link
Contributor

@dirkmc dirkmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions, but overall it looks great, nice work! 🙌

transport/httptransport/http_downloader.go Outdated Show resolved Hide resolved
transport/httptransport/http_downloader.go Outdated Show resolved Hide resolved
transport/httptransport/http_downloader.go Outdated Show resolved Hide resolved
transport/httptransport/http_downloader.go Outdated Show resolved Hide resolved
transport/httptransport/http_transport.go Outdated Show resolved Hide resolved
transport/httptransport/http_transport.go Outdated Show resolved Hide resolved
transport/httptransport/http_downloader.go Outdated Show resolved Hide resolved
@@ -119,6 +121,16 @@ func (h *httpTransport) Execute(ctx context.Context, transportInfo []byte, dealI
}
tInfo.URL = u.Url

// don't allow private network addresses as per https://en.wikipedia.org/wiki/Private_network
Copy link
Member

@nonsense nonsense Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want this to be configurable. We don't really know what kind of setups SPs have and what software they've integrated Boost with... I wouldn't be surprised if some deals are made with private IPs for fetching data. (think of companies that handle both storing and onboarding of data, of which there are many)

@ischasny
Copy link
Collaborator Author

Splitting this PR into multiple small ones. The first one: #1723

@ischasny ischasny closed this Sep 28, 2023
@nonsense nonsense deleted the ivan/feat-multistream-http-download branch October 9, 2023 10:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Multi Stream http download for online deals
3 participants