@TonyxSun TonyxSun commented Jan 10, 2025

This aims to solve the problem of 'frogomobile' by preventing simultaneous R/W into the same file (in the same directory, as was happening with the index files).

We can also actually send files now. Before, we either relied on the optimization that the files already exist locally, so we didn't need to initiate fetching, or had FTP commented out for the actual training files; the only files we had sent before were the index files. This is pretty important if we are going to test on multiple machines.

Fixed some misc. issues that had to be resolved to get 10,000+ file transfers to happen: dangling file pointers/sockets and excessive creation of new sockets. Also added a socket timeout.
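For reviewers, here is a minimal sketch of the kind of cleanup described above. It is not the code in this PR. It assumes POSIX sockets and C++14, and the names `SocketHandle`, `set_socket_timeout`, `FilePtr`, and `open_file` are hypothetical. It shows one way to make sure a socket or file is closed exactly once, and how a send/receive timeout can be set with `setsockopt`.

```cpp
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#include <cstdio>
#include <memory>
#include <utility>

// Hypothetical RAII wrapper: owns a socket descriptor and closes it exactly once.
class SocketHandle {
public:
    explicit SocketHandle(int fd = -1) : fd_(fd) {}
    ~SocketHandle() { reset(); }

    // Non-copyable, so two handles can never close the same descriptor.
    SocketHandle(const SocketHandle&) = delete;
    SocketHandle& operator=(const SocketHandle&) = delete;

    SocketHandle(SocketHandle&& other) noexcept
        : fd_(std::exchange(other.fd_, -1)) {}
    SocketHandle& operator=(SocketHandle&& other) noexcept {
        if (this != &other) {
            reset();
            fd_ = std::exchange(other.fd_, -1);
        }
        return *this;
    }

    int get() const { return fd_; }
    bool valid() const { return fd_ >= 0; }

    // Closes the descriptor if one is held; safe to call repeatedly.
    void reset() {
        if (fd_ >= 0) {
            ::close(fd_);
            fd_ = -1;
        }
    }

private:
    int fd_;
};

// Applies the same timeout to blocking receives and sends on fd.
// Returns false if either setsockopt call fails.
bool set_socket_timeout(int fd, int seconds) {
    timeval tv{};
    tv.tv_sec = seconds;
    return ::setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0 &&
           ::setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == 0;
}

// Hypothetical helper: a FILE* that is closed automatically when it goes out of scope.
struct FileCloser {
    void operator()(std::FILE* f) const {
        if (f) std::fclose(f);
    }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Returns an empty FilePtr if the file cannot be opened.
FilePtr open_file(const char* path, const char* mode) {
    return FilePtr(std::fopen(path, mode));
}
```

With wrappers like these, an early return or an exception in the middle of a transfer loop cannot leak a descriptor, which matters once the loop runs thousands of times. Reusing one `SocketHandle` across transfers, rather than opening a new socket per file, would address the excessive socket creation.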

I defined the following constants:

/*
 * Defines the data location of training files and index files for the requestor.
 */
const std::string SOURCE_DATA_DIR = "CIFAR10/train";

/*
 * Defines the data location of training files being used locally by a provider.
 */
const std::string TARGET_DATA_DIR = "data/CIFAR10";

The data should be in SOURCE_DATA_DIR for the requester. The training files will be downloaded to TARGET_DATA_DIR on the provider machines.
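As a rough illustration of how the two directories keep reads and writes apart, the sketch below maps a bare file name to its requester-side and provider-side paths and adds a guard that reports when two paths resolve to the same location. The helpers `source_path`, `target_path`, and `resolves_to_same_file` are hypothetical and not part of this PR. The sketch assumes C++17 (`std::filesystem`) and flat file names with no subdirectories.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

const std::string SOURCE_DATA_DIR = "CIFAR10/train";
const std::string TARGET_DATA_DIR = "data/CIFAR10";

// Path the requester reads a file from. Only the final name component is
// kept (assumption: files live directly in the data directory).
fs::path source_path(const std::string& file_name) {
    return fs::path(SOURCE_DATA_DIR) / fs::path(file_name).filename();
}

// Path a provider writes the received copy to.
fs::path target_path(const std::string& file_name) {
    return fs::path(TARGET_DATA_DIR) / fs::path(file_name).filename();
}

// True when both paths resolve to the same location, meaning a transfer
// would read and write one file at the same time. Returns false if either
// path cannot be resolved.
bool resolves_to_same_file(const fs::path& a, const fs::path& b) {
    std::error_code ec;
    const fs::path ca = fs::weakly_canonical(a, ec);
    if (ec) return false;
    const fs::path cb = fs::weakly_canonical(b, ec);
    if (ec) return false;
    return ca == cb;
}
```

A transfer routine could call `resolves_to_same_file(source_path(name), target_path(name))` before opening either file and skip the transfer when it returns true, for example when the requester and a provider run on the same machine with overlapping directories.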

Demo
https://www.youtube.com/watch?v=gSePWZN4YAs

TODO in a meeting: I set the names of the folders arbitrarily. We can confirm whether they make sense; the training scripts might also need to be changed to read from the new folders. One problem with this is that we can't send the test files for now. We need a way to specify the set of files we want and where to download them.

@TonyxSun TonyxSun marked this pull request as draft January 10, 2025 19:29
@TonyxSun TonyxSun force-pushed the ftp_io_directories branch 2 times, most recently from a88d454 to b8642a7 Compare February 3, 2025 04:15
@TonyxSun TonyxSun changed the base branch from integration to test February 25, 2025 22:03
@TonyxSun TonyxSun changed the base branch from test to integration February 25, 2025 22:32
@TonyxSun TonyxSun marked this pull request as ready for review February 25, 2025 22:47
@jordanmao

I think we should add ml/test_first.txt, ml/test_second.txt, and ml/test_third.txt to .gitignore

jordanmao and others added 21 commits March 6, 2025 08:10
[ML Integration] Integration Branch Combining C++ and Python Work
Modifications to allow P2P over Tailscale
* temp changes

* changes made

Co-authored-by: Wuyue (Tony) Sun
Co-authored-by: Rayaq Siddiqui

* fix build errors

* completed multiple iterations code

Co-authored-by: Jordan Mao
Co-authored-by: Joon Kang
Co-authored-by: Wuyue (Tony) Sun
Co-authored-by: Rayaq Siddiqui

* small bug fixes

---------

Co-authored-by: jordanmao
Co-authored-by: Wuyue (Tony) Sun
Co-authored-by: Rayaq Siddiqui
Co-authored-by: Joon Kang
Implement better command line argument parsing for main.cpp using Boost
@TonyxSun TonyxSun force-pushed the ftp_io_directories branch from ed3eb3a to 3c82e9d Compare March 24, 2025 21:06
4 participants