|
| 1 | +# Remote backends |
| 2 | + |
| 3 | +There are a couple of things you need to know when using backends that launch workers |
| 4 | +remotely, meaning not on your machine. |
| 5 | + |
| 6 | +## Cross-platform support |
| 7 | + |
| 8 | +Issue: {issue}`102`. |
| 9 | + |
| 10 | +Currently, it is not possible to run tasks in a remote environment that has a different |
| 11 | +OS than your local system. The reason is that when pytask sends data to the remote |
| 12 | +worker, the data contains path objects, {class}`pathlib.WindowsPath` or |
| 13 | +{class}`pathlib.PosixPath`, which cannot be unpickled on a different system. |
| 14 | + |
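The pickling limitation described above can be reproduced with the standard library alone. The snippet below is a minimal sketch, not pytask's own code: it shows that a pickled path records the name of its concrete, OS-specific class, and that the class belonging to the other OS cannot be instantiated locally. The path `data/input.csv` and all variable names are made up for illustration.

```python
import os
import pathlib
import pickle

# Illustrative path; any relative path works.
local_path = pathlib.Path("data") / "input.csv"
payload = pickle.dumps(local_path)

# The pickle stores the concrete class name: "PosixPath" on Unix,
# "WindowsPath" on Windows.
concrete_name = type(local_path).__name__
class_name_in_payload = concrete_name.encode() in payload

# Unpickling on the same kind of system works as expected.
round_trip_ok = pickle.loads(payload) == local_path

# The concrete class of the *other* OS cannot be instantiated here, which is
# why a pickle produced on that OS fails to load.
foreign_cls = pathlib.PosixPath if os.name == "nt" else pathlib.WindowsPath
foreign_rejected = False
try:
    foreign_cls("data")
except NotImplementedError:
    foreign_rejected = True
```

Pure paths such as `pathlib.PurePosixPath` can be instantiated on any system, but whether they are a usable workaround here depends on how pytask handles them, which this sketch does not cover.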
| 15 | +In general, remote machines run Unix, so users on Unix-like systems such as Linux and |
| 16 | +macOS should have no problems. |
| 17 | + |
| 18 | +Windows users, on the other hand, should use the |
| 19 | +[WSL (Windows Subsystem for Linux)](https://learn.microsoft.com/en-us/windows/wsl/about) |
| 20 | +to run their projects. |
| 21 | + |
| 22 | +## Local files |
| 23 | + |
| 24 | +Avoid using local files with remote backends and use remote storage like S3 for |
| 25 | +dependencies and products. The reason is that every local file needs to be sent to the |
| 26 | +remote workers, and a slow internet connection adds a hefty penalty to the runtime. |
| 27 | + |
| 28 | +## Local paths |
| 29 | + |
| 30 | +In most projects, you use local paths to refer to the dependencies and products of your |
| 31 | +tasks. This becomes a problem with remote workers since your local files are not |
| 32 | +necessarily available on the remote machine. |
| 33 | + |
| 34 | +pytask-parallel does its best to sync files to the worker before execution, so you |
| 35 | +can run your tasks locally and remotely without changing a thing. |
| 36 | + |
| 37 | +If a task creates a product on the remote machine, the product is synced back to your |
| 38 | +local machine as well. |
| 39 | + |
| 40 | +Keep in mind that the remote paths point to temporary files that share the file |
| 41 | +extension of the local file but have a different name and location. Do not rely on |
| 42 | +them. |
| 43 | + |
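To make the point about temporary remote paths concrete, here is a small sketch using only the standard library. It is not pytask-parallel's implementation; `make_remote_stand_in` and the path `bld/results.pkl` are hypothetical and merely mimic the described behavior: the stand-in file keeps the suffix of the local file while its name and directory differ.

```python
import os
import tempfile
from pathlib import Path


def make_remote_stand_in(local_path: Path) -> Path:
    """Create a temporary file that keeps only the suffix of ``local_path``.

    Hypothetical helper for illustration; not part of pytask-parallel.
    """
    fd, name = tempfile.mkstemp(suffix=local_path.suffix)
    os.close(fd)  # Only the path is needed, not the open file descriptor.
    return Path(name)


local = Path("bld") / "results.pkl"
remote = make_remote_stand_in(local)

same_suffix = remote.suffix == local.suffix  # Extension is preserved.
same_name = remote.name == local.name        # File name is not.

remote.unlink()  # Clean up the temporary file.
```

Task code that derives other file names from the path it receives, for example by looking for sibling files in `path.parent`, would break under such a mapping, which is why the remote paths should be treated as opaque.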
| 44 | +Another way to circumvent the problem is to first define a local task that uploads all |
| 45 | +necessary files to remote storage like S3. In the remaining tasks, you can then |
| 46 | +use paths pointing to the bucket instead of the local machine. See the |
| 47 | +[guide on remote files](https://tinyurl.com/pytask-remote) for more explanations. |