docs/executor.md: 0 additions & 27 deletions
@@ -144,33 +144,6 @@ Resource requests and other job characteristics can be controlled via the following process directives:
 See the {ref}`Google Cloud Batch <google-batch>` page for further configuration details.
 
-(google-lifesciences-executor)=
-
-## Google Life Sciences
-
-:::{versionadded} 20.01.0
-:::
-
-[Google Cloud Life Sciences](https://cloud.google.com/life-sciences) is a managed computing service that allows the execution of containerized workloads in the Google Cloud Platform infrastructure.
-
-Nextflow provides built-in support for the Life Sciences API, which allows the seamless deployment of a Nextflow pipeline in the cloud, offloading the process executions as pipelines.
-
-The pipeline processes must specify the Docker image to use by defining the `container` directive, either in the pipeline script or the `nextflow.config` file. Additionally, the pipeline work directory must be located in a Google Storage bucket.
-
-To enable this executor, set `process.executor = 'google-lifesciences'` in the `nextflow.config` file.
-
-Resource requests and other job characteristics can be controlled via the following process directives:
-
-- {ref}`process-accelerator`
-- {ref}`process-cpus`
-- {ref}`process-disk`
-- {ref}`process-machineType`
-- {ref}`process-memory`
-- {ref}`process-resourcelabels`
-- {ref}`process-time`
-
-See the {ref}`Google Life Sciences <google-lifesciences>` page for further configuration details.
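This hunk removes the instructions for enabling the deprecated executor. For comparison, the surviving Google Batch equivalent of the deleted enablement snippet would look roughly like the following sketch, assembled from the `google-batch` settings that appear later in this diff (the container name and project ID are placeholders):

```groovy
// Minimal nextflow.config sketch routing all processes to Google Batch
process {
    executor  = 'google-batch'
    container = 'your/container:latest'   // required: tasks run in containers
}

google {
    project  = 'your-project-id'   // project ID, not project name
    location = 'us-central1'       // Google Batch uses location, not zone
}
```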
docs/google.md: 3 additions & 153 deletions
@@ -4,7 +4,7 @@
 
 ## Credentials
 
-Credentials for submitting requests to the Google Cloud Batch and Cloud LifeSciences API are picked up from your environment using [Application Default Credentials](https://github.com/googleapis/google-auth-library-java#google-auth-library-oauth2-http). Application Default Credentials are designed to use the credentials most natural to the environment in which a tool runs.
+Credentials for submitting requests to the Google Cloud Batch API are picked up from your environment using [Application Default Credentials](https://github.com/googleapis/google-auth-library-java#google-auth-library-oauth2-http). Application Default Credentials are designed to use the credentials most natural to the environment in which a tool runs.
 
 The most common case will be to pick up your end-user Google credentials from your workstation. You can create these by running the command:
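The command that the context line above introduces is cut off by this diff view; for Application Default Credentials from a workstation it is typically the standard gcloud login flow:

```bash
# Opens a browser window to authenticate and stores
# Application Default Credentials locally
gcloud auth application-default login
```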
@@ -250,160 +250,23 @@ Currently, the following Nextflow directives are supported by the Google Batch executor:
 - {ref}`process-memory`
 - {ref}`process-time`
 
-(google-lifesciences)=
-
-## Cloud Life Sciences
-
-:::{versionadded} 20.01.0-edge
-:::
-
-:::{note}
-In versions of Nextflow prior to `21.04.0`, the following variables must be defined in your system environment:
-
-```bash
-export NXF_VER=20.01.0
-export NXF_MODE=google
-```
-:::
-
-[Cloud Life Sciences](https://cloud.google.com/life-sciences/) is a managed computing service that allows the execution of containerized workloads in the Google Cloud Platform infrastructure.
-
-Nextflow provides built-in support for Cloud Life Sciences, allowing the seamless deployment of Nextflow pipelines in the cloud, in which tasks are offloaded to the Cloud Life Sciences service.
-
-Read the {ref}`Google Life Sciences executor <google-lifesciences-executor>` page to learn about the `google-lifesciences` executor in Nextflow.
-
-:::{warning}
-This API works well for coarse-grained workloads (i.e. long-running jobs). It is not suggested to use this feature for pipelines spawning many short-lived tasks.
-:::
-
-(google-lifesciences-config)=
-
-### Configuration
-
-Make sure to have defined in your environment the `GOOGLE_APPLICATION_CREDENTIALS` variable. See the [Credentials](#credentials) section for details.
-
-:::{tip}
-Make sure to enable the Cloud Life Sciences API beforehand. To learn how to enable it, follow [this link](https://cloud.google.com/life-sciences/docs/quickstart).
-:::
-
-Create a `nextflow.config` file in the project root directory. The config must specify the following parameters:
-
-- Google Life Sciences as the Nextflow executor
-- The Docker container image(s) for pipeline tasks
-- The Google Cloud project ID
-- The Google Cloud region or zone where the Compute Engine VMs will be executed. You need to specify one or the other, *not* both. Multiple regions or zones can be specified as a comma-separated list, e.g. `google.zone = 'us-central1-f,us-central1-b'`.
-
-Example:
-
-```groovy
-process {
-    executor = 'google-lifesciences'
-    container = 'your/container:latest'
-}
-
-google {
-    project = 'your-project-id'
-    zone = 'europe-west1-b'
-}
-```
-
-Notes:
-
-- A container image must be specified to execute processes. You can use a different Docker image for each process using one or more {ref}`config-process-selectors`.
-- Make sure to specify the project ID, not the project name.
-- Make sure to specify a location where Google Life Sciences is available. Refer to the [Google Cloud documentation](https://cloud.google.com/life-sciences/docs/concepts/locations) for details.
-
-Read the {ref}`Google configuration <config-google>` section to learn more about advanced configuration options.
-
-### Process definition
-
-Processes can be defined as usual. By default, the `cpus` and `memory` directives are used to instantiate a custom machine type with the specified compute resources. If `memory` is not specified, 1 GB of memory is allocated per cpu. A persistent disk will be created with a size corresponding to the `disk` directive. If `disk` is not specified, the instance default is chosen to ensure reasonable I/O performance.
-
-The process `machineType` directive may optionally be used to specify a predefined Google Compute Platform [machine type](https://cloud.google.com/compute/docs/machine-types). If specified, this value overrides the `cpus` and `memory` directives. If the `cpus` and `memory` directives are used, the values must comply with the allowed custom machine type [specifications](https://cloud.google.com/compute/docs/instances/creating-instance-with-custom-machine-type#specifications). Extended memory is not directly supported; however, high-memory or high-cpu predefined instances may be utilized using the `machineType` directive.
-
-Examples:
-
-```nextflow
-process custom_resources_task {
-    cpus 8
-    memory '40 GB'
-    disk '200 GB'
-
-    script:
-    """
-    your_command --here
-    """
-}
-
-process predefined_resources_task {
-    machineType 'n1-highmem-8'
-
-    script:
-    """
-    your_command --here
-    """
-}
-```
-
-### Pipeline execution
-
-The pipeline can be launched either in a local computer or a cloud instance. Pipeline input data can be stored either locally or in a Google Storage bucket.
-
-The pipeline execution must specify a Google Storage bucket where the workflow's intermediate results are stored, using the `-work-dir` command line option. For example:
-
-```bash
-nextflow run <script or project name> -work-dir gs://my-bucket/some/path
-```
-
-:::{tip}
-Any input data *not* stored in a Google Storage bucket will be automatically transferred to the pipeline work bucket. Use this feature with caution, being careful to avoid unnecessary data transfers.
-:::
-
-### Preemptible instances
-
-Preemptible instances are supported by adding the following setting to the Nextflow config file:
-
-```groovy
-google {
-    lifeSciences.preemptible = true
-}
-```
-
-Since this type of virtual machine can be retired by the provider before job completion, it is advisable to add the following retry strategy to your config file to instruct Nextflow to automatically re-execute a job if the virtual machine was terminated preemptively:
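The retry-strategy snippet that the removed paragraph introduces is elided in this diff view. A minimal sketch of the kind of configuration it describes follows; the specific exit status checked (14) is an assumption about how the Life Sciences executor reports preempted VMs, so verify it against the original documentation:

```groovy
// Sketch: retry tasks whose VM was preempted, fail otherwise
process {
    // exit status 14 is assumed to indicate a preempted VM
    errorStrategy = { task.exitStatus == 14 ? 'retry' : 'terminate' }
    maxRetries = 5
}
```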
-Preemptible instances have a [runtime limit](https://cloud.google.com/compute/docs/instances/preemptible) of 24 hours.
-:::
-
-:::{tip}
-For an exhaustive list of error codes, refer to the official Google Life Sciences [documentation](https://cloud.google.com/life-sciences/docs/troubleshooting#error_codes).
-:::
-
 ### Hybrid execution
 
-Nextflow allows the use of multiple executors in the same workflow. This feature enables the deployment of hybrid workloads, in which some jobs are executed in the local computer or local computing cluster, and some jobs are offloaded to Google Cloud (either Google Batch or Google Life Sciences).
+Nextflow allows the use of multiple executors in the same workflow. This feature enables the deployment of hybrid workloads, in which some jobs are executed in the local computer or local computing cluster, and some jobs are offloaded to Google Cloud.
 
 To enable this feature, use one or more {ref}`config-process-selectors` in your Nextflow configuration file to apply the Google Cloud executor to the subset of processes that you want to offload. For example:
 
 ```groovy
 process {
     withLabel: bigTask {
-        executor = 'google-batch' // or 'google-lifesciences'
+        executor = 'google-batch'
         container = 'my/image:tag'
     }
 }
 
 google {
     project = 'your-project-id'
     location = 'us-central1' // for Google Batch
-    // zone = 'us-central1-a' // for Google Life Sciences
 }
 ```
@@ -427,16 +290,3 @@ Nextflow will automatically manage the transfer of input and output files between
 - Currently, it's not possible to specify a disk type different from the default one assigned by the service depending on the chosen instance type.
 
-### Troubleshooting
-
-- Make sure to enable the Compute Engine API, Life Sciences API and Cloud Storage API in the [APIs & Services Dashboard](https://console.cloud.google.com/apis/dashboard) page.
-
-- Make sure to have enough compute resources to run your pipeline in your project [Quotas](https://console.cloud.google.com/iam-admin/quotas) (i.e. Compute Engine CPUs, Compute Engine Persistent Disk, Compute Engine In-use IP addresses, etc.).
-
-- Make sure your security credentials allow you to access any Google Storage bucket where input data and temporary files are stored.
-
-- When a job fails, you can check the `google/` directory in the task work directory (in the bucket storage), which contains useful information about the job execution. To enable the creation of this directory, set `google.lifeSciences.debug = true` in the Nextflow config.
-
-- You can enable the optional SSH daemon in the job VM by setting `google.lifeSciences.sshDaemon = true` in the Nextflow config.
-
-- Make sure you are choosing a `location` where the [Cloud Life Sciences API is available](https://cloud.google.com/life-sciences/docs/concepts/locations), and a `region` or `zone` where the [Compute Engine API is available](https://cloud.google.com/compute/docs/regions-zones/).
docs/reference/config.md: 1 addition & 47 deletions
@@ -816,7 +816,7 @@ The following settings are available:
 
 ## `google`
 
-The `google` scope allows you to configure the interactions with Google Cloud, including Google Cloud Batch, Google Life Sciences, and Google Cloud Storage.
+The `google` scope allows you to configure the interactions with Google Cloud, including Google Cloud Batch and Google Cloud Storage.
 
 Read the {ref}`google-page` page for more information.
@@ -956,52 +956,6 @@ The following settings are available for Cloud Life Sciences:
 `google.zone`
 : The Google Cloud zone where jobs are executed. Multiple zones can be provided as a comma-separated list. Cannot be used with the `google.region` option. See the [Google Cloud documentation](https://cloud.google.com/compute/docs/regions-zones/) for a list of available regions and zones.
 
-`google.lifeSciences.bootDiskSize`
-: Set the size of the virtual machine boot disk, e.g. `50.GB` (default: none).
-
-`google.lifeSciences.copyImage`
-: The container image run to copy input and output files. It must include the `gsutil` tool (default: `google/cloud-sdk:alpine`).
-
-`google.lifeSciences.cpuPlatform`
-: Set the minimum CPU Platform, e.g. `'Intel Skylake'`. See [Specifying a minimum CPU Platform for VM instances](https://cloud.google.com/compute/docs/instances/specify-min-cpu-platform#specifications) (default: none).
-
-`google.lifeSciences.debug`
-: When `true`, copies the `/google` debug directory into the task bucket directory (default: `false`).
-
-`google.lifeSciences.keepAliveOnFailure`
-: :::{versionadded} 21.06.0-edge
-  :::
-: When `true` and a task completes with an unexpected exit status, the associated compute node is kept up for 1 hour. This option implies `sshDaemon=true` (default: `false`).
-
-`google.lifeSciences.network`
-: :::{versionadded} 21.03.0-edge
-  :::
-: Set the network name to attach the VM's network interface to. The value will be prefixed with `global/networks/` unless it contains a `/`, in which case it is assumed to be a fully specified network resource URL. If unspecified, the global default network is used.
-
-`google.lifeSciences.preemptible`
-: When `true`, enables the usage of *preemptible* virtual machines (default: `true`).
-
-`google.lifeSciences.serviceAccountEmail`
-: :::{versionadded} 20.05.0-edge
-  :::
-: Define the Google service account email to use for the pipeline execution. If not specified, the default Compute Engine service account for the project will be used.
-
-`google.lifeSciences.subnetwork`
-: :::{versionadded} 21.03.0-edge
-  :::
-: Define the name of the subnetwork to attach the instance to when the specified network is configured for custom subnet creation. The value is prefixed with `regions/subnetworks/` unless it contains a `/`, in which case it is assumed to be a fully specified subnetwork resource URL.
-
-`google.lifeSciences.sshDaemon`
-: When `true`, runs an SSH daemon in the VM carrying out the job, to which it is possible to connect for debugging purposes (default: `false`).
-
-`google.lifeSciences.sshImage`
-: The container image used to run the SSH daemon (default: `gcr.io/cloud-genomics-pipelines/tools`).
-
-`google.lifeSciences.usePrivateAddress`
-: :::{versionadded} 20.03.0-edge
-  :::
-: When `true`, the VM will NOT be provided with a public IP address, and will only have an internal IP. If this option is enabled, the associated job can only load Docker images from Google Container Registry, and the job executable cannot use external services other than Google APIs (default: `false`).
@@ -821,7 +820,7 @@ See also: [resourceLabels](#resourcelabels)
 
 :::{versionadded} 19.07.0
 :::
 
-The `machineType` can be used to specify a predefined Google Compute Platform [machine type](https://cloud.google.com/compute/docs/machine-types) when running using the {ref}`Google Batch <google-batch-executor>` or {ref}`Google Life Sciences <google-lifesciences-executor>` executor, or when using the autopools feature of the {ref}`Azure Batch executor<azurebatch-executor>`.
+The `machineType` can be used to specify a predefined Google Compute Platform [machine type](https://cloud.google.com/compute/docs/machine-types) when running using the {ref}`Google Batch <google-batch-executor>` executor, or when using the autopools feature of the {ref}`Azure Batch executor<azurebatch-executor>`.
 
 This directive is optional and if specified overrides the cpus and memory directives:
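The example that follows this context line is cut off in the diff view. Based on the `predefined_resources_task` example removed from docs/google.md earlier in this diff, it would look roughly like this sketch (process name and machine type are illustrative):

```nextflow
process predefined_resources_task {
    // Predefined machine type; overrides any cpus/memory directives
    machineType 'n1-highmem-8'

    script:
    """
    your_command --here
    """
}
```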
@@ -1379,7 +1378,6 @@ Resource labels are currently supported by the following executors: