README.md
Lines changed: 59 additions & 48 deletions
Original file line number
Diff line number
Diff line change
@@ -9,14 +9,16 @@ The SCP Ingest Pipeline is an ETL pipeline for single-cell RNA-seq data.
9
9
10
10
# Prerequisites
11
11
12
-
- Python 3.7+
12
+
- Python 3.10
13
13
- Google Cloud Platform project
14
14
- Suitable service account (SA) and MongoDB VM in GCP. SA needs roles "Editor", "Genomics Pipelines Runner", and "Storage Object Admin". Broad Institute engineers: see instructions [here](https://github.com/broadinstitute/single_cell_portal_configs/tree/master/terraform-mongodb).
15
15
- SAMtools, if using `ingest/make_toy_data.py`
16
16
- Tabix, if using `ingest/genomes/genomes_pipeline.py`
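
The prerequisites listed above can be checked with a short script. This is only a sketch and not part of the repository: the helper name `missing_prerequisites` is hypothetical, it assumes the tools are installed under the executable names `samtools` and `tabix`, and it treats "Python 3.10" as an exact minor version, which may be stricter than intended.

```python
import shutil
import sys


def missing_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration. `version_info` and `which` are
    parameters so the checks can be exercised without touching the real system.
    """
    problems = []
    # Assumption: the README's "Python 3.10" means exactly the 3.10 series.
    if tuple(version_info[:2]) != (3, 10):
        problems.append(
            f"Python 3.10 expected, found {version_info[0]}.{version_info[1]}"
        )
    # SAMtools is only needed for ingest/make_toy_data.py.
    if which("samtools") is None:
        problems.append("samtools not found on PATH (needed for ingest/make_toy_data.py)")
    # Tabix is only needed for ingest/genomes/genomes_pipeline.py.
    if which("tabix") is None:
        problems.append("tabix not found on PATH (needed for ingest/genomes/genomes_pipeline.py)")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

The GCP project, service account, and MongoDB VM are not covered here, since checking them requires credentials.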
17
17
18
18
# Install
19
19
20
+
### Native
21
+
20
22
Fetch the code, boot your virtualenv, install dependencies:
21
23
22
24
```
@@ -25,68 +27,41 @@ cd scp-ingest-pipeline
25
27
python3 -m venv env --copies
26
28
source env/bin/activate
27
29
pip install -r requirements.txt
30
+
scripts/setup-mongo-dev.sh <PATH_TO_YOUR_VAULT_TOKEN> # E.g. ~/.github-token
28
31
```
29
32
30
-
To use `ingest/make_toy_data.py`:
33
+
### Docker
31
34
32
-
```
33
-
brew install samtools
34
-
```
35
-
36
-
To use `ingest/genomes/genomes_pipeline.py`:
35
+
With Docker running and Vault active on your local machine, run:
37
36
38
37
```
39
-
brew install tabix
38
+
scripts/docker-compose-setup.sh -t <PATH_TO_YOUR_VAULT_TOKEN> # E.g. ~/.github-token
40
39
```
41
40
42
-
Now get secrets from Vault to set environment variables needed to write to the database:
43
-
(see also scripts/setup_mongo_dev.sh)
41
+
If you are on an Apple silicon Mac (e.g. M1) and performance seems poor, consider generating a Docker image using the arm64 base. Example test image: `gcr.io/broad-singlecellportal-staging/single-cell-portal:development-2.2.0-arm64`. Usage:
44
42
45
43
```
46
-
export BROAD_USER="<username in your email address>"
Be sure to `unset SENTRY_DSN` when your updates are done, so development logs are not always sent to Sentry.
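
As a reminder aid, a script can report whether `SENTRY_DSN` is still set in the current environment. This is a minimal sketch; the function name `sentry_dsn_is_set` is hypothetical and not part of the pipeline.

```python
import os


def sentry_dsn_is_set(environ=os.environ):
    """Return True if SENTRY_DSN is present and non-empty in the given environment."""
    return bool(environ.get("SENTRY_DSN"))


if __name__ == "__main__":
    if sentry_dsn_is_set():
        print("SENTRY_DSN is still set; run `unset SENTRY_DSN` to stop sending logs to Sentry.")
```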
89
-
90
65
## Git hooks
91
66
92
67
After installing Ingest Pipeline, add Git hooks to help ensure code quality:
@@ -106,7 +81,7 @@ In rare cases, you might need to skip Git hooks, like so:
106
81
107
82
# Test
108
83
109
-
After [installing](#Install):
84
+
After [installing](#install):
110
85
111
86
```
112
87
source env/bin/activate
@@ -128,49 +103,83 @@ pytest test_ingest.py
128
103
# Run all tests, show code coverage metrics
129
104
pytest --cov=../ingest/
130
105
```
106
+
131
107
For more, see <https://docs.pytest.org/en/stable/usage.html>.
132
108
133
-
## Testing in Docker
134
-
If you have difficulties installing and configuring `scp-ingest-pipeline` due to hardware issues (e.g. Mac M1 chips),
135
-
you can alternatively test locally by building the Docker image and then running any commands inside the container.
109
+
## Testing in Docker locally
110
+
<!--
111
+
Step 1 is also useful for troubleshooting when Dockerfile updates fail to build
112
+
-->
113
+
If you have difficulties installing and configuring `scp-ingest-pipeline` due to hardware issues (e.g. Mac M1 chips),
114
+
you can alternatively test locally by building the Docker image and then running any commands inside the container.
136
115
There are some extra steps required, but this sidesteps the need to install packages locally.
137
116
138
117
### 1. Build the image
139
-
Run the following command to build the testing Docker image locally (make sure Docker is running first):
118
+
119
+
Run the following command to build the testing Docker image locally (make sure Docker is running first). This build command will incorporate any changes in the local instance of your repo, committed or not:
Note: if this is your first time running `docker build`, you may need to configure Docker to use the Google Cloud CLI to authenticate requests to Container Registry:
126
+
127
+
```
128
+
gcloud auth configure-docker
129
+
```
130
+
131
+
Pro tip: for local builds, you can try adding the `docker build` options `--progress=plain` (for more verbose build output) and/or `--no-cache` (to ensure a build with no cached layers).
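
To show where those options fit, here is a small sketch that assembles a `docker build` argument list. The helper name `docker_build_command` and the tag `my-local-image` are placeholders chosen for illustration; substitute the image tag used in the build command for this step.

```python
import shlex


def docker_build_command(tag, context=".", verbose=False, no_cache=False):
    """Build an argument list for `docker build`.

    `tag` is whatever image name you use locally; no real tag is assumed here.
    """
    command = ["docker", "build", "-t", tag]
    if verbose:
        # Plain progress output prints each build step in full.
        command.append("--progress=plain")
    if no_cache:
        # Rebuild every layer instead of reusing cached ones.
        command.append("--no-cache")
    command.append(context)
    return command


if __name__ == "__main__":
    # "my-local-image" is a placeholder tag, not the project's real one.
    print(shlex.join(docker_build_command("my-local-image", verbose=True, no_cache=True)))
```

The list form can be passed directly to `subprocess.run` if you prefer to launch the build from Python.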
132
+
143
133
### 2. Set up environment variables
134
+
144
135
Run the following to pull database-specific secrets out of Vault (passing in the path to your Vault token):
> WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
172
+
168
173
### 5. Copy keyfile to running container
174
+
169
175
In a separate terminal window, copy the JSON keyfile from above to the expected location:
176
+
170
177
```
171
178
docker cp /tmp/keyfile.json scp-ingest-test:/tmp
172
179
```
180
+
173
181
You can now run any `ingest_pipeline.py` command you wish inside the container.
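
Before copying the keyfile in step 5, it can help to confirm that the file exists and parses as JSON. The sketch below assumes the `/tmp/keyfile.json` path used above; the helper name `keyfile_looks_valid` is hypothetical, and the check does not verify that the credentials themselves are usable.

```python
import json
from pathlib import Path


def keyfile_looks_valid(path="/tmp/keyfile.json"):
    """Return True if `path` exists and contains a JSON object."""
    keyfile = Path(path)
    if not keyfile.is_file():
        return False
    try:
        content = json.loads(keyfile.read_text())
    except (json.JSONDecodeError, UnicodeDecodeError):
        return False
    # A keyfile is expected to be a JSON object, not a list or scalar.
    return isinstance(content, dict)


if __name__ == "__main__":
    print("keyfile OK" if keyfile_looks_valid() else "keyfile missing or not valid JSON")
```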
182
+
174
183
# Use
175
184
176
185
Run this every time you start a new terminal to work on this project:
@@ -184,10 +193,12 @@ See [`ingest_pipeline.py`](https://github.com/broadinstitute/scp-ingest-pipeline
184
193
## Troubleshooting during set up
185
194
186
195
If you run into an error like "... [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed ...", try:
196
+
187
197
- Open terminal
188
198
- `cd` to where Python is installed
189
199
- Run the certificates command with `/Applications/Python\ < Your Version of Python Here >/Install\ Certificates.command`
190
200
191
201
If you run into an error like "ModuleNotFoundError: No module named 'google'", try:
202
+
192
203
- Open terminal
193
204
- Run `pip install --upgrade google-api-python-client`
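
To confirm whether the fix took effect in the active virtualenv, you can ask Python whether it can locate the `google` package without importing it. This is a minimal sketch; `module_available` is a hypothetical helper name.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module or package called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("google"):
        print("The 'google' package is importable in this environment.")
    else:
        print("The 'google' package was not found; check that your virtualenv is active.")
```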