Merge pull request #116 from pachterlab/anhchi172-patch-2
Update elm.md
lauraluebbert authored Jan 5, 2024
2 parents a961add + 3672387 commit b1fe030
Showing 8 changed files with 40 additions and 18 deletions.
6 changes: 6 additions & 0 deletions README.md
@@ -10,6 +10,12 @@

`gget` is a free, open-source command-line tool and Python package that enables efficient querying of genomic databases. `gget` consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.

```diff
! While Ensembl is in the process of updating its database to a new release,
! you might receive a 404 error from the gget search and ref modules.
! If this is the case, specify an earlier Ensembl release using the 'release' argument.
! Example: 'gget ref -r 110 human' (Python: 'gget.ref("human", release=110)')
```

![alt text](https://github.com/pachterlab/gget/blob/main/figures/gget_overview.png?raw=true)

12 changes: 6 additions & 6 deletions docs/src/en/elm.md
@@ -1,14 +1,14 @@
> Python arguments are equivalent to long-option arguments (`--arg`), unless otherwise specified. Flags are True/False arguments in Python. The manual for any gget tool can be called from the command-line using the `-h` `--help` flag.
## gget elm 🎭
Locally predict Eukaryotic Linear Motifs from an amino acid sequence or UniProt ID using data from the [ELM database](http://elm.eu.org/). ELM data can be downloaded & distributed for non-commercial use according to the [ELM Software License Agreement](http://elm.eu.org/media/Elm_academic_license.pdf).
Locally predict Eukaryotic Linear Motifs from an amino acid sequence or UniProt Acc using data from the [ELM database](http://elm.eu.org/). ELM data can be downloaded & distributed for non-commercial use according to the [ELM Software License Agreement](http://elm.eu.org/media/Elm_academic_license.pdf).
Return format: JSON (command-line) or data frame/CSV (Python). This module returns two data frames (or JSON formatted files) (see examples).

Before using `gget elm` for the first time, run `gget setup elm` / `gget.setup("elm")` once (also see [`gget setup`](setup.md)).

**Positional argument**
`sequence`
Amino acid sequence or UniProt ID (str).
When providing a UniProt ID, use flag `--uniprot` (Python: `uniprot=True`).
Amino acid sequence or UniProt Acc (str).
When providing a UniProt Acc, use flag `--uniprot` (Python: `uniprot=True`).

**Optional arguments**
`-s` `--sensitivity`
@@ -26,7 +26,7 @@ Path to the folder to save results in (str), e.g. "path/to/directory". Default:

**Flags**
`-u` `--uniprot`
Set to True if `sequence` is a UniProt ID instead of an amino acid sequence.
Set to True if `sequence` is a UniProt Acc instead of an amino acid sequence.

`-e` `--expand`
Expand the information returned in the regex data frame to include the protein names, organisms, and references that the motif was originally validated on.
@@ -51,7 +51,7 @@ gget.setup("elm") # Downloads/updates local ELM database
ortholog_df, regex_df = gget.elm("LIAQSIGQASFV")
```

Find ELMs giving a UniProt ID as input:
Find ELMs giving a UniProt Acc as input:
```bash
gget setup elm # Downloads/updates local ELM database
gget elm -o gget_elm_results --uniprot Q02410 -e
@@ -65,7 +65,7 @@ ortholog_df, regex_df = gget.elm("Q02410", uniprot=True, expand=True)

ortholog_df:

|Ortholog_UniProt_ID|ProteinName|class_accession|ELMIdentifier |FunctionalSiteName |Description |Organism ||
|Ortholog_UniProt_Acc|ProteinName|class_accession|ELMIdentifier |FunctionalSiteName |Description |Organism ||
|:-----------------:|:---------:|:-------------:|:-------------:|:-----------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------:|:----------:|:-:|
|Q02410 |APBA1_HUMAN|ELME000357 |LIG_CaMK_CASK_1|CASK CaMK domain binding ligand motif|Motif that mediates binding to the calmodulin-dependent protein kinase (CaMK) domain of the peripheral plasma membrane protein CASK/Lin2.|Homo sapiens||
|Q02410 |APBA1_HUMAN|ELME000091 |LIG_PDZ_Class_2|PDZ domain ligands |The C-terminal class 2 PDZ-binding motif is classically represented by a pattern such as |Homo sapiens||
4 changes: 4 additions & 0 deletions docs/src/en/ref.md
@@ -3,6 +3,10 @@
Fetch FTPs and their respective metadata (or use flag `ftp` to only return the links) for reference genomes and annotations from [Ensembl](https://www.ensembl.org/) by species.
Return format: dictionary/JSON.

**While Ensembl is in the process of updating its database to a new release, you might receive a 404 ERROR.**
If this is the case, specify an earlier Ensembl version using the `release` argument.
Example: `gget ref -r 110 human` (Python: `gget.ref("human", release=110)`)
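
The fallback described above could be automated along these lines. This is a minimal sketch, not part of gget: `fetch_with_release_fallback` and its parameters are hypothetical names, and it assumes that a failed query either raises an exception or returns `None` (how gget actually reports the 404 may differ).

```python
from typing import Any, Callable, Tuple


def fetch_with_release_fallback(
    fetch: Callable[[int], Any],
    latest_release: int,
    max_steps_back: int = 2,
) -> Tuple[int, Any]:
    """Try fetch(release) for the latest release, then for earlier ones.

    Assumption: a failed query raises an exception or returns None.
    Returns the release that worked together with its result.
    """
    last_error = None
    oldest_release = latest_release - max_steps_back
    for release in range(latest_release, oldest_release - 1, -1):
        try:
            result = fetch(release)
        except Exception as err:  # e.g. a 404 surfaced as an exception
            last_error = err
            continue
        if result is not None:
            return release, result
    raise RuntimeError(
        f"No result for Ensembl releases {oldest_release} to {latest_release}"
    ) from last_error


# Usage with gget (needs network access and an installed gget package):
# import gget
# release, refs = fetch_with_release_fallback(
#     lambda r: gget.ref("human", release=r), latest_release=111
# )
```

Passing the query as a callable keeps the helper independent of any one module, so the same function could wrap `gget.search("ace2", species="human", release=r)` as well.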

**Positional argument**
`species`
Species for which the FTPs will be fetched in the format genus_species, e.g. homo_sapiens.
6 changes: 5 additions & 1 deletion docs/src/en/search.md
@@ -4,7 +4,11 @@ Fetch genes and transcripts from [Ensembl](https://www.ensembl.org/) using free-
Results are matched based on the "gene name" and "description" sections in the Ensembl database. `gget` version >= 0.27.9 also includes results that match the Ensembl "synonym" section.
Return format: JSON (command-line) or data frame/CSV (Python).

**Positional argument**
**While Ensembl is in the process of updating its database to a new release, you might receive a 404 ERROR.**
If this is the case, specify an earlier Ensembl version using the `release` argument.
Example: `gget search -r 110 -s human ace2` (Python: `gget.search("ace2", species="human", release=110)`)

**Positional argument**
`searchwords`
One or more free form search words, e.g. gaba nmda. (Note: Search is not case-sensitive.)

4 changes: 4 additions & 0 deletions docs/src/es/ref.md
@@ -3,6 +3,10 @@
Obtenga enlaces FTP y sus respectivos metadatos (o use la bandera `ftp` para regresar solo los enlaces) para referenciar genomas y anotaciones de [Ensembl](https://www.ensembl.org/).
Regresa: Resultados en formato JSON.

**Mientras Ensembl está en el proceso de actualizar su base de datos a una nueva versión, es posible que reciba un ERROR 404.**
Si este es el caso, especifique una versión anterior de Ensembl usando el argumento `release`.
Ejemplo: `gget ref -r 110 human` (Python: `gget.ref("human", release=110)`)

**Parámetro posicional**
`species`
La especie por la cual que se buscará los FTP en el formato género_especies, p. ej. homo_sapiens.
4 changes: 4 additions & 0 deletions docs/src/es/search.md
@@ -4,6 +4,10 @@ Obtenga genes y transcripciones de [Ensembl](https://www.ensembl.org/) usando t
Los resultados se comparan según las secciones "nombre del gen" y "descripción" en la base de datos de Ensembl. `gget` versión >= 0.27.9 también incluye resultados que coinciden con la sección "sinónimo" de Ensembl.
Regresa: Resultados en formato JSON (Terminal) o Dataframe/CSV (Python).

**Mientras Ensembl está en el proceso de actualizar su base de datos a una nueva versión, es posible que reciba un ERROR 404.**
Si este es el caso, especifique una versión anterior de Ensembl usando el argumento `release`.
Ejemplo: `gget search -r 110 -s human ace2` (Python: `gget.search("ace2", species="human", release=110)`)

**Parámetro posicional**
`searchwords`
Una o más palabras de búsqueda de forma libre, p. ej. gaba nmda. (Nota: la búsqueda no distingue entre mayúsculas y minúsculas).
4 changes: 2 additions & 2 deletions tests/fixtures.py
@@ -1,5 +1,5 @@
# Latest Ensembl release for unittests
LATEST_ENS_RELEASE = 110
LATEST_ENS_RELEASE = 111

# gget search species options for Ensembl release 106
SPECIES_OPTIONS = [
@@ -629,4 +629,4 @@
"zalophus_californianus",
"zonotrichia_albicollis",
"zosterops_lateralis_melanops",
]
]
18 changes: 9 additions & 9 deletions tests/test_info.py
@@ -71,15 +71,15 @@ def test_info_exon(self):

self.assertListEqual(result_to_test, expected_result)

def test_info_pdb(self):
test = "test9"
expected_result = info_dict[test]["expected_result"]
result_to_test = info(**info_dict[test]["args"])
# If result is a DataFrame, convert to list
if isinstance(result_to_test, pd.DataFrame):
result_to_test = result_to_test.dropna(axis=1).values.tolist()

self.assertListEqual(result_to_test, expected_result)
# def test_info_pdb(self):
# test = "test9"
# expected_result = info_dict[test]["expected_result"]
# result_to_test = info(**info_dict[test]["args"])
# # If result is a DataFrame, convert to list
# if isinstance(result_to_test, pd.DataFrame):
# result_to_test = result_to_test.dropna(axis=1).values.tolist()

# self.assertListEqual(result_to_test, expected_result)

def test_info_ncbifalse_uniprottrue(self):
test = "test10"
