VEuPathDB/expression-shepherd

Expression Shepherd

This is a lightweight proof-of-concept tool that summarises the expression data on a VEuPathDB gene page using either OpenAI's GPT-4o or Anthropic's Claude 4 Sonnet.

Non-Docker usage

0. Requirements

  • An API key for your chosen model, added to a file called .env in this directory:
    • OpenAI: OPENAI_API_KEY=xxxxxxxxxxxxx
    • Anthropic: ANTHROPIC_API_KEY=xxxxxxxxxxxxx
  • Volta (recommended): https://docs.volta.sh/guide/getting-started
    • it manages your node and yarn versions for you
  • node: 18.20.5 tested (later versions will likely work)
  • yarn: 1.22.19 tested
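As a rough illustration of the API-key requirement, the TypeScript sketch below parses .env-style text and checks that the variable for the chosen model is set. It is a hypothetical example, not the repository's own loader: parseDotEnv and requireApiKey are invented names, and the parser only handles plain KEY=value lines (no quoted values or export prefixes).

```typescript
// Hypothetical sketch, not the repository's actual .env handling.
type Model = "OpenAI" | "Claude";

// Environment variable expected for each model, as listed in this README.
const REQUIRED_KEY: Record<Model, string> = {
  OpenAI: "OPENAI_API_KEY",
  Claude: "ANTHROPIC_API_KEY",
};

// Parse plain KEY=value lines; blank lines and # comments are skipped.
// Quoted values and "export" prefixes are NOT handled (simplifying assumption).
function parseDotEnv(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no "=" or empty key: ignore the line
    vars.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  return vars;
}

// Return the key for the chosen model, or throw a message naming the missing variable.
function requireApiKey(env: Map<string, string>, model: Model): string {
  const varName = REQUIRED_KEY[model];
  const value = env.get(varName);
  if (!value) {
    throw new Error(`${varName} is not set: add ${varName}=... to .env`);
  }
  return value;
}

// Example using in-memory text instead of reading the real .env file.
const env = parseDotEnv("# API keys\nOPENAI_API_KEY=sk-example\n");
console.log(requireApiKey(env, "OpenAI")); // prints sk-example
```

Failing early with a message that names the missing variable is usually friendlier than letting the first API request fail with an authentication error.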

1. Initialise the Node.js environment

yarn

If you have Volta installed, this also ensures you have the right versions of node and yarn. Otherwise, install them manually if you run into any issues.

2. Compile the TypeScript

yarn build

This compiles src/main.ts into dist/main.js.

3. Run the code

You can run the script with any gene ID from supported VEuPathDB databases:

With OpenAI GPT-4o (default):

node dist/main.js PlasmoDB PF3D7_1016300

With Claude 4 Sonnet:

node dist/main.js PlasmoDB PF3D7_1016300 --claude

Supported databases: PlasmoDB, VectorBase, ToxoDB, CryptoDB, FungiDB, GiardiaDB, TrichDB, AmoebaDB, MicrosporidiaDB, PiroplasmaDB, TriTrypDB

Note: Use node dist/main.js directly instead of yarn start when using the --claude flag, as npm/yarn scripts don't pass through additional arguments.
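The command-line contract described above (a database name, a gene ID and an optional --claude flag, with OpenAI as the default) could be validated along these lines. This is a hypothetical TypeScript sketch; parseArgs, isDatabase and RunOptions are illustrative names, and the repository's src/main.ts may handle its arguments differently.

```typescript
// Hypothetical sketch; the real src/main.ts may parse its arguments differently.
const SUPPORTED_DATABASES = [
  "PlasmoDB", "VectorBase", "ToxoDB", "CryptoDB", "FungiDB", "GiardiaDB",
  "TrichDB", "AmoebaDB", "MicrosporidiaDB", "PiroplasmaDB", "TriTrypDB",
] as const;

type Database = (typeof SUPPORTED_DATABASES)[number];

interface RunOptions {
  database: Database;
  geneId: string;
  model: "OpenAI" | "Claude";
}

// Type guard: true only for names in the supported list (case-sensitive).
function isDatabase(value: string): value is Database {
  return (SUPPORTED_DATABASES as readonly string[]).includes(value);
}

// Expects <database> <geneId> [--claude]; OpenAI is used unless --claude is given.
// Any other "--" options are silently ignored in this sketch.
function parseArgs(args: string[]): RunOptions {
  const useClaude = args.includes("--claude");
  const positional = args.filter((arg) => !arg.startsWith("--"));
  if (positional.length !== 2) {
    throw new Error("Usage: node dist/main.js <database> <geneId> [--claude]");
  }
  const [database, geneId] = positional;
  if (!isDatabase(database)) {
    throw new Error(`Unsupported database: ${database}`);
  }
  return { database, geneId, model: useClaude ? "Claude" : "OpenAI" };
}

// Example with a fixed argument list mirroring the Claude command above.
console.log(parseArgs(["PlasmoDB", "PF3D7_1016300", "--claude"]));
```

In a real Node.js script you would pass process.argv.slice(2), since the first two entries of process.argv are the node executable and the script path.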

The script writes three files to the example-output directory:

  1. GENE_ID.01.MODEL.summaries.json - the per experiment AI summaries (JSON)
  2. GENE_ID.01.MODEL.summary.json - the AI summary-of-summaries and grouping (JSON)
  3. GENE_ID.01.MODEL.summary.html - a human-readable HTML version of the summary

Here, MODEL is either OpenAI or Claude, depending on which API you used.
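The naming scheme can be expressed as a small helper. This is a hypothetical TypeScript sketch that simply follows the pattern listed above; outputFiles is an invented name, and because the meaning of the 01 segment isn't explained here it is kept as a fixed literal.

```typescript
// Hypothetical sketch of the output naming pattern described in this README.
type Model = "OpenAI" | "Claude";

// Build the three output paths for one gene and model.
// The "01" segment is copied literally; its meaning is not documented here.
function outputFiles(geneId: string, model: Model, dir = "example-output"): string[] {
  const stem = `${dir}/${geneId}.01.${model}`;
  return [
    `${stem}.summaries.json`, // per-experiment AI summaries
    `${stem}.summary.json`, // summary-of-summaries and grouping
    `${stem}.summary.html`, // HTML version of the summary
  ];
}

for (const file of outputFiles("PF3D7_1016300", "Claude")) {
  console.log(file);
}
// example-output/PF3D7_1016300.01.Claude.summaries.json
// example-output/PF3D7_1016300.01.Claude.summary.json
// example-output/PF3D7_1016300.01.Claude.summary.html
```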

To view the HTML, open it as a local file in your web browser (usually Ctrl-O).

You can commit any generated files to the repo if you like (within reason)!

Docker usage

0. Requirements

  • OpenAI API key
    • add OPENAI_API_KEY=xxxxxxxxxxxxx to a file called .env in this directory
  • Docker installed on your system

1. Build the Docker image

To build the Docker image, use the following command:

docker build -t expression-shepherd .

2. Run the container

To start a container from the image with an interactive shell, run the command below.

It mounts ./example-output inside the container, so any outputs also appear in the host filesystem:

docker run -it --rm --env-file .env -v $(pwd)/example-output:/app/example-output expression-shepherd sh

The container is removed when you exit the shell (the image is kept).

3. Run the code

If the container is already running but you need a new shell:

docker ps
# find the CONTAINER_ID
docker exec -it --env-file .env <CONTAINER_ID> sh

You can then manually run the script (see step 3. in the non-Docker section above):

node dist/main.js PlasmoDB PF3D7_0818900

Or you can just run the script at container launch time:

docker run -d --rm --env-file .env -v $(pwd)/example-output:/app/example-output expression-shepherd node dist/main.js PlasmoDB PF3D7_0818900

Or run it directly in an already running container:

docker exec -it --env-file .env <CONTAINER_ID> node dist/main.js PlasmoDB PF3D7_0818900

Note that Volta is not available in the node container, but the container already has suitable versions of node and yarn installed.

About

Wrangling our expression data with gpt-4o