This is a lightweight proof-of-concept tool for summarising the expression data on a gene page from VEuPathDB databases using OpenAI's GPT-4o model or Anthropic's Claude 4 Sonnet.
- API key for your chosen model:
  - OpenAI: add OPENAI_API_KEY=xxxxxxxxxxxxx to a file called .env in this directory
  - Anthropic: add ANTHROPIC_API_KEY=xxxxxxxxxxxxx to a file called .env in this directory
- volta if possible: https://docs.volta.sh/guide/getting-started
- it takes care of your node and yarn versions
- node - 18.20.5 tested (higher versions will likely work)
- yarn - 1.22.19 tested
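For the API key prerequisite, the .env file is a plain list of KEY=value lines. The following is a minimal sketch of how such a file could be parsed and the right key selected. The helper names parseEnv and pickApiKey are hypothetical and not taken from this repo, which may well load the file differently (for example with a library such as dotenv).

```typescript
// Hypothetical helpers for illustration; the real tool may load .env differently.

/** Parse the text of a .env file into a key/value map. */
function parseEnv(text: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === "" || line.startsWith("#")) continue;
    // Split on the first "=" only, so values may themselves contain "=".
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no "=" at all, or an empty key
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    vars[key] = value;
  }
  return vars;
}

/** Return the key needed for the chosen provider, or throw a readable error. */
function pickApiKey(vars: Record<string, string>, useClaude: boolean): string {
  const name = useClaude ? "ANTHROPIC_API_KEY" : "OPENAI_API_KEY";
  const value = vars[name];
  if (!value) {
    throw new Error(`${name} is missing; add ${name}=... to the .env file`);
  }
  return value;
}

const example = parseEnv("# keys\nOPENAI_API_KEY=xxxxxxxxxxxxx\n");
console.log(pickApiKey(example, false)); // prints "xxxxxxxxxxxxx"
```

Failing early with the name of the missing variable tends to be easier to debug than an authentication error returned later by the API.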
yarn
If you have volta installed, this will also make sure you have the right versions of node and yarn; otherwise you'll need to install those manually if you run into any issues.
yarn build
This compiles src/main.ts into dist/main.js.
You can run the script with any gene ID from supported VEuPathDB databases:
With OpenAI GPT-4o (default):
node dist/main.js PlasmoDB PF3D7_1016300
With Claude 4 Sonnet:
node dist/main.js PlasmoDB PF3D7_1016300 --claude
Supported databases: PlasmoDB, VectorBase, ToxoDB, CryptoDB, FungiDB, GiardiaDB, TrichDB, AmoebaDB, MicrosporidiaDB, PiroplasmaDB, TriTrypDB
Note: Use node dist/main.js directly instead of yarn start when using the --claude flag, as npm/yarn scripts may not pass through additional arguments.
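The command line described above takes a database name, a gene ID and an optional --claude flag. As a rough sketch of how those arguments could be validated (parseArgs, CliOptions and SUPPORTED_DATABASES are hypothetical names, and the case-sensitive matching of database names is an assumption; the real src/main.ts may behave differently):

```typescript
// Hypothetical sketch; the real src/main.ts may parse its arguments differently.

// Database names as listed in this README.
const SUPPORTED_DATABASES = [
  "PlasmoDB", "VectorBase", "ToxoDB", "CryptoDB", "FungiDB", "GiardiaDB",
  "TrichDB", "AmoebaDB", "MicrosporidiaDB", "PiroplasmaDB", "TriTrypDB",
] as const;

type Database = (typeof SUPPORTED_DATABASES)[number];

interface CliOptions {
  database: Database;
  geneId: string;
  useClaude: boolean;
}

/** Parse the arguments that follow "node dist/main.js". */
function parseArgs(args: string[]): CliOptions {
  // Flags start with "--"; everything else is treated as positional.
  const flags = args.filter((a) => a.startsWith("--"));
  const positional = args.filter((a) => !a.startsWith("--"));

  const [database, geneId] = positional;
  if (database === undefined || geneId === undefined) {
    throw new Error("usage: node dist/main.js <database> <gene ID> [--claude]");
  }
  // Assumption: names must match the list exactly, including case.
  const match = SUPPORTED_DATABASES.find((db) => db === database);
  if (match === undefined) {
    throw new Error(`unsupported database: ${database}`);
  }
  return { database: match, geneId, useClaude: flags.includes("--claude") };
}

console.log(parseArgs(["PlasmoDB", "PF3D7_1016300", "--claude"]));
```

In a Node script, the array passed to a function like this would typically be process.argv.slice(2).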
It will output three files in the example-output directory:
- GENE_ID.01.MODEL.summaries.json - the per-experiment AI summaries (JSON)
- GENE_ID.01.MODEL.summary.json - the AI summary-of-summaries and grouping (JSON)
- GENE_ID.01.MODEL.summary.html - a nice HTML version of the summary
Where MODEL is either OpenAI or Claude, depending on which API you used.
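To make the naming pattern above concrete, here is a small sketch that builds the three output paths for a given gene ID and model. The helper outputFileNames and its types are hypothetical and only mirror the pattern as written in this README; the meaning of the "01" component is not stated here, so it is copied verbatim.

```typescript
// Hypothetical helper; the names follow the pattern described in this README.

type Model = "OpenAI" | "Claude";

interface OutputFiles {
  summaries: string; // per-experiment AI summaries (JSON)
  summary: string;   // summary-of-summaries and grouping (JSON)
  html: string;      // HTML version of the summary
}

function outputFileNames(
  geneId: string,
  model: Model,
  outputDir = "example-output",
): OutputFiles {
  // "01" is copied from the README pattern; what it denotes is not stated there.
  const stem = `${outputDir}/${geneId}.01.${model}`;
  return {
    summaries: `${stem}.summaries.json`,
    summary: `${stem}.summary.json`,
    html: `${stem}.summary.html`,
  };
}

console.log(outputFileNames("PF3D7_1016300", "Claude").html);
// prints "example-output/PF3D7_1016300.01.Claude.summary.html"
```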
To view the HTML, open it as a local file in your web browser (usually Ctrl-O).
You can commit any generated files to the repo if you like (within reason)!
- OpenAI API key
  - add OPENAI_API_KEY=xxxxxxxxxxxxx to a file called .env in this directory
- Docker installed on your system
To build the Docker image, use the following command:
docker build -t expression-shepherd .
To start a container from the image and get a shell, use the command below. It mounts ./example-output inside the container, so any outputs will also be visible in the host filesystem.
docker run -it --rm --env-file .env -v $(pwd)/example-output:/app/example-output expression-shepherd sh
The container will be removed when you exit the shell, but the image will not.
If the container is already running but you need a new shell:
docker ps
# find the CONTAINER_ID
docker exec -it --env-file .env <CONTAINER_ID> sh
You can then manually run the script (see step 3 in the non-Docker section above):
node dist/main.js PlasmoDB PF3D7_0818900
Or you can just run the script at container launch time:
docker run -d --rm --env-file .env -v $(pwd)/example-output:/app/example-output expression-shepherd node dist/main.js PlasmoDB PF3D7_0818900
Or like this in an already running container:
docker exec -it --env-file .env <CONTAINER_ID> node dist/main.js PlasmoDB PF3D7_0818900
Note that volta is not available in the node container, but the container has suitable versions of node and yarn installed anyway.