Example code that demonstrates how to store, process, and query genomic and biological datasets using AWS HealthOmics
AWS HealthOmics helps healthcare and life sciences customers store, query, analyze, and generate insights from genomic and other biological data to improve human health.
This repository contains resources (e.g., code scripts and Jupyter notebooks) that demonstrate the usage of AWS HealthOmics.
The quickest setup to run the example notebooks includes:
- An AWS account
- Proper IAM user and role setup (the quick access check after this list is one way to confirm your permissions are in place)
- An Amazon SageMaker Notebook Instance
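If you want to verify the setup above before opening the notebooks, a minimal sketch like the following lists your reference stores with boto3. The region is an assumption; use the region where you intend to create your HealthOmics stores, and make sure the notebook's execution role has HealthOmics read permissions.

```python
import boto3

# HealthOmics is exposed in boto3 under the service name "omics".
# The region below is an assumption -- use the region for your stores.
omics = boto3.client("omics", region_name="us-east-1")

# Listing reference stores is a lightweight way to confirm that the
# notebook's execution role can reach the HealthOmics APIs.
response = omics.list_reference_stores(maxResults=10)
for store in response.get("referenceStores", []):
    print(store["id"], store.get("name"))
```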
- Rapidly generate, run, and analyze workflows using natural language prompts and the Amazon Q Developer CLI.
- Using agentic AI and the Model Context Protocol (MCP) for natural language interactions with AWS HealthOmics.
- Using HealthOmics Storage with genomics references and read sets: Get acquainted with HealthOmics storage by creating reference and sequence stores, importing data from FASTQ and CRAM files, and downloading read sets (see the storage sketch after this list).
- Running WDL and Nextflow pipelines with HealthOmics Workflows: Learn how to create, run, and debug WDL- and Nextflow-based pipelines that process data from HealthOmics Storage and Amazon S3 using HealthOmics Workflows (see the workflow sketch after this list).
- Creating S3 Tables table buckets and importing VCF and GVCF files: Set up S3 Tables table buckets to hold variant data and import it into an Iceberg data warehouse suitable for integration with AWS analytics tools such as Athena, Redshift, and EMR (see the S3 Tables sketch after this list).
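For the HealthOmics Storage walkthrough, the notebooks use the boto3 `omics` client. The sketch below shows the general shape of creating stores and importing a paired FASTQ read set; the store names, S3 URIs, and import role ARN are placeholders you would replace with your own, and the role must allow HealthOmics to read the source bucket.

```python
import boto3

omics = boto3.client("omics")

# Create a reference store and a sequence store (names are placeholders).
ref_store = omics.create_reference_store(name="my-reference-store")
seq_store = omics.create_sequence_store(name="my-sequence-store")

# Import paired FASTQ files as a read set. The S3 URIs, IDs, and role ARN
# below are placeholders for illustration only.
import_job = omics.start_read_set_import_job(
    sequenceStoreId=seq_store["id"],
    roleArn="arn:aws:iam::123456789012:role/HealthOmicsImportRole",
    sources=[
        {
            "sourceFiles": {
                "source1": "s3://my-bucket/sample_R1.fastq.gz",
                "source2": "s3://my-bucket/sample_R2.fastq.gz",
            },
            "sourceFileType": "FASTQ",
            "subjectId": "subject-1",
            "sampleId": "sample-1",
            "name": "sample-1",
        }
    ],
)
print("Import job:", import_job["id"])
```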
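For the Workflows examples, a run typically involves registering a workflow definition and then starting a run that reads from HealthOmics Storage or Amazon S3. A minimal sketch, assuming a zipped WDL definition in `main.zip` and placeholder parameter names, role ARN, and output location:

```python
import boto3

omics = boto3.client("omics")

# Register a WDL workflow from a zipped definition (file name is a placeholder).
with open("main.zip", "rb") as f:
    workflow = omics.create_workflow(
        name="example-wdl-workflow",
        engine="WDL",
        definitionZip=f.read(),
        parameterTemplate={
            "input_fastq": {"description": "Input FASTQ file", "optional": False},
        },
    )

# Start a run. The role ARN, parameters, and output URI are placeholders.
run = omics.start_run(
    workflowId=workflow["id"],
    roleArn="arn:aws:iam::123456789012:role/HealthOmicsWorkflowRole",
    name="example-run",
    parameters={"input_fastq": "s3://my-bucket/sample_R1.fastq.gz"},
    outputUri="s3://my-bucket/healthomics-outputs/",
)
print("Run:", run["id"], run["status"])
```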
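For the S3 Tables example, the table bucket itself can be created with the boto3 `s3tables` client, and once the VCF/GVCF import (handled by the notebook, not shown here) has populated an Iceberg table, the data can be queried from Athena. This is a minimal sketch; the bucket name, database, table, and output location are placeholders, and the exact catalog and namespace wiring follows the notebook.

```python
import boto3

# Create an S3 table bucket to hold Iceberg tables of variant data
# (bucket name is a placeholder).
s3tables = boto3.client("s3tables")
bucket = s3tables.create_table_bucket(name="genomics-variants")
print("Table bucket ARN:", bucket["arn"])

# After the variant data has been imported into an Iceberg table, query it
# from Athena. Database, table, and output location are placeholders.
athena = boto3.client("athena")
query = athena.start_query_execution(
    QueryString="SELECT * FROM variants LIMIT 10",
    QueryExecutionContext={"Database": "genomics"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
print("Athena query id:", query["QueryExecutionId"])
```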
This library is licensed under the Apache 2.0 License. For more details, please take a look at the LICENSE file.
See the Security issue notifications section of our contributing guidelines for more information.
Although we're extremely excited to receive contributions from the community, we're still working on the best mechanism to take in examples from external sources. Please bear with us in the short term if pull requests take longer than expected or are closed. Please read our contributing guidelines if you'd like to open an issue or submit a pull request.