
CowPi WorkFlow Utility

A utility to aid use of the CowPi tool for analysing 16S data. Runs in Python 2. It can be run on a single dataset or on multiple independent datasets (completely separate datasets; the only shared output is a summary file covering the stages of each).

To create a conda environment to run this tool:

conda create -n cowpi_workflow picrust=1.1.4 biom-format=2.1.7 vsearch=2.18.0 pyyaml=5.2 pandas=0.24.2 python=2.7.18

Please note: this can sometimes have issues with the conda-forge channel; removing conda-forge whilst producing the environment should fix this.
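Once created, activate the environment before running any of the scripts:

conda activate cowpi_workflow

(On older conda installations, 'source activate cowpi_workflow' may be needed instead.)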

Contains the following scripts (example invocations appear after the list):

  1. 'cowpi_get_and_prep_data.sh'

Script downloads the CowPi data from the Zenodo database, gunzips the files, and edits the fasta file to work with vsearch. A yaml configuration file is produced (containing the paths to each downloaded CowPi file), which can be used with the main CowPi workflow script once its configurations have been filled in. Argument one: path at which to produce the yaml configuration file. Argument two: output directory for the CowPi files. Argument three: is the data you're going to be using preclustered? (Y or N)

  2. 'filter_archaea.py' (optional) Python script which filters archaea from the initial CowPi precalculated data files; only needed if you are not using archaea data. Argument one: path to the archaea file list (see 'archaea_file.txt'). Argument two: the initial CowPi precalculated files directory (produced above).

  3. '16s_function_workflow.py' Main workflow Python script. Takes the yaml configuration file as its first argument, runs each step of the CowPi workflow, and generates functional pathway results as well as a summary file of each step. Argument one: yaml configuration file.

  4. 'conversion_of_read_names.py' (optional) Depending on the headers of the fasta sequence file, the vsearch stage of CowPi can produce problematic output. This script adds the file name into the headers and checks for any "." characters in the file sample name, which can cause issues.
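A minimal sketch of how the scripts might be invoked, based on the argument descriptions above. All paths and file names other than the script names are hypothetical placeholders:

# 1. Download and prepare the CowPi precalculated data; writes the yaml config
bash cowpi_get_and_prep_data.sh config.yaml cowpi_files/ N

# 2. (optional) Filter archaea from the precalculated files
python filter_archaea.py archaea_file.txt cowpi_files/

# 3. (optional) Fix fasta headers ahead of the vsearch stage
#    (its arguments are not documented in this README; see the script itself)
python conversion_of_read_names.py

# 4. Run the main workflow against the filled-in yaml configuration
python 16s_function_workflow.py config.yaml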

Process:

  1. Run "cowpi_get_and_prep_data.sh". This will download and prepare the cowpi precalculated data, as well as produce a yaml file with the paths to the cowpi precalculateds.
  2. Run "conversion_of_read_names.py". this is probably optional depending on the name of fastq sequence headers. This is only for data not preclustered at the moment.
  3. Fill in the configurations in the yaml file. If running on a single dataset, point the directory_of_datasets path at the directory containing that dataset; if running on multiple datasets, point it at a directory containing a subdirectory for each dataset (see the layout sketch after this list).
  4. Run "16s_function_workflow.py" with the argument pointing to the yaml file.

Important Outputs (in each dataset directory)

  1. "summary_output_file.txt" - gives various information about results of each stage.
  2. "categoried_predictions" - directory containing collapsed pathways at each kegg ortholog.
  3. "metagenome_predictions/metagenome_predictions.tsv" - tabulated file containing counts of each ortholog.

If the multiple-datasets option is chosen, an "all_datasets_summary_file.csv" file is also produced. This takes the summary output file from each dataset's analysis and combines them into a single, simpler table.

If there are any issues, please contact me at [email protected] or open an issue. Thanks.
