- https://microsoft.github.io/Data-Science-For-Beginners/#/
- https://microsoft.github.io/ML-For-Beginners/#/
- https://microsoft.github.io/AI-For-Beginners/
data inference model pipelines
https://blog.bioconductor.org/posts/2022-10-22-awesome-lists/
https://www.bioconductor.org/packages/release/BiocViews.html
https://www.researchobject.org/overview/
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0309210
https://www.nature.com/articles/533452a
- Open-source
- HashiCorp
- Ubuntu (Canonical)
Truly integrate DSO https://boehringer-ingelheim.github.io/dso/tutorials/getting_started.html
prj tool (projectable) https://github.com/dzfrias/projectable
R has the comprehensive Bioconductor ecosystem, plus https://github.com/erikgahner/awesome-ggplot2 and ggbio
- https://github.com/addypy/datagovindia/
- https://www.re3data.org/browse/by-country/
- https://github.com/awesomedata/awesome-public-datasets
- https://github.com/public-apis/public-apis
- https://free-apis.github.io/#/
- https://github.com/datasets/awesome-data?tab=readme-ov-file
- https://datacatalogs.org
- https://dados.gov.br/home
- https://ckan.org/features
- https://github.com/GetDKAN/dkan
- https://queridodiario.ok.org.br
- https://magda.io
- https://dev.magda.io/search?page=2
Automations (via pixi + just) for installing baseline tools (Python + Java + Babashka + binaries eget, dust, duf)
- https://github.com/OpenLineage/OpenLineage?tab=readme-ov-file
- https://egeria-project.org/education/
- https://github.com/grai-io/grai-core
- https://www.grai.io
- https://github.com/theodi/data-publish-list
- https://learn.scds.ca/rdm-best-practices/topics/4-publishing.html
- https://www.springernature.com/gp/authors/research-data-policy/generalist-repositories/12327166
- https://ieee-dataport.org/
- https://github.com/ScilifelabDataCentre/open-science-checklists
- https://www.fairdata.fi/en/data-management-checklist/
- https://github.com/fairdataihub/FAIRshare
- https://www.go-fair.org/fair-principles/
- https://au-research.github.io/FAIR-data-101-training/resources/additional
- https://fair-edna.github.io/next.html
- https://faircookbook.elixir-europe.org/content/recipes/accessibility/aspera.html
- https://ena-docs.readthedocs.io/en/latest/retrieval/file-download.html
- https://programmerall.com/article/8629309388/
- https://ftp.ncbi.nlm.nih.gov/
[MIT License](https://opensource.org/licenses/MIT) | [Copier](https://github.com/copier-org/copier) | [pre-commit](https://github.com/pre-commit/pre-commit)
This is a template built with [Copier](https://github.com/copier-org/copier) that generates a data-science-focused Python project.
Get started with the following command:
```shell
copier copy gh:abhi18av/template-analysis-and-writeup path/to/destination
```
## Features
### Core ideas
- Data and Code
- Analysis and Writeup
- Clojure and Quarto
- Emacs and VSCode
- Users and Engineers
### Tools used in this template
- Task runner - `just`
- Data folders
  - data dictionaries
  - raw
  - processed
- Programming languages and libraries
  - R
  - Python
  - Clojure(Script)
  - babashka/nbb
  - Java(jshell)
  - Nushell
  - Bash
  - Wolfram
  - OCaml
- Notebooks
  - Quarto (R, Python, ObservableJS)
  - Mathematica
  - Matlab
- Dashboards
  - Quarto (R, Python, ObservableJS)
- Pipeline runner - `nextflow`
- Package and environment management
  - Pixi
  - Renv
  - Pip
  - Clojure-CLI
  - NPM
- Code and data version management
  - Git
  - Fossil
  - Data Version Control
- Data transfer and backup
  - Rclone
  - Restic
  - ArtiVC
- Writeup management (Manuscript, Report, Presentation)
  - Quarto
  - Typst
  - Org-mode
- Infrastructure management (MINIO)
  - Terraform
  - Dagger
  - Nomad cluster
  - MicroK8s
  - Juju
- Project-level bin folder, pbin
- Utilities for editor, env management config
  - .vscode
  - .editorconfig
  - .envrc
  - pre-commit hooks
- Project management
  - ORG files (meetings, experiments)
### Project structure
It is assumed that most of the work will be done in Jupyter notebooks. However, the template also includes a Python project in which you can put functions and classes shared across notebooks. The repository is set up to use [Pytest](https://docs.pytest.org/en/stable/) for unit testing this shared code.
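As a minimal sketch of that split between notebooks and shared code, the example below shows a small helper that several notebooks could import, together with a Pytest-style test. The function name `snake_case` and its placement in a shared module are illustrative assumptions, not something the template ships.

```python
"""Hypothetical shared module; the name and location are illustrative only."""
import re


def snake_case(name: str) -> str:
    """Convert a column label such as 'Sample ID' to 'sample_id'."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    # Drop leading/trailing underscores left over from punctuation.
    return cleaned.strip("_").lower()


def test_snake_case() -> None:
    """Pytest collects functions named test_*; plain asserts are enough."""
    assert snake_case("Sample ID") == "sample_id"
    assert snake_case("  GC-content (%) ") == "gc_content"
```

In practice the test function would usually live in a separate test file next to the module, so that notebooks import only the helper.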
The template also includes a `data` directory whose contents are ignored by Git. Use this folder to store data that you do not want to commit. You may also add a README file there to document the source datasets you use and how to acquire them.
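Because notebooks may run from different working directories, it can help to resolve the `data` directory relative to the project root instead of hard-coding relative paths. The sketch below is one way to do that with the standard library; the helper names `find_project_root` and `data_path` are hypothetical, and it assumes the `data` folder sits at the top level of the generated project.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "data") -> Path:
    """Walk upwards from `start` until a directory containing `marker` is found."""
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"no '{marker}' directory found above {start}")


def data_path(*parts: str, start: Optional[Path] = None) -> Path:
    """Build a path inside the project's data directory, e.g. data/raw/x.csv."""
    # Default to the current working directory, which is where a notebook runs.
    root = find_project_root((start or Path.cwd()).resolve())
    return root / "data" / Path(*parts)
```

A notebook in any subfolder could then call `data_path("raw", "counts.csv")` (a made-up file name) and get the same location regardless of where it was launched.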
### [just](https://github.com/casey/just)
`just` is a command runner that lets you easily run project-specific commands. In fact, you can use `just` to run all the setup commands listed below:
```shell
just setup
```
### [pre-commit](https://github.com/pre-commit/pre-commit)
pre-commit is a tool that runs checks on your files before you commit them with git, thereby helping ensure code quality. Enable it with the following command:
```shell
pre-commit install --install-hooks
```
The configuration is stored in `.pre-commit-config.yaml`.
### GitHub Actions
You may optionally add a GitHub workflow file that does the following:
- Uses Ruff to check that files are formatted and linted
- Runs unit tests and checks coverage
- Checks that any Markdown files are formatted with [markdownlint-cli2](https://github.com/DavidAnson/markdownlint-cli2)
- Checks that all Jupyter notebooks are clean
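To illustrate what a "clean notebook" check can look like, here is a minimal standard-library sketch. It assumes "clean" means that no code cell carries stored outputs or an execution count; the workflow generated by the template may use a different tool or definition, and the function names are hypothetical.

```python
import json
from pathlib import Path


def notebook_is_clean(path: Path) -> bool:
    """Return True if no code cell has stored outputs or an execution count."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # Markdown and raw cells never hold outputs.
        if cell.get("outputs") or cell.get("execution_count") is not None:
            return False
    return True


def dirty_notebooks(root: Path) -> list[Path]:
    """List notebooks under `root` that still carry outputs, skipping checkpoint copies."""
    return sorted(
        nb
        for nb in root.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in nb.parts and not notebook_is_clean(nb)
    )
```

A CI step could call `dirty_notebooks(Path("."))` and exit with a non-zero status when the returned list is not empty.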
### [Typos](https://github.com/crate-ci/typos)
Typos checks for common typos in code, aiming for a low false positive rate. The repository is configured not to use it for Jupyter notebook files, as it tends to find errors in cell outputs.
Test the template with [Copier](https://github.com/copier-org/copier) and [copier-template-tester](https://github.com/KyleKing/copier-template-tester).