This is a template built with Copier to generate a data-science-focused Python project.
Get started with the following command:
copier copy gh:felixgwilliams/python-copier-template-ds path/to/destination
It is assumed that most of the work will be done in Jupyter Notebooks. However, the template also includes a Python project, in which you can put functions and classes shared across notebooks. The repository is set up to use pytest for unit testing this module code.
The template also includes a data directory whose contents will be ignored by git.
You can use this folder to store data that you do not commit.
You may also put a readme file in which you can document the source datasets you use and how to acquire them.
just is a command runner that lets you easily run project-specific commands.
In fact, you can use just to run all the setup commands listed below:
just setup
The repository is set up to use uv for package and project management. You can set up your Python environment with:
uv sync --all-groups --all-extras
The repository is configured to use Ruff for linting and formatting. By default, all lints are enabled except:
- COM: Enforces trailing commas
- ERA: Disallows commented-out code
- ISC001: Disallows implicit string concatenation on a single line (conflicts with the formatter)
- PTH: Requires use of pathlib
- TRY003: Disallows long exception messages outside the exception class
- RUF002: Disallows ambiguous characters in docstrings
- RUF003: Disallows ambiguous characters in comments
- PLC0415: Disallows imports outside the top level
- PD: Opinionated linting for pandas code
Out of the PD rules, the following are re-enabled:
- PD002: Disallows inplace=True
- PD007: Disallows the deprecated .ix indexer
- PD101: Disallows Series.nunique() for checking that a series is constant
In addition, the following rules are only enforced for module code as they are inappropriate or too strict for unit tests and notebooks:
- D: Requires docstrings on functions, classes and modules
- ANN: Requires type annotations on functions and methods
- S101: Disallows use of assert
- PLR2004: Disallows "magic" values in comparisons
- T20: Disallows print statements
- EM: Linting for error messages
- PLR0913: Disallows too many arguments
- FBT003: Disallows boolean positional values in calls
- INP001: Disallows implicit namespace packages
The target line length is 120 and the docstring convention is google.
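As an illustration of what the module-code rules listed above ask for, here is a minimal sketch of a function written in that style: a Google-style docstring (D), type annotations (ANN), a named constant instead of a "magic" value (PLR2004), and an error message bound to a variable before raising (EM). The function name `sample_variance` and the constant `MIN_SAMPLE_SIZE` are hypothetical and are not part of the template.

```python
"""Example helpers shared across notebooks (hypothetical module)."""

from collections.abc import Sequence

# Named constant rather than a "magic" value in a comparison (PLR2004).
MIN_SAMPLE_SIZE = 2


def sample_variance(values: Sequence[float]) -> float:
    """Compute the unbiased sample variance.

    Args:
        values: Observations to summarise.

    Returns:
        The sample variance, using an n - 1 denominator.

    Raises:
        ValueError: If fewer than two observations are given.
    """
    if len(values) < MIN_SAMPLE_SIZE:
        # Message bound to a variable first, as the EM rules prefer.
        msg = f"need at least {MIN_SAMPLE_SIZE} values, got {len(values)}"
        raise ValueError(msg)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)
```

In a notebook or a unit test, the same logic could be written more loosely, since these rules are relaxed there.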
pre-commit is a tool that runs checks on your files before you commit them with git, thereby helping ensure code quality. Enable it with the following command:
pre-commit install --install-hooks
The configuration is stored in .pre-commit-config.yaml.
nbwipers is a tool written in Rust to ensure Jupyter notebooks are clean.
Committing notebooks that are not clean makes diffs more confusing, can degrade performance and increases the risk of leaking sensitive information.
You can set it up as a git filter with the following command:
nbwipers install local
The repository comes configured to use pytest for unit testing the module code.
Feel free to ignore it if you do not write module code.
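For readers who do write module code, a unit test might look roughly like the sketch below. The helper `normalise_column_name` and the test names are hypothetical; in a generated project the helper would live in the package and the tests in a separate test file, but they are shown together here so the example is self-contained.

```python
import re


def normalise_column_name(name: str) -> str:
    """Convert a column label to snake_case (hypothetical helper)."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


# pytest collects functions whose names start with "test_".
def test_normalise_column_name_replaces_punctuation() -> None:
    assert normalise_column_name("Total Sales (USD)") == "total_sales_usd"


def test_normalise_column_name_is_idempotent() -> None:
    once = normalise_column_name("  Customer-ID ")
    assert normalise_column_name(once) == once
```

Saved in a file whose name starts with test_, functions like these would be picked up by pytest's default discovery. The plain assert statements are fine here because S101 is only enforced for module code.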
You may optionally add a GitHub workflow file that does the following:
- Uses Ruff to check that files are formatted and linted
- Runs unit tests and checks coverage
- Checks that any Markdown files are formatted with markdownlint-cli2
- Checks that all Jupyter notebooks are clean
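To make the last check more concrete, here is a rough sketch of one possible definition of a "clean" notebook: no code cell has stored outputs or an execution count. This definition is an assumption for illustration only. It uses just the standard library, it is not how nbwipers is implemented, and nbwipers may apply further criteria (for example on metadata). The function name `is_clean` is hypothetical.

```python
import json


def is_clean(notebook_json: str) -> bool:
    """Return True if no code cell has outputs or an execution count."""
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if cell.get("outputs") or cell.get("execution_count") is not None:
            return False
    return True


# A tiny notebook document with one executed code cell.
dirty = json.dumps(
    {
        "cells": [
            {
                "cell_type": "code",
                "source": ["print('hello')"],
                "execution_count": 1,
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
                ],
                "metadata": {},
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
)

# The same notebook with the run artefacts removed.
cleaned = json.loads(dirty)
for cell in cleaned["cells"]:
    cell["outputs"] = []
    cell["execution_count"] = None

print(is_clean(dirty))                # False
print(is_clean(json.dumps(cleaned)))  # True
```

Stored outputs are where stray data values and large embedded images tend to end up, which is why removing them keeps diffs small and lowers the risk of leaking sensitive information.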
Typos checks for common typos in code, aiming for a low false positive rate. The repository is configured not to use it for Jupyter notebook files, as it tends to find errors in cell outputs.
Test with Copier and copier-template-tester.