We welcome any assistance from the community in maintaining and building this project. It's impossible for the maintainers to keep up with all opt-out methods, so we're open to new Rules suggested or added by the community. Of course, we're also open to assistance in tracking down issues, resolving bugs, and keeping the project simple.
Working together, we can make this project a standard in the ML community, and help to make sure all consent is respected throughout AI orgs around the world.
Welcome to the datadiligence contributor's guide.
This document focuses on getting any potential contributor familiarized with the development processes, but we're also open to other forms of contribution (docs, tests, issues, etc).
If you are new to using git or have never collaborated in a project previously, please have a look at contribution-guide.org. Other resources are also listed in the excellent guide created by FreeCodeCamp [1].
Please note that all users and contributors are expected to be open, considerate, reasonable, and respectful. When in doubt, the Python Software Foundation's Code of Conduct is a good reference for behavior guidelines.
If you experience bugs or general issues with datadiligence, please have a look
at the issue tracker. If you don't see anything useful there, please feel
free to file an issue report.
Tip
Please don't forget to include the closed issues in your search. Sometimes a solution was already reported, and the problem is considered solved.
New issue reports should include information about your programming environment (e.g., operating system, Python version) and steps to reproduce the problem. Please also try to reduce the reproduction steps to a minimal example that still illustrates the problem you are facing. By removing other factors, you help us identify the root cause of the issue.
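As a convenience, the environment details requested above can be collected with a few lines of Python. This is only a sketch using the standard library; the function names are illustrative and not part of datadiligence, and importlib.metadata requires Python 3.8 or newer.

```python
import platform
import sys
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a package, or a placeholder if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report() -> str:
    """Build a short, paste-ready summary of the local environment."""
    lines = [
        f"Operating system: {platform.platform()}",
        f"Python version: {platform.python_version()}",
        f"Python executable: {sys.executable}",
        f"datadiligence version: {package_version('datadiligence')}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Pasting the printed summary into the issue, together with the reproduction steps, usually covers the environment information we need.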
You can help improve datadiligence docs by making them more readable and coherent, or
by adding missing information and correcting mistakes.
datadiligence documentation uses Sphinx as its documentation compiler, with sources written in reStructuredText.
This means that the docs are kept in the same repository as the project code, and
that any documentation update is done in the same way as a code contribution.
Tip
Please note that the GitHub web interface provides a quick way to propose changes to
datadiligence's files. While this mechanism can be tricky for normal code contributions, it works perfectly fine for contributing to the docs, and can be quite handy.
If you are interested in trying this method out, please navigate to the docs folder in the source repository, find the file you would like to change, and click the little pencil icon at the top to open GitHub's code editor. Once you finish editing the file, please write a message in the form at the bottom of the page describing the changes you have made and the motivation behind them, then submit your proposal.
When working on documentation changes in your local machine, you can
compile them using tox:
tox -e docs
and use Python's built-in web server for a preview in your web browser
(http://localhost:8000):
python3 -m http.server --directory 'docs/_build/html'
In general, most contributions should take the form of either new Evaluators or new Rules. Other, structural changes can be made for optimization or ease of use, but should be discussed first. Additionally, any impact or inconvenience to the user should be avoided. We want to increase adoption, and any barriers to entry would harm that goal.
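To give a rough idea of the kind of logic a new Rule might encapsulate, the sketch below checks an HTTP X-Robots-Tag header for opt-out directives. It is self-contained and deliberately does not use datadiligence's actual classes: the function name, its signature, and the set of directives are all assumptions made for illustration. Please look at the existing Rules in the source tree for the real interface before writing your own.

```python
# Hypothetical sketch: this is NOT datadiligence's real Rule interface.
from typing import Mapping

# Assumed set of directives treated as an opt-out in this example.
DISALLOWED_DIRECTIVES = frozenset({"noai", "noimageai"})


def header_allows_use(headers: Mapping[str, str]) -> bool:
    """Return False if the X-Robots-Tag header carries an opt-out directive."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    raw_value = normalised.get("x-robots-tag", "")
    # The header value is a comma-separated list of directives.
    directives = {part.strip().lower() for part in raw_value.split(",")}
    return directives.isdisjoint(DISALLOWED_DIRECTIVES)


if __name__ == "__main__":
    print(header_allows_use({"X-Robots-Tag": "noindex, noai"}))
    print(header_allows_use({"Content-Type": "text/html"}))
```

A missing header is treated as "allowed" here. Whether that default is appropriate for a given opt-out method is exactly the kind of question worth raising in the issue tracker first.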
Before you work on any non-trivial code contribution it's best to first create a report in the issue tracker to start a discussion on the subject. This often provides additional considerations and avoids unnecessary work.
Before you start coding, we recommend creating an isolated virtual
environment to avoid any problems with your installed Python packages.
This can easily be done via virtualenv:
virtualenv <PATH TO VENV>
source <PATH TO VENV>/bin/activate
Create a user account on GitHub if you do not already have one.
Fork the project repository: click on the Fork button near the top of the page. This creates a copy of the code under your account on GitHub.
Clone this copy to your local disk:
git clone git@github.com:YourLogin/datadiligence.git
cd datadiligence
You should run:
pip install -U pip setuptools -e .
to be able to import the package under development in the Python REPL.
Install tox using pip (pip install tox) or pipx.
Install the development dependencies with:
tox -e dev
This will install all the dependencies needed to run the tests and build the documentation.
Create a branch to hold your changes:
git checkout -b my-feature
and start making changes. Never work on the main branch!
Start your work on this branch. Don't forget to add docstrings to new functions, modules and classes, especially if they are part of public APIs.
Add yourself to the list of contributors in AUTHORS.rst.
When you're done editing, do:
git add <MODIFIED FILES>
git commit
to record your changes in git.
Important
Don't forget to add unit tests and documentation in case your contribution adds an additional feature and is not just a bugfix.
Moreover, writing a descriptive commit message is highly recommended. In case of doubt, you can check the commit history with:
git log --graph --decorate --pretty=oneline --abbrev-commit --all
to look for recurring communication patterns.
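As an illustration of the unit tests requested above, here is a minimal pytest-style sketch. The helper parse_directives is hypothetical and only stands in for whatever feature your contribution adds; in a real contribution the tests would live in the repository's test folder and import the code under test instead of defining it inline.

```python
# Hypothetical helper standing in for the feature under test.
def parse_directives(header_value: str) -> set:
    """Split a comma-separated header value into lowercase directives."""
    return {
        part.strip().lower()
        for part in header_value.split(",")
        if part.strip()  # skip empty segments such as "a,,b"
    }


# pytest collects functions whose names start with "test_".
def test_parse_directives_normalises_case_and_whitespace():
    assert parse_directives(" NoAI , noindex") == {"noai", "noindex"}


def test_parse_directives_handles_empty_value():
    assert parse_directives("") == set()
```

Small, focused tests like these, one behavior per test, make failures easier to interpret during review.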
Please check that your changes don't break any unit tests with:
tox
(after having installed tox using pip install tox or pipx).
You can also use tox to run several other pre-configured tasks in the repository. Try tox -av to see a list of the available checks.
If everything works fine, push your local branch to GitHub with:
git push -u origin my-feature
Go to the web page of your fork and click "Create pull request" to send your changes for review.
Find more detailed information in the GitHub documentation on creating a PR. You might also want to open the PR as a draft first and mark it as ready for review once you have addressed the feedback from the continuous integration (CI) system and made any required fixes.
The following tips can be used when you have problems building or testing the package:
Make sure to fetch all the tags from the upstream repository. The command git describe --abbrev=0 --tags should return the version you are expecting. If you are trying to run CI scripts in a fork repository, make sure to push all the tags. You can also try to remove all the egg files or the complete egg folder, i.e., .eggs, as well as the *.egg-info folders in the src folder or potentially in the root of your project.
Sometimes tox misses out when new dependencies are added, especially to setup.cfg and docs/requirements.txt. If you find any problems with missing dependencies when running a command with tox, try to recreate the tox environment using the -r flag. For example, instead of:
tox -e docs
Try running:
tox -r -e docs
Make sure to have a reliable tox installation that uses the correct Python version (e.g., 3.7+). When in doubt you can run:
tox --version
# OR
which tox
If you have trouble and are seeing weird errors upon running tox, you can also try to create a dedicated virtual environment with a tox binary freshly installed. For example:
virtualenv .venv
source .venv/bin/activate
.venv/bin/pip install tox
.venv/bin/tox -e all
Pytest can drop you into an interactive session when an error occurs. To do that, you need to pass the --pdb option (for example by running tox -- -k <NAME OF THE FAILING TEST> --pdb). You can also set up breakpoints manually instead of using the --pdb option.
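A manual breakpoint, as mentioned above, can be set with Python's built-in breakpoint() function (available since Python 3.7). The function below is a made-up example, and the debug flag is only there so that the pause is opt-in; in practice you would typically add a bare breakpoint() call temporarily on the line you want to inspect.

```python
def average(values, debug=False):
    """Return the arithmetic mean of a non-empty sequence."""
    total = sum(values)
    if debug:
        # Execution pauses here and the debugger (pdb by default) opens,
        # so local names such as total and values can be inspected.
        breakpoint()
    return total / len(values)


if __name__ == "__main__":
    # debug is left off here so the script runs without pausing.
    print(average([1, 2, 3]))
```

Setting the environment variable PYTHONBREAKPOINT=0 disables all breakpoint() calls, which is handy if one is left in by accident. Please remember to remove manual breakpoints before committing.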
If you are part of the group of maintainers and have correct user permissions
on PyPI, the following steps can be used to release a new version for
datadiligence:
- Make sure all unit tests are successful.
- Tag the current commit on the main branch with a release tag, e.g., git tag 0.1.7.
- Push the new tag to the upstream repository, e.g., git push origin 0.1.7
- Clean up the dist and build folders with tox -e clean (or rm -rf dist build) to avoid confusion with old builds and Sphinx docs.
- Run tox -e build and check that the files in dist have the correct version (no .dirty or git hash) according to the git tag. Also check the sizes of the distributions; if they are too big (e.g., > 500KB), unwanted clutter may have been accidentally included.
- Run tox -e publish -- --repository pypi and check that everything was uploaded to PyPI correctly. This will prompt you for PyPI authentication. We only use PyPI API keys, so the user will be __token__ and the password is the value of the API key. Optionally, you can set the TWINE_PASSWORD environment variable to the value of the API key.
[1] Even though these resources focus on open source projects and communities, the ideas behind collaborating with other developers to collectively create software are general and can be applied to all sorts of environments, including private companies and proprietary code bases.