Important: This is an experiment. It is not ready for production.
crit is a tool for auditing GOV.UK Tech Docs style technical documentation websites.
It helps technical writers and content designers identify internal contradictions in the documentation.
It might also be useful for cleaning up corpora used for retrieval-augmented generation (RAG).
crit will:
- scrape websites built using the tech docs template
- chunk the documents
- use an LLM to compare each chunk with every other chunk, producing a list of contradictory statements
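The chunk-and-compare steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not crit's actual code: `chunk_text`, `find_contradictions`, and `toy_judge` are hypothetical names, the paragraph-based chunking is a guess at one reasonable strategy, and `toy_judge` is a regex stand-in for the LLM call.

```python
import re
from itertools import combinations


def chunk_text(text, max_chars=200):
    """Group paragraphs into chunks of roughly max_chars (hypothetical strategy)."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def find_contradictions(chunks, judge):
    """Compare every unordered pair of chunks once and keep the flagged pairs."""
    return [(a, b) for a, b in combinations(chunks, 2) if judge(a, b)]


def toy_judge(a, b):
    """Stand-in for the LLM: flags chunks that state different timeout values."""
    nums_a = set(re.findall(r"timeout is (\d+)", a))
    nums_b = set(re.findall(r"timeout is (\d+)", b))
    return bool(nums_a and nums_b and nums_a != nums_b)


if __name__ == "__main__":
    docs = [
        "The timeout is 30 seconds.",
        "Deploys run nightly.",
        "The timeout is 60 seconds.",
    ]
    for a, b in find_contradictions(docs, toy_judge):
        print(f"Possible contradiction:\n  {a}\n  {b}")
```

Comparing each chunk with every other chunk means n chunks need n(n-1)/2 comparisons, so the number of LLM calls grows quadratically with the size of the site.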
poetry install
Populate .env with values suitable for configuring an ordinary AzureOpenAI connection in langchain.
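As a quick sanity check before running crit, a small script along these lines can report which settings are missing. The variable names below are an assumption: they are the ones langchain's Azure OpenAI classes commonly read from the environment, and crit itself may expect different or additional names. `missing_env_vars` is a hypothetical helper, and the check only sees values already loaded into the process environment.

```python
import os

# Assumed names, not confirmed by crit: environment variables commonly
# read by langchain's Azure OpenAI integration.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
)


def missing_env_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All assumed Azure OpenAI settings are present.")
```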
poetry run python -m crit.extract https://docs.whatever.service.gov.uk/ # or another tech docs site
This will output the timestamped filename of a JSON file containing the contents of the website, e.g. docs.whatever.service.gov.uk/content-12345.json
poetry run python -m crit.compare docs.whatever.service.gov.uk/content-12345.json
This will output the name of an HTML file.
To view the file, run a web server in public/:
cd public
python -m http.server
Then visit http://localhost:8000, where your output reports will be listed.
