This PR adds new scrapers and a common data format for registers of interests across the devolved Parliaments (plus partial coverage of London).
This doesn't need reviewing until there's a companion PR in twfy to test the data structure import.
The goal is to import registers of interests as JSON files with a richer structure, so that formatting moves into the TWFY template.
Here we add a GenericRegmem pydantic data structure that all the different scrapers work with and that can be dumped to JSON. This JSON will then be stored in the twfy database (the current approach stores raw HTML).
All scrapers create this JSON and the equivalent XML, as the XML will continue to be how the comparison over time is produced. There is a one-off conversion script from old XML to new-style JSON, so that old MP data is displayed consistently.
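As a rough illustration of the "one structure, two outputs" idea, here is a minimal sketch. It is not the GenericRegmem model from this PR: it uses stdlib dataclasses as a stand-in for pydantic so it runs without dependencies, and every class name, field name, XML element name and ID below is a hypothetical placeholder.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, field
from typing import List, Optional


# Hypothetical, simplified stand-ins for the real pydantic models.
@dataclass
class Entry:
    description: str
    date_registered: Optional[str] = None


@dataclass
class Category:
    name: str
    entries: List[Entry] = field(default_factory=list)


@dataclass
class PersonRegister:
    person_id: str  # placeholder ID, not a real TWFY identifier
    name: str
    categories: List[Category] = field(default_factory=list)


def to_json(register: PersonRegister) -> str:
    """Dump the nested structure to JSON, leaving formatting to the template."""
    return json.dumps(asdict(register), indent=2)


def to_xml(register: PersonRegister) -> str:
    """Build an XML equivalent of the same data (element names are assumed)."""
    root = ET.Element("regmem", personid=register.person_id, membername=register.name)
    for category in register.categories:
        cat_el = ET.SubElement(root, "category", name=category.name)
        for entry in category.entries:
            item = ET.SubElement(cat_el, "item")
            item.text = entry.description
    return ET.tostring(root, encoding="unicode")


example = PersonRegister(
    person_id="example-person-1",
    name="Example Member",
    categories=[
        Category(name="Gifts", entries=[Entry(description="Ticket to an event")])
    ],
)
json_output = to_json(example)
xml_output = to_xml(example)
```

With pydantic the JSON dump would come from the model itself rather than from `asdict`, but the shape of the idea is the same: scrapers fill in one shared structure, and both serialisations are derived from it.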
The readme.MD has more information on the different scrapers (most of which use APIs). London is currently incomplete because we can't guarantee the equivalent TWFY IDs. It is included at this point to test the flexibility of the format.