Get register of interest via API #194

Merged
merged 11 commits into from
Dec 9, 2024

ajparsons
Contributor

Parliament's turned off their old site, so our scraper is broken!

This scopes out the old parser, adds a new parser that converts the Parliament API into the XML TheyWorkForYou expects.
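
As a rough illustration of the API-to-XML step (a minimal sketch, not the parser in this PR): the payload fields and the XML element and attribute names below are assumptions made up for the example, not the real Parliament API schema or the exact format TheyWorkForYou expects.

```python
import xml.etree.ElementTree as ET

# Hypothetical API-style payload; every field name here is an assumption.
SAMPLE_INTEREST = {
    "member_id": "example-person-id",
    "member_name": "Example Member",
    "category": "Employment and earnings",
    "summary": "Payment for an article",
}


def interest_to_xml(interest: dict, register_date: str) -> ET.Element:
    """Build a small XML fragment for one registered interest.

    The element and attribute names are illustrative only.
    """
    regmem = ET.Element(
        "regmem",
        {
            "personid": interest["member_id"],
            "membername": interest["member_name"],
            "date": register_date,
        },
    )
    category = ET.SubElement(regmem, "category", {"name": interest["category"]})
    item = ET.SubElement(category, "item")
    item.text = interest["summary"]
    return regmem


# Serialise to a string, as a downstream importer would read it from a file.
xml_text = ET.tostring(
    interest_to_xml(SAMPLE_INTEREST, "2024-11-25"), encoding="unicode"
)
print(xml_text)
```

The point is only that the JSON is mapped field by field onto the element tree the importer already understands, so nothing downstream needs to know the source changed.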

As seen in dailyupdate, this can be run via:

poetry run python -m pyscraper.regmem download-all-registers --chamber commons

This updates some requirements in the Poetry config, and also uses that config in production for the first time.

Rather than exactly replicating the previous structure, there's now a bit more use of lists to reflect how Parliament sees items and subitems.

I've tested this both in the import to the database:

image

And the comparison over time:

image

There's a boundary problem, in that the sudden change will break the retrospect. If we delete and recreate the last few items, though, this should be less of a problem.

If reviewing this will take a while, there's also the option to just upload the generated files manually (given they seem fine).

@ajparsons ajparsons requested a review from dracos November 29, 2024 16:05
@ajparsons
Contributor Author

twfy host now has poetry - so this should work even in advance of the twfy poetry pr.

Member

@dracos dracos left a comment

One issue with missing jinja2. To be honest, I didn't follow all the code (perhaps some doctests of e.g. move_subitems_under_parent to show an example would be nice, not sure I understood the "parents" stuff), but if you're happy with the output, fine with me :)

pyscraper/regmem/__main__.py (resolved)
pyscraper/regmem/commons/api_models.py (resolved)
pyscraper/regmem/commons/process.py (outdated, resolved)
@ajparsons
Contributor Author

Added above - added some more commentary on the subitem movements because that wasn't clear.

Added one extra fix, needed for a March register, to manually fetch a parent that wasn't present.
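
To make the parent/subitem idea concrete, here is a minimal sketch of that kind of nesting, including the missing-parent case. It is not the `move_subitems_under_parent` code in this PR: the `nest_subitems` name, the `id` / `parent_id` / `children` keys and the `fetch_parent` callback are all hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional


def nest_subitems(
    items: list[dict],
    fetch_parent: Optional[Callable[[int], dict]] = None,
) -> list[dict]:
    """Return top-level items, each with its subitems in a 'children' list.

    Assumed (hypothetical) item shape: {"id": int, "parent_id": int or None}.
    If a subitem points at a parent that is not in `items`, `fetch_parent`
    is called to retrieve it; without a callback a KeyError is raised.
    """
    # Copy each item so the caller's dicts are not mutated.
    by_id = {item["id"]: {**item, "children": []} for item in items}
    top_level = []
    for item in list(by_id.values()):
        parent_id = item.get("parent_id")
        if parent_id is None:
            top_level.append(item)
            continue
        parent = by_id.get(parent_id)
        if parent is None:
            if fetch_parent is None:
                raise KeyError(f"parent {parent_id} not present")
            # Parent missing from this register: fetch it and treat it as top-level.
            parent = {**fetch_parent(parent_id), "children": []}
            by_id[parent_id] = parent
            top_level.append(parent)
        parent["children"].append(item)
    return top_level


items = [
    {"id": 1, "parent_id": None, "summary": "Role as adviser"},
    {"id": 2, "parent_id": 1, "summary": "Payment received"},
    {"id": 3, "parent_id": 9, "summary": "Payment whose parent is missing"},
]
nested = nest_subitems(
    items,
    fetch_parent=lambda pid: {"id": pid, "parent_id": None, "summary": "Fetched separately"},
)
```

Here item 2 ends up under item 1, and item 3 ends up under a separately fetched item 9, which is appended as a new top-level entry.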

@ajparsons ajparsons requested a review from dracos December 9, 2024 14:14
Member

@dracos dracos left a comment

Thanks!

@ajparsons ajparsons merged commit c89e23c into master Dec 9, 2024
2 checks passed