Update or supplement rebuild-bigquery command for incremental updates #618
Description
As per a review comment, the rebuild-bigquery command should have the ability to incrementally update the BigQuery table with fields that only exist in the Postgres table, without having to copy the entire table over every time.
One potential fix is to pull down the list of primary keys from the BigQuery table as a filtering mechanism.
We're looking at 1,727,065 rows * 16 bytes (per MD5 hash), or about 28 MB, to do the initial filtering. It would make this function a bit more complicated. I imagine the proper way to do this is something like:
`WITH bigquery_hashes AS (...) SELECT build.* FROM build LEFT JOIN bigquery_hashes USING (build_hash) WHERE bigquery_hashes.build_hash IS NULL`
and then iterating over the results of this query to insert into the table. I'd prefer to leave this as future work (i.e. out of scope of this PR), but it would be a good thing to have and to schedule periodically.
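
For illustration, the same filtering idea can be sketched client-side in Python. This is only a rough sketch under assumptions, not the command's actual implementation: the function names `rows_missing_from_bigquery` and `estimated_key_bytes`, the `batch_size` parameter, and the representation of rows as dicts are all hypothetical. It assumes the BigQuery primary keys have already been downloaded into a set and that each Postgres row exposes a `build_hash` key.

```python
from typing import Dict, Iterable, Iterator, List, Set

Row = Dict[str, object]


def estimated_key_bytes(row_count: int, bytes_per_hash: int = 16) -> int:
    """Rough size of the key list to download (16 bytes per raw MD5 digest)."""
    return row_count * bytes_per_hash


def rows_missing_from_bigquery(
    postgres_rows: Iterable[Row],
    bigquery_hashes: Set[str],
    key: str = "build_hash",
    batch_size: int = 500,  # hypothetical default, tune for the insert API in use
) -> Iterator[List[Row]]:
    """Yield batches of Postgres rows whose key is not yet in BigQuery.

    This is the client-side equivalent of an anti-join: keep only the rows
    with no matching hash on the BigQuery side.
    """
    batch: List[Row] = []
    for row in postgres_rows:
        if row[key] in bigquery_hashes:
            continue  # already present in BigQuery, nothing to copy
        batch.append(row)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch
```

Each yielded batch could then be handed to whatever insert routine the command already uses. Doing the anti-join in Postgres, as in the query above, avoids streaming every row to the client, so this sketch mainly shows the filtering logic and the memory cost of holding the keys.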