
DataStore PG Dump #191

Open · wants to merge 7 commits into canada-v2.10

Conversation

JVickery-TBS

Adds a .sql format option to the datastore dump endpoint, which streams back the contents of a customized pg_dump of the table. Is this useful at all? I have no idea, but the idea was to give users who wanted to use datastore_search_sql an option to get the SQL table so they can query it locally.
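
As a rough illustration only (this is not the code in this PR), streaming a per-table pg_dump back through a Flask response can look roughly like the sketch below; the `resource_id` handling, `db_url`, and pg_dump flags are assumptions.

```python
# Rough sketch only (not the code in this PR): stream a per-table pg_dump
# back through a Flask response. The resource_id handling, db_url, and
# pg_dump flags here are assumptions for illustration.
import subprocess

from flask import Response


def dump_sql(resource_id, db_url):
    cmd = [
        'pg_dump',
        # Double quotes keep the hyphenated UUID table name intact
        # in pg_dump's table pattern.
        '--table', f'"{resource_id}"',
        '--no-owner',
        '--no-privileges',
        db_url,
    ]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

    def generate():
        try:
            # Yield the dump in fixed-size chunks instead of buffering
            # the whole file in memory.
            for chunk in iter(lambda: proc.stdout.read(16384), b''):
                yield chunk
        finally:
            proc.stdout.close()
            proc.wait()

    return Response(
        generate(),
        mimetype='application/sql',
        headers={
            'Content-Disposition':
                f'attachment; filename="{resource_id}.sql"',
        },
    )
```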

- pg_dump sql format for datastore dump.
# Conflicts:
#	ckanext/datastore/blueprint.py
### RESOLVED.
- Datastore pg_dump endpoint.
- Add change log file.
@wardi
Member

wardi commented Feb 14, 2025

This is an interesting idea; if someone has a local Postgres, it would make it easier for them to work with the data, with the correct data types and everything.

Just curious, do the column comments (the data dictionary) also get exported by dumping this way?

- Use subprocess.run for timeouts.
- Added max execution for sql dump config.
- Made sql dump pluggable.
- Used max buffer sizes for subprocess and byte chunks.
@JVickery-TBS
Author

@wardi okay, I have done all of the above: changed to subprocess.run to be able to handle timeouts and properly kill the subprocess, and made the SQL dump pluggable.
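
For illustration only (not the actual patch), the subprocess.run timeout handling can take roughly the shape below; the config key name and error type are assumptions.

```python
# Rough sketch only (not the actual patch): subprocess.run with a timeout
# both enforces a maximum execution time and cleans up the child process.
# The config key name and error type are assumptions for illustration.
import subprocess

from ckan.plugins import toolkit


def run_pg_dump(cmd):
    timeout = toolkit.asint(
        toolkit.config.get('ckanext.datastore.sql_dump_timeout', 300))
    try:
        # On timeout, subprocess.run kills and reaps the child itself,
        # so no orphaned pg_dump is left holding a DB connection.
        completed = subprocess.run(
            cmd, capture_output=True, check=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        raise toolkit.ValidationError(
            {'sql_dump': ['pg_dump exceeded the maximum execution time']})
    return completed.stdout
```

In the streamed case the same idea applies with Popen plus an explicit kill on timeout; this just shows the run-based shape.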

I am currently having some issues in our setup with the full-text search stuff on giant DS tables, so I am just fixing that up and will then see whether the column comments get exported or not.

@JVickery-TBS
Author

@wardi yup! The comments from the Data Dictionary do in fact get exported:

```sql
COMMENT ON COLUMN "public"."0690bcb7-42a6-40b4-9ab2-bf4ca4f4ebb3"."ref_number" IS '{"_info":{"label_en":"Reference Number","label_fr":"Reference Number fr","notes_en":"Reference Number des","notes_fr":"Reference Number des fr","type_override":""}}';
COMMENT ON COLUMN "public"."0690bcb7-42a6-40b4-9ab2-bf4ca4f4ebb3"."amendment_number" IS '{"_info":{"label_en":"Amendment Number","label_fr":"Amendment Number fr","notes_en":"Amendment Number des","notes_fr":"Amendment Number des fr","type_override":""}}';
COMMENT ON COLUMN "public"."0690bcb7-42a6-40b4-9ab2-bf4ca4f4ebb3"."amendment_date" IS '{"_info":{"label_en":"","label_fr":"","notes_en":"","notes_fr":"","type_override":""}}';
COMMENT ON COLUMN "public"."0690bcb7-42a6-40b4-9ab2-bf4ca4f4ebb3"."agreement_type" IS '{"_info":{"label_en":"","label_fr":"","notes_en":"","notes_fr":"","type_override":""}}';
COMMENT ON COLUMN "public"."0690bcb7-42a6-40b4-9ab2-bf4ca4f4ebb3"."recipient_type" IS '{"_info":{"label_en":"","label_fr":"","notes_en":"","notes_fr":"","type_override":""}}';
```
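
As a rough illustration (not part of this PR), once a dump like this is restored into a local Postgres, the Data Dictionary can be read back out of the column comments; the DSN and resource id below are placeholders.

```python
# Illustration only (not part of this PR): after restoring the dump into a
# local Postgres, read the Data Dictionary back from the column comments.
# The DSN and resource id are placeholders.
import json

import psycopg2

RESOURCE_ID = '0690bcb7-42a6-40b4-9ab2-bf4ca4f4ebb3'

conn = psycopg2.connect('dbname=local_copy')
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT a.attname, col_description(a.attrelid, a.attnum)
        FROM pg_attribute a
        WHERE a.attrelid = %s::regclass
          AND a.attnum > 0
          AND NOT a.attisdropped
        ORDER BY a.attnum
        """,
        (f'"{RESOURCE_ID}"',))
    for column, comment in cur.fetchall():
        # Each comment stores the Data Dictionary entry as JSON under "_info".
        info = json.loads(comment)['_info'] if comment else {}
        print(column, info.get('label_en', ''))
conn.close()
```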

JVickery-TBS requested a review from wardi on February 24, 2025 at 16:37
@wardi
Member

wardi commented Mar 15, 2025

May not be accepted upstream. There are no psql CLI tools in the default CKAN docker container, and there might be problems with opening too many connections to the DB if we're using external tools to make new connections as a result of a web request.

@JVickery-TBS
Author

@wardi yeah, I figured I would not put this upstream. Will show this to the Data and Biz team this Friday and see what they think.
