db-dump: set sequence values when importing a database dump #10204

LawnGnome · 2024-12-13T23:02:38Z

By default, the import script recreates the database schema, which includes creating new sequences with zero values. As a result, the lazy crates.io developer occasionally gets obscure errors when inserting records into tables that use sequences. These often don't appear on the first or second insert, because the IDs in the database dump are not always contiguous.
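To make the "not on the first or second insert" behaviour concrete, here is a small Python simulation. It is only a sketch, not the real import script: the `FreshSequence` class and the sample IDs are made up for illustration, and it assumes a freshly created sequence starts handing out values from 1 (PostgreSQL's default for a new ascending sequence).

```python
class FreshSequence:
    """Toy stand-in for a newly created database sequence (hypothetical)."""

    def __init__(self, start=1):
        self._next = start

    def nextval(self):
        value = self._next
        self._next += 1
        return value


def first_conflict(existing_ids, sequence, attempts=10):
    """Return (attempt_number, id) for the first generated ID that collides
    with an already imported ID, or None if no collision occurs."""
    taken = set(existing_ids)
    for attempt in range(1, attempts + 1):
        new_id = sequence.nextval()
        if new_id in taken:
            return attempt, new_id
        taken.add(new_id)
    return None


# Hypothetical IDs left behind by an import; 1, 2 and 4 are unused gaps.
imported_ids = [3, 5, 6]
conflict = first_conflict(imported_ids, FreshSequence())
print(conflict)  # (3, 3): the first two inserts succeed, the third collides
```

The first two inserts land in the gaps, so the failure only surfaces on the third one, which is what makes the error confusing in practice.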

Rather than dumping the real sequence values from the database, we can just recreate them based on the maximum ID in each table. Works well enough, and means we don't have to tinker with the export script or ship extra data.
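As a rough illustration of recreating a sequence from the maximum ID, the Python helper below renders one possible SQL statement. This is a sketch under assumptions: the `setval_statement` helper is hypothetical, and the exact SQL generated by the import template in this PR may differ. It relies on PostgreSQL's two-argument `setval`, after which the next `nextval` call returns the given value plus one.

```python
def setval_statement(table, column, sequence):
    """Build SQL that moves `sequence` to the largest `column` value in `table`.

    COALESCE covers an empty table, where MAX() returns NULL.
    Identifiers are interpolated directly, so only pass trusted names.
    """
    return (
        f"SELECT setval('{sequence}', "
        f"COALESCE((SELECT MAX({column}) FROM {table}), 1));"
    )


statement = setval_statement("categories", "id", "categories_id_seq")
print(statement)
# SELECT setval('categories_id_seq', COALESCE((SELECT MAX(id) FROM categories), 1));
```

With this form an empty table falls back to 1, so the next generated ID would be 2. That wastes one ID but keeps the statement simple.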

This commit only configures the database tables that actually include data in the database dump. There are other sequences, but since those tables won't have data imported, it doesn't matter if they remain zero after import.

Turbo87 · 2024-12-14T08:05:04Z

crates/crates_io_database_dump/src/dump-db.toml

@@ -48,6 +48,9 @@ description = "public"
 crates_cnt = "public"
 created_at = "public"
 path = "public"
+[categories.sequence]
+column = "id"
+name = "categories_id_seq"


Instead of manually declaring them here, would it be possible to derive them from the database schema in some way?

eth3lbert · 2024-12-16

This is doable with the following query, adapted from https://stackoverflow.com/a/55414721:

select
    tbl.relname as table_name,
    col.attname as column_name,
    s.relname as sequence_name
from pg_class s
join pg_namespace sn on sn.oid = s.relnamespace
join pg_depend d on d.refobjid = s.oid and d.refclassid = 'pg_class'::regclass
join pg_attrdef ad on ad.oid = d.objid and d.classid = 'pg_attrdef'::regclass
join pg_attribute col on col.attrelid = ad.adrelid and col.attnum = ad.adnum
join pg_class tbl on tbl.oid = ad.adrelid
join pg_namespace ts on ts.oid = tbl.relnamespace
where s.relkind = 'S'
    -- and s.relname = 'your_sequence_name_here'
    and d.deptype in ('a', 'n');

From the result, you should see something similar to the following:

      table_name       | column_name |        sequence_name
-----------------------+-------------+------------------------------
 api_tokens            | id          | api_tokens_id_seq
 background_jobs       | id          | background_jobs_id_seq
 categories            | id          | categories_id_seq
 crates                | id          | packages_id_seq
 deleted_crates        | id          | deleted_crates_id_seq
 dependencies          | id          | dependencies_id_seq
 emails                | id          | emails_id_seq
 keywords              | id          | keywords_id_seq
 teams                 | id          | teams_id_seq
 users                 | id          | users_id_seq
 version_owner_actions | id          | version_owner_actions_id_seq
 versions              | id          | versions_id_seq
(12 rows)
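One way to use a result like this, sketched in Python, is to generate the `[<table>.sequence]` blocks rather than typing them by hand. The `sequence_sections` helper is hypothetical and the two sample rows are copied from the output above. Note that `crates` maps to `packages_id_seq`, so the sequence name cannot be guessed reliably from the table name alone.

```python
# Sample (table, column, sequence) rows taken from the query output.
rows = [
    ("categories", "id", "categories_id_seq"),
    ("crates", "id", "packages_id_seq"),
]


def sequence_sections(rows):
    """Render one `[<table>.sequence]` TOML table per input row."""
    sections = []
    for table, column, sequence in rows:
        sections.append(
            f'[{table}.sequence]\ncolumn = "{column}"\nname = "{sequence}"\n'
        )
    # Separate the generated tables with a blank line.
    return "\n".join(sections)


toml_text = sequence_sections(rows)
print(toml_text)
```

This only produces text in the same shape as the `[categories.sequence]` block in the diff. Whether to generate the config once or to run the catalog query at import time is a separate design choice that the thread leaves open.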

Turbo87 · 2024-12-14T08:06:08Z

crates/crates_io_database_dump/src/dump-import.sql.j2

+            1
+        )
+    );
+{% endif %}


I think this needs an update of the corresponding test snapshot :)

LawnGnome added the A-infrastructure 📡 and C-internal 🔧 (Category: Nonessential work that would make the codebase more consistent or clear) labels Dec 13, 2024

LawnGnome requested a review from a team December 13, 2024 23:02

LawnGnome force-pushed the db-dump-sequences branch from e279712 to bd4780f December 13, 2024 23:08

Turbo87 reviewed Dec 14, 2024


eth3lbert commented Dec 16, 2024
