-
Notifications
You must be signed in to change notification settings - Fork 610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db-dump: set sequence values when importing a database dump #10204
base: main
Are you sure you want to change the base?
Conversation
By default, the import script recreates the database schema, which includes creating new sequences with zero values. This results in the lazy crates.io developer occasionally receiving obscure errors when inserting records into tables that use sequences, often not on the first or second insert due to IDs in the database dump not always being continuous. Rather than dumping the real sequence values from the database, we can just recreate them based on the maximum ID in each table. Works well enough, and means we don't have to tinker with the export script or ship extra data. This commit only configures the database tables that actually include data in the database dump. There are other sequences, but since those tables won't have data imported, it doesn't matter if they remain zero after import.
e279712
to
bd4780f
Compare
@@ -48,6 +48,9 @@ description = "public" | |||
crates_cnt = "public" | |||
created_at = "public" | |||
path = "public" | |||
[categories.sequence] | |||
column = "id" | |||
name = "categories_id_seq" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of manually declaring them here, would it be possible to derive them from the database schema in some way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is doable as follows, which is modified from https://stackoverflow.com/a/55414721:
select tbl.relname as table_name,
col.attname as column_name,
s.relname as sequence_name
from pg_class s
join pg_namespace sn on sn.oid = s.relnamespace
join pg_depend d on d.refobjid = s.oid and d.refclassid='pg_class'::regclass
join pg_attrdef ad on ad.oid = d.objid and d.classid = 'pg_attrdef'::regclass
join pg_attribute col on col.attrelid = ad.adrelid and col.attnum = ad.adnum
join pg_class tbl on tbl.oid = ad.adrelid
join pg_namespace ts on ts.oid = tbl.relnamespace
where s.relkind = 'S'
-- and s.relname = 'your_sequence_name_her'
and d.deptype in ('a', 'n');
From the result, you should see something similar to the following:
table_name | column_name | sequence_name
-----------------------+-------------+------------------------------
api_tokens | id | api_tokens_id_seq
background_jobs | id | background_jobs_id_seq
categories | id | categories_id_seq
crates | id | packages_id_seq
deleted_crates | id | deleted_crates_id_seq
dependencies | id | dependencies_id_seq
emails | id | emails_id_seq
keywords | id | keywords_id_seq
teams | id | teams_id_seq
users | id | users_id_seq
version_owner_actions | id | version_owner_actions_id_seq
versions | id | versions_id_seq
(12 rows)
1 | ||
) | ||
); | ||
{% endif %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs an update of the corresponding test snapshot :)
By default, the import script recreates the database schema, which includes creating new sequences with zero values. This results in the lazy crates.io developer occasionally receiving obscure errors when inserting records into tables that use sequences, often not on the first or second insert due to IDs in the database dump not always being continuous.
Rather than dumping the real sequence values from the database, we can just recreate them based on the maximum ID in each table. Works well enough, and means we don't have to tinker with the export script or ship extra data.
This commit only configures the database tables that actually include data in the database dump. There are other sequences, but since those tables won't have data imported, it doesn't matter if they remain zero after import.