Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db-dump: set sequence values when importing a database dump #10204

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

LawnGnome
Copy link
Contributor

By default, the import script recreates the database schema, which includes creating new sequences with zero values. This results in the lazy crates.io developer occasionally receiving obscure errors when inserting records into tables that use sequences, often not on the first or second insert due to IDs in the database dump not always being continuous.

Rather than dumping the real sequence values from the database, we can just recreate them based on the maximum ID in each table. Works well enough, and means we don't have to tinker with the export script or ship extra data.

This commit only configures the database tables that actually include data in the database dump. There are other sequences, but since those tables won't have data imported, it doesn't matter if they remain zero after import.

@LawnGnome LawnGnome added A-infrastructure 📡 C-internal 🔧 Category: Nonessential work that would make the codebase more consistent or clear labels Dec 13, 2024
@LawnGnome LawnGnome requested a review from a team December 13, 2024 23:02
By default, the import script recreates the database schema, which
includes creating new sequences with zero values. This results in the
lazy crates.io developer occasionally receiving obscure errors when
inserting records into tables that use sequences, often not on the first
or second insert due to IDs in the database dump not always being
continuous.

Rather than dumping the real sequence values from the database, we can
just recreate them based on the maximum ID in each table. Works well
enough, and means we don't have to tinker with the export script or ship
extra data.

This commit only configures the database tables that actually include
data in the database dump. There are other sequences, but since those
tables won't have data imported, it doesn't matter if they remain zero
after import.
@@ -48,6 +48,9 @@ description = "public"
crates_cnt = "public"
created_at = "public"
path = "public"
[categories.sequence]
column = "id"
name = "categories_id_seq"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of manually declaring them here, would it be possible to derive them from the database schema in some way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is doable as follows, which is modified from https://stackoverflow.com/a/55414721:

select tbl.relname as table_name, 
       col.attname as column_name,
       s.relname as sequence_name
from pg_class s
  join pg_namespace sn on sn.oid = s.relnamespace 
  join pg_depend d on d.refobjid = s.oid and d.refclassid='pg_class'::regclass 
  join pg_attrdef ad on ad.oid = d.objid and d.classid = 'pg_attrdef'::regclass
  join pg_attribute col on col.attrelid = ad.adrelid and col.attnum = ad.adnum
  join pg_class tbl on tbl.oid = ad.adrelid 
  join pg_namespace ts on ts.oid = tbl.relnamespace 
where s.relkind = 'S'
--  and s.relname = 'your_sequence_name_her'
  and d.deptype in ('a', 'n');

From the result, you should see something similar to the following:

      table_name       | column_name |        sequence_name         
-----------------------+-------------+------------------------------
 api_tokens            | id          | api_tokens_id_seq
 background_jobs       | id          | background_jobs_id_seq
 categories            | id          | categories_id_seq
 crates                | id          | packages_id_seq
 deleted_crates        | id          | deleted_crates_id_seq
 dependencies          | id          | dependencies_id_seq
 emails                | id          | emails_id_seq
 keywords              | id          | keywords_id_seq
 teams                 | id          | teams_id_seq
 users                 | id          | users_id_seq
 version_owner_actions | id          | version_owner_actions_id_seq
 versions              | id          | versions_id_seq
(12 rows)

1
)
);
{% endif %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs an update of the corresponding test snapshot :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-infrastructure 📡 C-internal 🔧 Category: Nonessential work that would make the codebase more consistent or clear
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants