Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding a unique constraint can fail under high load #485

Open
andrew-farries opened this issue Nov 25, 2024 · 0 comments
Open

Adding a unique constraint can fail under high load #485

andrew-farries opened this issue Nov 25, 2024 · 0 comments
Labels
bug Something isn't working
Milestone

Comments

@andrew-farries
Copy link
Collaborator

andrew-farries commented Nov 25, 2024

There is a race condition when adding a unique constraint to a column.

The race can cause the operation to fail, particularly when the target database is under high load at the time of the migration.

Reproduction 1

Create a pair of migrations:

01_create_table.json
{
  "name": "01_create_table",
  "operations": [
    {
      "create_table": {
        "name": "movies",
        "columns": [
          {
            "name": "id",
            "type": "serial",
            "pk": true
          },
          {
            "name": "name",
            "type": "varchar(255)"
          }
        ]
      }
    }
  ]
}
02_set_unique.json
{
  "name": "02_set_unique",
  "operations": [
    {
      "alter_column": {
        "table": "movies",
        "column": "name",
        "unique": {
          "name": "movies_name_unique"
        },
        "up": "name",
        "down": "name"
      }
    }
  ]
}

Apply the first migration:

pgroll start 01_create_table.json --complete

In the target database, generate some load on the system:

CREATE TABLE test(a int);
INSERT INTO test(a) SELECT generate_series(1, 10000000)

While the above INSERT operation is running, apply the second migration:

pgroll start 02_set_unique.json --complete

The second migration is likely to fail with the following error:

Failed to complete migration: unable to execute complete operation: pq: index "movies_name_unique" is not valid

Reproduction 2

The same problem also exists when a column with a UNIQUE constraint is duplicated, for example to change the nullability of the column. To reproduce:

Create a pair of migrations:

01_create_table.json
{
  "name": "01_create_table",
  "operations": [
    {
      "create_table": {
        "name": "movies",
        "columns": [
          {
            "name": "id",
            "type": "serial",
            "pk": true
          },
          {
            "name": "name",
            "type": "varchar(255)",
            "unique": true,
            "nullable": true
          }
        ]
      }
    }
  ]
}
02_set_notnull.json
{
  "name": "02_set_notnull",
  "operations": [
    {
      "alter_column": {
        "table": "movies",
        "column": "name",
        "nullable": false,
        "up": "name",
        "down": "name"
      }
    }
  ]
}

Now follow the same steps as for the first reproduction: apply the first migration, generate load, apply the second migration. The second migration will likely fail with the same error as in the first reproduction.

Notes

In the first reproduction, the operation fails because there is a race between the movies_name_unique index being marked valid after it is created (concurrently) on operation Start:

}
func (o *OpSetUnique) addUniqueIndex(ctx context.Context, conn db.DB) error {
// create unique index concurrently
_, err := conn.ExecContext(ctx, fmt.Sprintf("CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS %s ON %s (%s)",
pq.QuoteIdentifier(o.Name),
pq.QuoteIdentifier(o.Table),
pq.QuoteIdentifier(TemporaryName(o.Column))))
return err
}

and when the unique constraint is added using the index on operation Complete:

func (o *OpSetUnique) Complete(ctx context.Context, conn db.DB, tr SQLTransformer, s *schema.Schema) error {
// Create a unique constraint using the unique index
_, err := conn.ExecContext(ctx, fmt.Sprintf("ALTER TABLE IF EXISTS %s ADD CONSTRAINT %s UNIQUE USING INDEX %s",
pq.QuoteIdentifier(o.Table),
pq.QuoteIdentifier(o.Name),
pq.QuoteIdentifier(o.Name)))
if err != nil {
return err
}
return err
}

In the second reproduction, the race is between the concurrent index creation performed by the column duplicator:

func (d *duplicatorStmtBuilder) duplicateUniqueConstraints(withoutConstraint []string, colNames ...string) []string {
stmts := make([]string, 0, len(d.table.UniqueConstraints))
for _, uc := range d.table.UniqueConstraints {
if slices.Contains(withoutConstraint, uc.Name) {
continue
}
if duplicatedMember, constraintColumns := d.allConstraintColumns(uc.Columns, colNames...); duplicatedMember {
stmts = append(stmts, fmt.Sprintf(cCreateUniqueIndexSQL,
pq.QuoteIdentifier(DuplicationName(uc.Name)),
pq.QuoteIdentifier(d.table.Name),
strings.Join(quoteColumnNames(constraintColumns), ", "),
))
}
}
return stmts
}

And when the unique constraint is created using the index when the column is renamed on operation complete:

// Rename any `UNIQUE` indexes on the duplicated column and use them to
// create `UNIQUE` constraints.
for _, ui := range table.Indexes {
if !IsDuplicatedName(ui.Name) {
continue
}
if !ui.Unique {
continue
}
if slices.Contains(ui.Columns, TemporaryName(column.Name)) {
// Rename the unique index to its original name
renameIndexSQL := fmt.Sprintf(cRenameIndexSQL,
pq.QuoteIdentifier(ui.Name),
pq.QuoteIdentifier(StripDuplicationPrefix(ui.Name)),
)
_, err = conn.ExecContext(ctx, renameIndexSQL)
if err != nil {
return fmt.Errorf("failed to rename unique index %q: %w", ui.Name, err)
}
// Create a unique constraint using the unique index
createUniqueConstraintSQL := fmt.Sprintf(cCreateUniqueConstraintSQL,
pq.QuoteIdentifier(table.Name),
pq.QuoteIdentifier(StripDuplicationPrefix(ui.Name)),
pq.QuoteIdentifier(StripDuplicationPrefix(ui.Name)),
)
_, err = conn.ExecContext(ctx, createUniqueConstraintSQL)
if err != nil {
return fmt.Errorf("failed to create unique constraint from index %q: %w", ui.Name, err)
}
// Index no longer exists, remove it from the table
delete(table.Indexes, ui.Name)
}
}

In both cases the correct behaviour is to wait for the concurrently created index to be marked as valid before returning.

@andrew-farries andrew-farries added the bug Something isn't working label Nov 25, 2024
@andrew-farries andrew-farries added this to the v1 milestone Nov 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant