
Is upgrade auto-finalization a good default? #57887

Open
@nick-jones

Description


Is your feature request related to a problem? Please describe.

Twice now we've been in a position where moving to a new version has caused issues and a downgrade has been necessary:

In the first instance we unfortunately did not set preserve_downgrade_option (though I'm not 100% sure it would have helped in this instance). With the second issue we had set it, so we managed to avoid any big catastrophe... though from the issue you can see that someone else wasn't quite so lucky: #57032 (comment)

The v20.2 upgrade documentation specifically states:

we recommend disabling auto-finalization so you can monitor the stability and performance of the upgraded cluster before finalizing the upgrade

This suggests that most people should be trying to avoid auto-finalization.
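
For reference, here is a rough sketch of the statements involved, written as small Python helpers that only build the SQL text and do not connect to a cluster. It assumes the setting is `cluster.preserve_downgrade_option` and that it is set to the version being upgraded from (for example `20.1` when moving to v20.2); the helper names are hypothetical and not part of any CockroachDB tooling.

```python
def disable_auto_finalization_sql(current_version: str) -> str:
    """Build the statement that pins the cluster so a downgrade stays possible.

    current_version is the major version being upgraded *from*, e.g. "20.1"
    (assumption: this matches what the upgrade documentation asks for).
    """
    # Reject anything that is not digits and dots, so no arbitrary text
    # ends up inside the generated SQL string.
    if not current_version.replace(".", "").isdigit():
        raise ValueError(f"unexpected version string: {current_version!r}")
    return (
        "SET CLUSTER SETTING cluster.preserve_downgrade_option = "
        f"'{current_version}';"
    )


def finalize_upgrade_sql() -> str:
    """Build the statement that clears the pin so finalization can proceed."""
    return "RESET CLUSTER SETTING cluster.preserve_downgrade_option;"
```

The first statement would be run before starting the rolling upgrade, and the second only once the operator is satisfied with the new version.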

Describe the solution you'd like

My question is: is it sensible to default to auto-finalization, given some of the issues that pop up and the recommendations in your own documentation? I'm not actively watching issues here, so I don't have a reasonable perspective on how often people get into a tangle as a result of this. I did, however, feel it was worth raising the question.

I fully understand that requiring operators to take manual steps during upgrades is undesirable. I think it's worth weighing that against how quickly auto-finalization locks a cluster into a new version.

Describe alternatives you've considered

  • Having a flexible downgrade path regardless of what has happened would be an option, though I suspect that would take considerable effort.
  • Delay auto-finalization by some default duration, perhaps a considerable one. This could come with an option to force finalization when required.
  • There is also the option of a cluster-wide setting that disables auto-finalization permanently and lets operators finalize the upgrade manually (as I understand it, this was the old behaviour). If there is a general preference to retain auto-finalization, the setting could default to "off".
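
To make the second and third alternatives a little more concrete, here is a minimal sketch of the decision they imply. Everything in it is hypothetical: the `FinalizationPolicy` fields, the seven-day default, and the `should_finalize` function are illustrations only, not existing CockroachDB settings or code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FinalizationPolicy:
    # Hypothetical knobs for illustration; not real cluster settings.
    auto_finalize_enabled: bool = True      # alternative 3: permanent off switch
    delay: timedelta = timedelta(days=7)    # alternative 2: default grace period
    force: bool = False                     # operator override to finalize now


def should_finalize(policy: FinalizationPolicy,
                    upgrade_completed_at: datetime,
                    now: datetime) -> bool:
    """Decide whether the upgrade should be finalized at time `now`."""
    # An explicit operator request always wins.
    if policy.force:
        return True
    # With auto-finalization disabled, only a manual step finalizes.
    if not policy.auto_finalize_enabled:
        return False
    # Otherwise wait until the grace period has fully elapsed.
    return now - upgrade_completed_at >= policy.delay
```

With a policy like this, an operator who hits a problem inside the grace period can still downgrade, while clusters that nobody is watching finalize eventually.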

Jira issue: CRDB-3471

Metadata

    Labels

    A-cluster-upgrades
    C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
    O-community: Originated from the community
    T-server-and-security: DB Server & Security
    X-blathers-triaged: blathers was able to find an owner
