Description
Is your feature request related to a problem? Please describe.
Twice now we've been in a position where moving to a new version has caused issues and a downgrade has been necessary:
- sqlmigrations: v20.1.0: out of bounds panic #48786
- release-20.2: sql: failure to upgrade FK representation during table validation produces spurious errors and makes table unavailable #57032
In the first instance we unfortunately did not set preserve_downgrade_option (though I'm not 100% sure it would have helped in this instance). With the second issue we had set it, so we managed to avoid any big catastrophe, though from the issue you can see that someone else wasn't quite so lucky: #57032 (comment)
The v20.2 upgrade documentation specifically states:
> we recommend disabling auto-finalization so you can monitor the stability and performance of the upgraded cluster before finalizing the upgrade
This suggests that most people should be trying to avoid auto-finalization.
Describe the solution you'd like
My question is: is it sensible to default to auto-finalization, given some of the issues that pop up and the recommendations in your own documentation? I'm not actively watching issues here, so I don't have a reasonable perspective on how often people get into a tangle as a result of this. I did, however, feel it was worth raising the question.
I fully understand that requiring operators to take manual steps during upgrades is undesirable. I think it's worth weighing that against the risk of being locked into a new version fairly rapidly.
Describe alternatives you've considered
- Having a flexible downgrade path regardless of what has happened would be an option, though I suspect it would take considerable effort.
- Delay auto-finalization by some default duration, perhaps a considerable one, with an option to force finalization when required.
- There is also the option of a cluster-wide setting that disables auto-finalization permanently and lets operators finalize the upgrade manually (as I understand it, this was the old behaviour). If there is a general preference to retain auto-finalization, the setting can default to "off".
Jira issue: CRDB-3471