Allow database dumps to be run against the read-only replica #2144
Changes from all commits
7d084d3
d8b8235
a5d69d9
4ae90e6
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,19 +1,7 @@ | ||
BEGIN; | ||
BEGIN ISOLATION LEVEL REPEATABLE READ, READ ONLY; | ||
This provides somewhat weaker consistency guarantees for the snapshot used for the database dump. The Postgres documentation explicitly recommends using
We don't seem to have a very strong case for moving the dump to the read-only replica, since it doesn't seem to have caused any problems on the main replica. Neither does it seem unreasonable to do so, even with the weaker consistency guarantees.
One more question – are we sure
I definitely expect this to work. We can test by deploying the code, manually scheduling an export, verifying that everything works, and then finally updating the daily task.
This definitely was a surprise to me as well. But I hope my other response helps explain why it isn't possible for the read-replica to be directly involved in cycle tracking on the leader. I've seen several suggestions that using
We can't actually test this on staging, because Heroku only supports read-only replicas on paid database plans. But we shouldn't have any issues merging and deploying this change as we can roll it out via a configuration change to the scheduler.
I'm not entirely sure about that. I don't think any of the temporary response time spikes can be traced to the export specifically, but I've definitely seen slight performance improvements and much less variation in response times after moving just a few expensive endpoints (#2073) to the read-only replica. In total, that PR seems to have moved only around 600 queries per hour off of the primary database, but the improvements seem impressive. It also appears to have eliminated the occasional spikes in download response times. I've collected some rough numbers and plan to follow up on that PR with more details, but I would definitely prefer to run the export from there as well.
I'm perfectly fine with moving to the read replica, and with the additional explanations you gave I'm actually actively in favour of doing so. |
||
{{~#each tables}} | ||
{{~#if this.filter}} | ||
CREATE TEMPORARY VIEW "dump_db_{{this.name}}" AS ( | ||
SELECT {{this.columns}} | ||
FROM "{{this.name}}" | ||
WHERE {{this.filter}} | ||
); | ||
{{~/if}} | ||
{{~/each}} | ||
COMMIT; | ||
|
||
BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE; | ||
{{~#each tables}} | ||
{{~#if this.filter}} | ||
\copy (SELECT * FROM "dump_db_{{this.name}}") TO 'data/{{this.name}}.csv' WITH CSV HEADER | ||
\copy (SELECT {{this.columns}} FROM "{{this.name}}" WHERE {{this.filter}}) TO 'data/{{this.name}}.csv' WITH CSV HEADER | ||
{{~else}} | ||
\copy "{{this.name}}" ({{this.columns}}) TO 'data/{{this.name}}.csv' WITH CSV HEADER | ||
{{~/if}} | ||
|
I assumed we are passing in the replica URL on the command line in production.
Could we add something like this to .env.sample, to make local testing easier? (I'm not sure variable expansion works in that file.)
I think that does work, but I don't recall for certain. In this case, I think I would prefer to remove the fallback entirely and add an error message suggesting to pass $DATABASE_URL via the shell.
Sounds good.
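The suggestion agreed on above (no fallback, and an error that points the user at `$DATABASE_URL`) could look roughly like the sketch below. This is only an illustration under assumptions: the code under review is not shown in this thread, so the language (Rust), the helper name `database_url_from_args`, and the wording of the error message are all hypothetical.

```rust
/// Hypothetical helper (not taken from the PR): picks the database URL for
/// the export job from the command-line arguments. There is deliberately no
/// fallback to an environment variable; a missing argument is an error.
fn database_url_from_args<I>(mut args: I) -> Result<String, String>
where
    I: Iterator<Item = String>,
{
    match args.next() {
        // Accept the first argument as the URL if it is not blank.
        Some(url) if !url.trim().is_empty() => Ok(url),
        // Otherwise explain how to supply the URL explicitly.
        _ => Err(String::from(
            "missing database URL: pass it explicitly, e.g. \"$DATABASE_URL\" from the shell",
        )),
    }
}
```

A caller would hand this the remaining arguments after the job name (for example `std::env::args().skip(2)`, again an assumption about the command layout) and print the error before exiting with a non-zero status. Requiring the URL explicitly means the operator always states whether the export targets the primary or the read-only replica.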