Skip to content

Allow database dumps to be run against the read-only replica #2144

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 5, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/bin/enqueue-job.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ fn main() -> Result<(), Error> {
match &*job {
"update_downloads" => Ok(tasks::update_downloads().enqueue(&conn)?),
"dump_db" => {
let database_url = args.next().unwrap_or_else(|| env("DATABASE_URL"));
let database_url = args.next().unwrap_or_else(|| env("READ_ONLY_REPLICA_URL"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed we are passing in the replica URL on the command line in production.

Could we add something like this to .env.sample, to make local testing easier?

export READ_ONLY_REPLICA_URL=$DATABASE_URL

(I'm not sure variable expansion works in that file.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure variable expansion works in that file.

I think that does work, but I don't recall for certain. In this case, I think I would prefer to remove the fallback entirely and add an error message suggesting to pass $DATABASE_URL via the shell.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

let target_name = args
.next()
.unwrap_or_else(|| String::from("db-dump.tar.gz"));
Expand Down
16 changes: 2 additions & 14 deletions src/tasks/dump_db/dump-export.sql.hbs
Original file line number Diff line number Diff line change
@@ -1,19 +1,7 @@
BEGIN;
BEGIN ISOLATION LEVEL REPEATABLE READ, READ ONLY;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This provides somewhat weaker consistency guarantees for the snapshot used for the database dump. The Postgres documentation explicitly recommends using SERIALIZABLE READ ONLY DEFERRABLE for backups.

We don't seem to have a very strong case for moving the dump to the read-only replica, since it doesn't seem to have cause any problems on the main replica. Neither does it seem unreasonable to do so, even with the weaker consistency guarantees.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more question – are we sure REPEATABLE READ will work on the hot spare? It isn't obvious why SERIALIZABLE READ ONLY would not work, so we should test this on stage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we sure REPEATABLE READ will work on the hot spare?

I definitely expect this to work. We can test by deploying the code, manually scheduling an export, verifying that everything works, and then finally updating the daily task.

It isn't obvious why SERIALIZABLE READ ONLY would not work

This definitely was a surprise to me as well. But I hope my other response helps explain why it isn't possible for the read-replica to be directly involved in cycle tracking on the leader. I've seen several suggestions that using DEFERABLE from a replica could be supported in future releases by writing some additional data in the WAL so that the follower can find a valid snapshot to start from. It appears no such enhancement has been implemented yet.

we should test this on stage.

We can't actually test this on staging, because Heroku only supports read-only replicas on paid database plans. But we shouldn't have any issues merging and deploying this change as we can roll it out via a configuration change to the scheduler.

We don't seem to have a very strong case for moving the dump to the read-only replica, since it doesn't seem to have cause any problems on the main replica.

I'm not entirely sure about that. I don't think any of the temporary response time spikes can be traced to the the export specifically, but I've definitely seen slight performance improvements and much less variation in response times after moving just a few expensive endpoints (#2073) to the read-only replica. In total, that PR seems to have moved only around 600 queries per hour off of the primary database, but the improvements seem impressive. It also appears to have eliminated the occasional spikes in download response times. I've collected some rough numbers and plan to follow up on that PR with more details, but I would definitely prefer to run the export from there as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm perfectly fine with moving to the read replica, and with the additional explanations you gave I'm actually actively in favour of doing so.

{{~#each tables}}
{{~#if this.filter}}
CREATE TEMPORARY VIEW "dump_db_{{this.name}}" AS (
SELECT {{this.columns}}
FROM "{{this.name}}"
WHERE {{this.filter}}
);
{{~/if}}
{{~/each}}
COMMIT;

BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
{{~#each tables}}
{{~#if this.filter}}
\copy (SELECT * FROM "dump_db_{{this.name}}") TO 'data/{{this.name}}.csv' WITH CSV HEADER
\copy (SELECT {{this.columns}} FROM "{{this.name}}" WHERE {{this.filter}}) TO 'data/{{this.name}}.csv' WITH CSV HEADER
{{~else}}
\copy "{{this.name}}" ({{this.columns}}) TO 'data/{{this.name}}.csv' WITH CSV HEADER
{{~/if}}
Expand Down
4 changes: 2 additions & 2 deletions src/tasks/dump_db/gen_scripts.rs
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ struct TableConfig {
#[derive(Debug, Serialize)]
struct HandlebarsTableContext<'a> {
name: &'a str,
filter: Option<&'a str>,
filter: Option<String>,
columns: String,
column_defaults: BTreeMap<&'a str, &'a str>,
}
Expand All @@ -59,7 +59,7 @@ impl TableConfig {
if columns.is_empty() {
None
} else {
let filter = self.filter.as_ref().map(String::as_str);
let filter = self.filter.as_ref().map(|s| s.replace('\n', " "));
let column_defaults = self
.column_defaults
.iter()
Expand Down