Skip to content

Commit 223697d

Browse files
Add migration+change tests for blind_index update
The update to the blind_index gem requires a number of changes no matter what. This commit switches the application to the newer blind_index naming conventions and algorithm (argon2id), requiring a migration to do it and changes to the test fixture for user data. This commit was more work than expected. Calculating the bidx values doesn't work correctly from the fixtures within the ERB (the calls *run* but don't produce the right values, presumably because it's not fully set up at that point). This was resolved by inserting the calculated values into the fixtures, which also makes the tests more independent of the system being tested. One of the older test values was subtly wrong (.com vs. .org email address), which created a mysterious hard-to-track failure (we prevent future versions of this problem via a new test that sanity-checks all test values). The migration had to be tweaked several times to work with real production data. We need to not change the user updated_at by setting touch to false (the user-visible data isn't changing), we need not validate (some entries may not be acceptable any more but we still need to migrate them as they are), and we need to skip cases where encrypted_email is nil (in those cases we have no data to decrypt). Signed-off-by: David A. Wheeler <[email protected]>
1 parent 92a5d28 commit 223697d

File tree

5 files changed

+130
-47
lines changed

5 files changed

+130
-47
lines changed

app/models/user.rb

+4-3
Original file line numberDiff line numberDiff line change
@@ -41,13 +41,14 @@ class User < ApplicationRecord
4141
attr_encrypted :email, algorithm: 'aes-256-gcm', key: [(
4242
ENV['EMAIL_ENCRYPTION_KEY'] || '1' * DIGITS_OF_EMAIL_ENCRYPTION_KEY
4343
)].pack('H*')
44+
4445
# Email addresses are indexed as blind indexes of downcased email addresses,
4546
# so we can efficiently search for them while keeping them encrypted.
46-
# Usage: User.where(email: "[email protected]")
47-
# or: User.where(email: "[email protected]", provider: "local")
47+
# Usage: User.where(email: '[email protected]')
48+
# or: User.where(email: '[email protected]', provider: 'local')
4849
blind_index :email, key: [(
4950
ENV['EMAIL_BLIND_INDEX_KEY'] || '2' * DIGITS_OF_EMAIL_BLIND_INDEX_KEY
50-
)].pack('H*'), expression: ->(v) { v.try(:downcase) }, legacy: true
51+
)].pack('H*'), expression: ->(v) { v.try(:downcase) }
5152

5253
scope :created_since, (
5354
lambda do |time|
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
# frozen_string_literal: true
2+
3+
# Migrate to the use of gem blind_index version 2.2, which
4+
# has somewhat different conventions than its old version.
5+
# In particular, its default column names have changed.
6+
# We're changing the cryptographic hash algorithm
7+
# (to its new default argon2id),
8+
# so there's no need to keep the old hash values.
9+
# Instead, we destroy the old columns & blind indexes,
10+
# create the new ones, and backfill the data (a forced recalculation).
11+
# The backfill takes some time, but it's a one-time operation.
12+
#
13+
# To see how we used to do this, see:
14+
# db/migrate/20180525145445_add_encrypted_email_to_users.rb
15+
# db/migrate/20161102170815_change_email_column_type.rb
16+
17+
class BlindIndex22 < ActiveRecord::Migration[6.1]
18+
def change
19+
rename_column :users, :encrypted_email_bidx, :email_bidx
20+
21+
# This rename is done automatically by rename_column:
22+
# rename_index :users, 'index_users_on_encrypted_email_bidx',
23+
# 'index_users_on_email_bidx'
24+
25+
rename_index :users, 'encrypted_email_local_unique_bidx',
26+
'email_local_unique_bidx'
27+
28+
# Backfill (recompute) the blind index. This is relatively quick;
29+
# one test took 1m13s for 8999 users.
30+
reversible do |dir|
31+
# Update the blind index value "by hand", as
32+
# this doesn't update it properly: BlindIndex.backfill(User)
33+
User.find_each(batch_size: 400) do |user|
34+
# puts(user.id)
35+
if user.email.present?
36+
user.email_bidx = User.generate_email_bidx(user.email)
37+
# Do NOT change the changed timestamp, as this encryption change
38+
# is an internal issue. Also, disable validations (e.g., blank names)
39+
# since while that's not desirable we need to do the transition
40+
# for all records, even less-than-perfect ones.
41+
user.save!(touch: false, validate: false)
42+
end
43+
end
44+
end
45+
end
46+
end

db/schema.rb

+6-6
Original file line numberDiff line numberDiff line change
@@ -2,15 +2,15 @@
22
# of editing this file, please use the migrations feature of Active Record to
33
# incrementally modify your database, and then regenerate this schema definition.
44
#
5-
# This file is the source Rails uses to define your schema when running `rails
6-
# db:schema:load`. When creating a new database, `rails db:schema:load` tends to
5+
# This file is the source Rails uses to define your schema when running `bin/rails
6+
# db:schema:load`. When creating a new database, `bin/rails db:schema:load` tends to
77
# be faster and is potentially less error prone than running all of your
88
# migrations from scratch. Old migrations may fail to apply correctly if those
99
# migrations use external dependencies or application code.
1010
#
1111
# It's strongly recommended that you check this file into your version control system.
1212

13-
ActiveRecord::Schema.define(version: 2020_12_22_014804) do
13+
ActiveRecord::Schema.define(version: 2021_02_16_184638) do
1414

1515
# These are extensions that must be enabled in order to support this database
1616
enable_extension "citext"
@@ -417,11 +417,11 @@
417417
t.datetime "last_login_at"
418418
t.string "encrypted_email"
419419
t.string "encrypted_email_iv"
420-
t.string "encrypted_email_bidx"
420+
t.string "email_bidx"
421421
t.boolean "use_gravatar", default: false, null: false
422422
t.datetime "can_login_starting_at"
423-
t.index ["encrypted_email_bidx"], name: "encrypted_email_local_unique_bidx", unique: true, where: "((provider)::text = 'local'::text)"
424-
t.index ["encrypted_email_bidx"], name: "index_users_on_encrypted_email_bidx"
423+
t.index ["email_bidx"], name: "email_local_unique_bidx", unique: true, where: "((provider)::text = 'local'::text)"
424+
t.index ["email_bidx"], name: "index_users_on_email_bidx"
425425
t.index ["last_login_at"], name: "index_users_on_last_login_at"
426426
t.index ["uid"], name: "index_users_on_uid"
427427
end

test/fixtures/users.yml

+48-37
Original file line numberDiff line numberDiff line change
@@ -5,24 +5,38 @@
55
# on the general principle that we should vary test values;
66
# they don't have any particular meaning. Make sure they match the tests!
77
#
8-
# WARNING: The encrypted_email_bidx *MUST* end in newline, because that
9-
# is the format created (and required) by the gem blind_index.
10-
# The encrypted_email also ends in newline, but that's just because the
8+
# Note: The encrypted_email ends in newline, but that's just because the
119
# gem attr_encrypted does that (it's not actually required).
10+
# At one time the encrypted_email_bidx *MUST* end in newline, because that
11+
# is the format created (and required) by the gem blind_index, but that is
12+
# no longer true.
1213
#
13-
# We encrypt user email addresses. For test purposes, we used a fixed
14+
# User email addresses are encrypted in the database, so for proper
15+
# testing we must store them as encrypted data.
16+
# For test purposes, we used a fixed
1417
# encryption IV (don't do this with real data), and provide precomputed
15-
# values for "encrypted_email" and "encrypted_email_bidx". There's
16-
# lot of infrastructure that has to work to compute these values, and they
17-
# take time to compute, so providing precomputed values makes sense.
18-
# To find values for new email addresses, run "rails console" and then:
18+
# values for "encrypted_email" and "email_bidx".
19+
# We *must* do this for email_bidx, at least, because the fixture code is
20+
# loaded and run as an ERB before its keys are fully set up.
21+
# That's okay; there's lot of infrastructure
22+
# that has to work correctly to compute these values,
23+
# and they take time to compute, so providing their fixtures as
24+
# known-correct precomputed values makes sense.
25+
# (It means that tests are more independent of the system under test, and
26+
# thus failures are more likely to help point out the actual problems.)
27+
#
28+
# To find values for new email addresses, run "rails console"
29+
# (make sure to *not* set 'EMAIL_ENCRYPTION_KEY' nor 'EMAIL_BLIND_INDEX_KEY')
30+
# and then do something like this:
31+
1932
# encrypted_email_iv: "G4fbeClWL3aFMY9I"
20-
# encrypted_email:
21-
# User.encrypt_email('[email protected]',
22-
# iv: Base64.decode64('G4fbeClWL3aFMY9I'))
23-
# encrypted_email_bidx:
33+
# encrypted_email: # Find its value with:
34+
# User.encrypt_email(email, iv: Base64.decode64('G4fbeClWL3aFMY9I'))
35+
# email_bidx: # Find its value with:
36+
# User.generate_email_bidx(email)
37+
# Another (longer) way to compute email_bidx is to run:
2438
# BlindIndex.generate_bidx("[email protected]", User.blind_indexes[:email])
25-
# An alternative approach is to create a stub user:
39+
# You can create a stub user and find its email_bidx this way:
2640
# x = User.new; x.email = '[email protected]'; x.compute_email_bidx
2741
# This assumes that you're using the usual (default) test keys.
2842
#
@@ -32,41 +46,32 @@
3246
# iv: Base64.decode64('G4fbeClWL3aFMY9I')).inspect)
3347
# puts(' encrypted_email_bidx: ' + BlindIndex.generate_bidx(
3448
# current_email, User.blind_indexes[:email]).inspect)
35-
#
36-
# Here are some additional encryptions if you need them:
37-
# current_email = '[email protected]'
38-
# encrypted_email_iv: "G4fbeClWL3aFMY9I"
39-
# encrypted_email: "0LWldZP1iahMHmhaQwjU9g27hpaNo56iCfMNh+ZsERg=\n"
40-
# encrypted_email_bidx: "C/0XJyW8Xitx19MPv1j8B65iNOaykMLZnMZ/TctNp3U=\n"
41-
#
42-
# current_email = '[email protected]'
43-
# encrypted_email_iv: "G4fbeClWL3aFMY9I"
44-
# encrypted_email: "27ulebbkl7xNLmFHDArW/ZzHIFjaYveE2hxJl2Kxdk2/rm4xXw==\n"
45-
# encrypted_email_bidx: "0SHo5Lc+XTrcY6HvouDwAIvEdy+8odILw9DP3MGXMfM=\n"
46-
#
47-
# current_email = '[email protected]'
48-
# encrypted_email: "3LC6d73QlLFAA3RTCEnJ455qaUF28kYI7ZOGoIDUhkzm\n"
49-
# encrypted_email_iv: "G4fbeClWL3aFMY9I"
50-
# encrypted_email_bidx: "yspd41bZJvT9bn0MQXU2l2dnhpZTIKpf0u46w6+qf1U=\n"
5149

5250
# current_email = '[email protected]'
5351
test_user:
5452
name: Test
53+
# This is our conventional IV value
5554
encrypted_email_iv: "G4fbeClWL3aFMY9I"
55+
# encrypted_email: (%= User.encrypt_email('[email protected]',
56+
# iv: Base64.decode64('G4fbeClWL3aFMY9I')).inspect %)
5657
encrypted_email: "ybGkapP1iahMHmhaQwjU9qTfGB7w/KCFTiUGkIVZZ08=\n"
57-
encrypted_email_bidx: "hEghi+qNCA63qDAuTIq8jBWtOweoQ33EstwMnvM3/gk=\n"
58+
# email_bidx: (%= User.generate_email_bidx('[email protected]').inspect %)
59+
email_bidx: "vmUX8sb4/akruZRPDwQ7e2bZHiAcqxO2BS7u2vdg/Cs="
5860
password_digest: <%= User.digest('password') %>
5961
provider: local
6062
preferred_locale: en
6163
activated: true
6264
activated_at: <%= Time.zone.now %>
6365

6466
# current_email = '[email protected]'
67+
# User.encrypt_email('[email protected]',
68+
# iv: Base64.decode64('G4fbeClWL3aFMY9I'))
6569
test_user_melissa:
6670
name: Melissa Lewis
6771
encrypted_email_iv: "G4fbeClWL3aFMY9I"
6872
encrypted_email: "0LG7d6DjkIlEFmVSHQvDv5qGImt+oNOl9LLibPvLAwHeWmo=\n"
69-
encrypted_email_bidx: "ip6E/EM+EYkbzsRm6ZTprRg4PTsN/T5xY9E80U3Hlfo=\n"
73+
# email_bidx: (%= User.generate_email_bidx('[email protected]').inspect %)
74+
email_bidx: "gkyruOK5LMxyHFElzYO9YtdKJBrIJZnkDamwU6M1+Jc="
7075
password_digest: <%= User.digest('password1') %>
7176
provider: local
7277
preferred_locale: en
@@ -78,7 +83,8 @@ test_user_mark:
7883
name: Mark Watney
7984
encrypted_email_iv: "G4fbeClWL3aFMY9I"
8085
encrypted_email: "0LWldZP1iahMHmhaQwTJ/JxzMJME5stBTXCm7WcmYyg=\n"
81-
encrypted_email_bidx: "aZpkg4oEr8KjacLrZINGN1JRGD1caIWoWUTM57KOePk=\n"
86+
# email_bidx: (%= User.generate_email_bidx('[email protected]').inspect %)
87+
email_bidx: "PM4RbQebE5i3FU4Zm/+cNT6P0POakuR8Fz9QzP7L01U="
8288
password_digest: <%= User.digest('password') %>
8389
provider: local
8490
preferred_locale: en
@@ -90,7 +96,8 @@ test_user_not_active:
9096
name: Forgetful
9197
encrypted_email_iv: "G4fbeClWL3aFMY9I"
9298
encrypted_email: "27ulebbkl7xNLmFHDArW/ZzHLEXQQHbOqCyJ9yfVGoZn2ok7ew==\n"
93-
encrypted_email_bidx: "nRtJqO5uutJ7Z/HcjLHM/jt3cA++pN1t85URKscQfsQ=\n"
99+
# email_bidx: (%= User.generate_email_bidx('[email protected]').inspect %)
100+
email_bidx: "y5UiQko516f50w/YRAZAuW7FzzpWKmvg1Yzd44ry+JQ="
94101
password_digest: <%= User.digest('password') %>
95102
provider: local
96103
preferred_locale: en
@@ -101,7 +108,8 @@ test_user_cs:
101108
name: Case Sensitive
102109
encrypted_email_iv: "G4fbeClWL3aFMY9I"
103110
encrypted_email: "/rWke4D1n7pIGm1JCCfD6ZiEP0bY51bxymIp4AvOLZGyoX8ln1EALDc=\n"
104-
encrypted_email_bidx: "+rRkCLpeIVcfwtPu+Jy6YyXtksI/w3frgdSu8J4kR2M=\n"
111+
# email_bidx: (%= User.generate_email_bidx('[email protected]'.downcase).inspect %)
112+
email_bidx: "I5hCRmgHRw0+XSK6TMfTRk/vhkbX+2cXUJzaiWTVqcs="
105113
password_digest: <%= User.digest('password') %>
106114
provider: local
107115
preferred_locale: en
@@ -113,7 +121,8 @@ admin_user:
113121
name: Admin
114122
encrypted_email_iv: "G4fbeClWL3aFMY9I"
115123
encrypted_email: "3LC6d73QlLFAA3RTCEnF/pQPuTWBDzfZWpDX1TJ+bsBO\n"
116-
encrypted_email_bidx: "PH39U8dihphvLTlLEwJWXdy2OLz477WOn/DNsZaHqfE=\n"
124+
# email_bidx: (%= User.generate_email_bidx('[email protected]').inspect %)
125+
email_bidx: "pdszbZsBfJRl+Fg0ZZ7/Mwguy64JlXvJ7TflzSEIawM="
117126
password_digest: <%= User.digest('password') %>
118127
provider: local
119128
preferred_locale: en
@@ -128,7 +137,8 @@ github_user:
128137
nickname: github-user
129138
encrypted_email_iv: "G4fbeClWL3aFMY9I"
130139
encrypted_email: "2r2jdqby3LxSC3Z/CB/H/ImFKgTeplS8IDY+qfw8Fi73KhLn8MYC\n"
131-
encrypted_email_bidx: "Vyiu7EdbDXioCgMAVIWqLHyBH148S9JcSspSWq6nPN8=\n"
140+
# email_bidx: (%= User.generate_email_bidx('[email protected]').inspect %)
141+
email_bidx: "ZjENXwF1sT4lGPhlXJMGGojv/NgxQvRgis647Pgotwg="
132142
password_digest: <%= User.digest('password') %>
133143
provider: github
134144
preferred_locale: en
@@ -140,7 +150,8 @@ fr_user:
140150
name: French Test
141151
encrypted_email: "26b6arbjhYlEFmVSHQvDv5abKOzHp6ZN4HPtuqnxJkbq84I=\n"
142152
encrypted_email_iv: "G4fbeClWL3aFMY9I"
143-
encrypted_email_bidx: "7e3Bzzf0T1HnQKICsDEqJGosonRFiZZNtN445wgNDYU=\n"
153+
# email_bidx: (%= User.generate_email_bidx('[email protected]').inspect %)
154+
email_bidx: "iZlt4P3IjjuMUuajvkGqwatHglhDxoNCUsYMgebOEOQ="
144155
password_digest: <%= User.digest('password') %>
145156
provider: local
146157
preferred_locale: fr

test/models/user_test.rb

+26-1
Original file line numberDiff line numberDiff line change
@@ -107,7 +107,7 @@ class UserTest < ActiveSupport::TestCase
107107
new_blind_index = BlindIndex.generate_bidx( # Recalc so test's less fragile
108108
email_address, User.blind_indexes[:email]
109109
)
110-
assert new_blind_index, user1.encrypted_email_bidx
110+
assert new_blind_index, user1.email_bidx
111111
end
112112

113113
test 'associated projects should be destroyed' do
@@ -152,5 +152,30 @@ def email
152152
u = StubUserEmail.new
153153
assert_equal 'CANNOT_DECRYPT', u.email_if_decryptable
154154
end
155+
156+
test 'Data model encrypted email addresses and blind index keys work' do
157+
# We precompute the user data fixtures, and it's possible we got it wrong.
158+
# Walk through the data set to do sanity checks for each value.
159+
User.find_each do |user|
160+
# puts(user.name)
161+
assert user.name.present?, "Empty name for #{user.id}"
162+
assert user.encrypted_email.present?, "Email not present for #{user.name}"
163+
assert(
164+
user.encrypted_email_iv.present?,
165+
"Email IV not present for #{user.name}"
166+
)
167+
# This will also fail if the email is not encrypted correctly:
168+
assert user.email.present?, "Email not present for #{user.name}"
169+
assert user.provider.present?, "Provider not present for #{user.name}"
170+
assert(
171+
(user.provider != 'local' || user.password_digest.present?),
172+
"Local user has no password: #{user.name}, #{user.email}"
173+
)
174+
# An incorrect bidx could lead to confusing test results, so we
175+
# *definitely* want the following check.
176+
# Check that bidx was computed correctly:
177+
assert_equal User.generate_email_bidx(user.email), user.email_bidx
178+
end
179+
end
155180
end
156181
# rubocop:enable Metrics/ClassLength

0 commit comments

Comments
 (0)