-
Notifications
You must be signed in to change notification settings - Fork 26
feat: smart staged rollouts for Android #763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
d0c8013
b75289f
277b0f8
1dd4084
4051a5c
a9225ee
5dd40c8
ffb12bf
cb413fe
30881f2
3ef5a60
a396c6c
de624cf
bf0f7da
7ff678c
6c26854
84da096
d4d70a0
f055bf8
87fe9d9
6b62771
d649922
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| class HaltUnhealthyReleaseRolloutJob < ApplicationJob | ||
| queue_as :high | ||
|
|
||
| def perform(production_release_id, event_id) | ||
| production_release = ProductionRelease.find(production_release_id) | ||
| event = ReleaseHealthEvent.find(event_id) | ||
|
|
||
| if production_release.unhealthy? | ||
| store_rollout = production_release.store_rollout | ||
|
|
||
| if store_rollout.is_a?(PlayStoreRollout) && store_rollout.automatic_rollout? | ||
| result = Action.halt_the_store_rollout!(store_rollout) | ||
|
|
||
| # Mark the event as action triggered if halt was successful | ||
| event.update(action_triggered: true) if result.ok? | ||
| end | ||
| end | ||
| end | ||
| end | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| class IncreaseHealthyReleaseRolloutJob < ApplicationJob | ||
| queue_as :high | ||
|
|
||
| def perform(play_store_rollout_id) | ||
| rollout = PlayStoreRollout.find(play_store_rollout_id) | ||
| return if rollout.completed? || rollout.fully_released? | ||
| return unless rollout.automatic_rollout? | ||
|
|
||
| release = rollout.parent_release | ||
|
|
||
| if release.healthy? | ||
| if rollout.halted? | ||
| Action.resume_the_store_rollout!(rollout) | ||
| end | ||
|
|
||
| if rollout.started? | ||
| Action.increase_the_store_rollout!(rollout) | ||
| end | ||
| end | ||
|
|
||
| # This update is necessary so that our verify rollout job does not pick this up again | ||
| rollout.update!( | ||
| automatic_rollout_updated_at: Time.current, | ||
| automatic_rollout_next_update_at: Time.current + PlayStoreRollout::AUTO_ROLLOUT_RUN_INTERVAL | ||
| ) | ||
|
|
||
| IncreaseHealthyReleaseRolloutJob.perform_in(PlayStoreRollout::AUTO_ROLLOUT_RUN_INTERVAL, play_store_rollout_id) | ||
| end | ||
| end |
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. problem This job can cause double/multi increments to the rollout. Scenario:
there are multiple problems here:
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| class VerifyAutomaticRolloutJob < ApplicationJob | ||
| queue_as :high | ||
|
|
||
| def perform | ||
| PlayStoreRollout.automatic_rollouts | ||
| .where(automatic_rollout_next_update_at: ...Time.current) | ||
| .where( | ||
| "automatic_rollout_next_update_at - automatic_rollout_updated_at >= interval '300 second'" | ||
| ).find_each do |rollout| | ||
| IncreaseHealthyReleaseRolloutJob.perform_async(rollout.id) | ||
| end | ||
| end | ||
| end |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -258,10 +258,15 @@ def self.resume_the_store_rollout!(rollout) | |
| Res.new { true } | ||
| end | ||
|
|
||
| def self.halt_the_store_rollout!(rollout) | ||
| def self.halt_the_store_rollout!(rollout, manually: false) | ||
| return Res.new { raise "release is not actionable" } unless rollout.actionable? | ||
| return Res.new { raise "rollout is not started" } unless rollout.started? | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. problem: our state machine allows moving from halted back to started (so we can increase rollout) but this guard clause will prevent it from actually happening and blow up we need to remove this I think |
||
| rollout.halt_release! | ||
|
|
||
| if manually && rollout.automatic_rollout? | ||
| rollout.disable_automatic_rollout! | ||
| end | ||
|
|
||
| return Res.new { raise rollout.errors.full_messages.to_sentence } if rollout.errors? | ||
| Res.new { true } | ||
| end | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,17 +2,20 @@ | |
| # | ||
| # Table name: store_rollouts | ||
| # | ||
| # id :uuid not null, primary key | ||
| # completed_at :datetime | ||
| # config :decimal(8, 5) default([]), not null, is an Array | ||
| # current_stage :integer | ||
| # is_staged_rollout :boolean default(FALSE) | ||
| # status :string not null | ||
| # type :string not null | ||
| # created_at :datetime not null | ||
| # updated_at :datetime not null | ||
| # release_platform_run_id :uuid not null, indexed | ||
| # store_submission_id :uuid indexed | ||
| # id :uuid not null, primary key | ||
| # automatic_rollout :boolean default(FALSE), not null | ||
| # automatic_rollout_next_update_at :datetime | ||
| # automatic_rollout_updated_at :datetime | ||
| # completed_at :datetime | ||
| # config :decimal(8, 5) default([]), not null, is an Array | ||
| # current_stage :integer | ||
| # is_staged_rollout :boolean default(FALSE) | ||
| # status :string not null | ||
| # type :string not null | ||
| # created_at :datetime not null | ||
| # updated_at :datetime not null | ||
| # release_platform_run_id :uuid not null, indexed | ||
| # store_submission_id :uuid indexed | ||
| # | ||
| class PlayStoreRollout < StoreRollout | ||
| include Passportable | ||
|
|
@@ -23,20 +26,32 @@ class PlayStoreRollout < StoreRollout | |
| STAMPABLE_REASONS = %w[ | ||
| started | ||
| updated | ||
| paused | ||
| resumed | ||
| halted | ||
| completed | ||
| failed | ||
| fully_released | ||
| ] | ||
|
|
||
| AUTO_ROLLOUT_RUN_INTERVAL = 24.hours | ||
|
|
||
| aasm safe_state_machine_params(with_lock: false) do | ||
| state :created, initial: true | ||
| state(*STATES.keys) | ||
|
|
||
| event :start, after_commit: :on_start! do | ||
| transitions from: :created, to: :started | ||
| transitions from: :halted, to: :started | ||
| transitions from: :paused, to: :started | ||
| end | ||
|
|
||
| event :pause do | ||
| transitions from: :started, to: :paused | ||
| end | ||
|
|
||
| event :resume do | ||
| transitions from: :paused, to: :started | ||
| end | ||
|
|
||
| event :halt, after_commit: :on_halt! do | ||
|
|
@@ -56,8 +71,6 @@ class PlayStoreRollout < StoreRollout | |
|
|
||
| def controllable_rollout? = true | ||
|
|
||
| def automatic_rollout? = false | ||
|
|
||
| def start_release!(retry_on_review_fail: false) | ||
| if staged_rollout? | ||
| # return mock_start_play_store_rollout! if sandbox_mode? | ||
|
|
@@ -128,13 +141,20 @@ def halt_release! | |
| end | ||
| end | ||
|
|
||
| def disable_automatic_rollout! | ||
| update!(automatic_rollout: false) | ||
| end | ||
|
|
||
| def resume_release! | ||
| with_lock do | ||
| return unless may_start? | ||
|
|
||
| result = rollout(last_rollout_percentage, retry_on_review_fail: true) | ||
| if result.ok? | ||
| start! | ||
| if release_platform_run.automatic_rollout? | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. problem: I think this why you probably added the long dig to find the config in we can just do play_store_submission.conf.automatic_rollout?and then delegate can be chucked delegate :automatic_rollout?, to: :release_platform_run |
||
| update!(automatic_rollout: true) | ||
| end | ||
| event_stamp!(reason: :resumed, kind: :notice, data: stamp_data) | ||
| notify!("Rollout was resumed", :production_rollout_resumed, notification_params) | ||
| else | ||
|
|
@@ -143,8 +163,19 @@ def resume_release! | |
| end | ||
| end | ||
|
|
||
| def rollout_active? | ||
| provider.build_active?(submission_channel_id, build_number, raise_on_lock_error: true) | ||
| def pause_release! | ||
| with_lock do | ||
| return unless may_pause? | ||
|
|
||
| disable_automatic_rollout! | ||
| pause! | ||
| event_stamp!(reason: :paused, kind: :error, data: stamp_data) | ||
| notify!("Automatic rollout has been paused", :production_rollout_paused, notification_params) | ||
| end | ||
| end | ||
|
|
||
| def rollout_in_progress? | ||
| provider.build_in_progress?(submission_channel_id, build_number, raise_on_lock_error: true) | ||
| rescue GooglePlayStoreIntegration::LockAcquisitionError => e | ||
| errors.add(:base, e.message) | ||
| false | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
Repository: tramlinehq/tramline
Length of output: 233
🏁 Script executed:
Repository: tramlinehq/tramline
Length of output: 1701
🏁 Script executed:
Repository: tramlinehq/tramline
Length of output: 233
🏁 Script executed:
Repository: tramlinehq/tramline
Length of output: 861
🏁 Script executed:
Repository: tramlinehq/tramline
Length of output: 832
🏁 Script executed:
Repository: tramlinehq/tramline
Length of output: 1054
🏁 Script executed:
Repository: tramlinehq/tramline
Length of output: 45
🏁 Script executed:
Repository: tramlinehq/tramline
Length of output: 1057
🏁 Script executed:
Repository: tramlinehq/tramline
Length of output: 1102
🏁 Script executed:
Repository: tramlinehq/tramline
Length of output: 339
Define
Release#healthy?method or use inverse predicate inIncreaseHealthyReleaseRolloutJob.The
Releasemodel lacks ahealthy?method, butIncreaseHealthyReleaseRolloutJobcallsrelease.healthy?on line 11. TheReleasemodel only definesunhealthy?(which checks if any platform run is unhealthy). AddRelease#healthy?as the inverse ofunhealthy?, or refactor the job to use!release.unhealthy?for consistency with the predicate pattern used elsewhere.🤖 Prompt for AI Agents