-
Notifications
You must be signed in to change notification settings - Fork 26
feat: smart staged rollouts for Android #763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
35437c6 to
f753a9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one thing to make clear in the notice is when one doesn't have any rules configured, unhealthy returns false, which is the correct behavior, but we should let users know that the rollout has no way of halting automatically in that case and it's a bit of a runaway train.
I'll check back later once you're fully done with this.
kitallis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I need to do another pass, to ensure the behavior is correct.
a04e5b7 to
87fe9d9
Compare
kitallis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should pair on this once you're done with #794 because it requires some careful redoing.
app/models/store_submission.rb
Outdated
| def auto_rollout? | ||
| if parent_release.production? && staged_rollout? && conf.automatic_rollout? | ||
| true | ||
| elsif !parent_release.production? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this elsif is incorrect, when it's non-production, it should always be false since we don't support automatic rollouts anywhere other than prod. There's probably only two-cases here, the truthy one otherwise the falsey one.
| Action.halt_the_store_rollout!(store_rollout) | ||
| store_rollout.update!( | ||
| automatic_rollout_updated_at: Time.current, | ||
| automatic_rollout_next_update_at: Time.current + PlayStoreRollout::AUTO_ROLLOUT_RUN_INTERVAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
problem: I think this is incorrect, because it will reschedule the next rollout at the point of the halt. Within a 24 hour window, a halt could happen very late, which means the next rollout can potentially get delayed if the release becomes healthy.
for example:
day1 / 11pm: rollout stage 1
day1 / 6pm: release unhealthy: halted
with the above logic the next rollout will happen on day 2 / 6pm. instead this should happen:
day1 / 11pm: rollout stage 1
day1 / 6pm: release unhealthy: halted
day1 / 7pm: release healthy
day2 / 11pm: rollout stage 2
I think when hallting updating automatic_rollout_next_update_at is not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
problem This job can cause double/multi increments to the rollout. Scenario:
- cascading job from
IncreaseHealthyReleaseRolloutJobscheduled for day2/11am - the job gets delayed by 1 minute (because of queue buildup)
- Verify kicks in because
automatic_rollout_updated_atwas at day1/11:00am andautomatic_rollout_next_update_atis at day2/11:00am (so diff is > 300 seconds) - it triggers the increase rollout job
- the delayed job from point 2 also kicks in
- increments ends up happening twice in 2 minutes
there are multiple problems here:
IncreaseHealthyReleaseRolloutJobis not fully idempotent because it only looks at time, I think it also needs to look at the "stage" of the rollout- the verification check of 300 seconds should not look at the diff between next and last, it should probably just do "delta" verification which is: was I supposed to run at t0, but I haven't run for t0 + <large jitter like 1 hour> because its purpose should be integrity check rather than another way of updating rollout
| def self.halt_the_store_rollout!(rollout) | ||
| def self.halt_the_store_rollout!(rollout, manually: false) | ||
| return Res.new { raise "release is not actionable" } unless rollout.actionable? | ||
| return Res.new { raise "rollout is not started" } unless rollout.started? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
problem: our state machine allows moving from halted back to started (so we can increase rollout) but this guard clause will prevent it from actually happening and blow up
we need to remove this I think
config/locales/passport/en.yml
Outdated
| resumed_html: 'Resumed the rollout of <span class="emphasize">%{version} (%{build_number})</span> to Play Store <span class="emphasize">%{track}</span> track at <span class="emphasize">%{rollout_percentage}%</span>' | ||
| completed_html: 'Completed the rollout of <span class="emphasize">%{version} (%{build_number})</span> to Play Store <span class="emphasize">%{track}</span> track at <span class="emphasize">%{rollout_percentage}%</span>' | ||
| fully_released_html: 'Fully released <span class="emphasize">%{version} (%{build_number})</span> to Play Store <span class="emphasize">%{track}</span> track from <span class="emphasize">%{rollout_percentage}%</span>' | ||
| paused_html: 'Paused the rollout of <span class="emphasize">%{version} (%{build_number})</span> to App Store at phase <span class="emphasize">%{current_stage} (%{rollout_percentage}%)</span>' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
problem There's a copy-paste error here it says "App Store" instead of "Play Store"
| result = rollout(last_rollout_percentage, retry_on_review_fail: true) | ||
| if result.ok? | ||
| start! | ||
| if release_platform_run.automatic_rollout? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
problem: I think this why you probably added the long dig to find the config in release_platform_run but it's not necessary.
we can just do
play_store_submission.conf.automatic_rollout?and then delegate can be chucked
delegate :automatic_rollout?, to: :release_platform_run
app/models/production_release.rb
Outdated
| release_data = monitoring_provider.find_release(platform, version_name, build_number, store_rollout.created_at) | ||
| return if release_data.blank? | ||
| release_health_metrics.create!(fetched_at: Time.current, **release_data) | ||
| HaltUnhealthyReleaseRolloutJob.perform_async(id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: ReleaseHealthEvent has this thing
after_create_commit :notify_health_rule_triggeredwhich notifies, I think similarly we can inline halting in ReleaseHealthEvent or move it to a Signal instead.
but more importantly, we should also mark this debug-flag we'd added called ReleaseHealthEvent#action_triggered which we should mark as true once we kickoff the job, so that there's a log of the fact that a particular health event caused an action to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
additionally, we need to do the reverse as well. when a release health event makes the release healthy, we should "resume" the release from halted
currently we just wait for the next rollout to kick in (which is good) but halting "stops" the adoption of the app, so when it's healthy it should restart the adoption of the app as well (before it increments) if possible.
WalkthroughThe pull request implements an automatic rollout feature for app releases. It adds database columns to track automatic rollout state and timing on Sequence Diagram(s)sequenceDiagram
participant Scheduler as Scheduler<br/>(every 15min)
participant VAJob as VerifyAutomatic<br/>RolloutJob
participant IHRJob as IncreaseHealthy<br/>ReleaseRolloutJob
participant PSR as PlayStoreRollout<br/>(Automatic)
participant PR as ProductionRelease
participant RHE as ReleaseHealthEvent
participant HUJob as HaltUnhealthy<br/>ReleaseRolloutJob
rect rgb(245, 245, 250)
note over Scheduler: Scheduled every 15 minutes
Scheduler->>VAJob: perform()
end
rect rgb(240, 250, 240)
note over VAJob: Find rollouts: automatic_rollout=true,<br/>status in [started, halted],<br/>ready for update (timing check)
VAJob->>PSR: Query automatic_rollouts<br/>with timing conditions
VAJob->>IHRJob: enqueue(play_store_rollout_id)
end
rect rgb(250, 245, 240)
note over IHRJob: Process enrolled rollout<br/>and health status
IHRJob->>PSR: Load rollout
IHRJob->>PR: Check if healthy?
alt Healthy Release
IHRJob->>PSR: Resume if halted
IHRJob->>PSR: Increase stage
IHRJob->>PSR: Update automatic_rollout<br/>timing fields
IHRJob->>IHRJob: Schedule next update<br/>(24 hours)
else Unhealthy Release
note over IHRJob: Unhealthy: skip increase,<br/>still reschedule
IHRJob->>IHRJob: Schedule next update<br/>(24 hours)
end
end
rect rgb(250, 235, 235)
note over RHE: Unhealthy event created
RHE->>HUJob: trigger_halt_on_unhealthy<br/>(after_create_commit)
HUJob->>PSR: Check if automatic?
alt Automatic Rollout Enabled
HUJob->>PSR: halt_the_store_rollout!<br/>(manually: true)
HUJob->>PSR: disable_automatic_rollout!
HUJob->>RHE: Update action_triggered=true
end
end
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
app/components/live_release/prod_release/rollout_component.rb (1)
33-37: Duplicate delegation:automatic_rollout?is delegated to bothstore_rolloutandrelease_platform_run.Line 33 delegates
automatic_rollout?tostore_rollout, and line 37 delegates the same method torelease_platform_run. The second delegation will override the first, which may not be the intended behavior.Please clarify which delegation is correct. If both sources need to be consulted, consider:
- Removing the duplicate and keeping only one delegation
- Or renaming one of the methods to avoid the conflict
- Or implementing a custom method that checks both sources if needed
# If only store_rollout delegation is needed: delegate :automatic_rollout?, to: :store_rollout # OR if only release_platform_run delegation is needed: delegate :automatic_rollout?, to: :release_platform_run # OR if both need to be checked, implement a custom method: def automatic_rollout? store_rollout.automatic_rollout? && release_platform_run.automatic_rollout? endapp/models/app_store_rollout.rb (1)
71-71: Inconsistent automatic rollout behavior: hardcodedtrueconflicts with no-opdisable_automatic_rollout!Line 71 hardcodes
automatic_rollout?to always returntrue, but thedisable_automatic_rollout!method (lines 130-132) is a no-op. This creates an inconsistency where automatic rollout can never be disabled for App Store rollouts, even though the method exists.Additionally, the schema shows an
automatic_rolloutcolumn (line 6) that should likely be used instead of hardcoding the value.Consider one of these approaches:
Option 1: Use the database column (recommended if App Store supports automatic rollout):
- def automatic_rollout? = true + def automatic_rollout? = automatic_rolloutAnd implement
disable_automatic_rollout!:def disable_automatic_rollout! - # noop for app store + update!(automatic_rollout: false) endOption 2: Remove the feature for App Store (if App Store doesn't support it):
- def automatic_rollout? = true + def automatic_rollout? = falseThis makes it clear that App Store rollouts don't support automatic control.
♻️ Duplicate comments (7)
config/locales/passport/en.yml (1)
82-82: Fix the copy-paste error: should say "Play Store" not "App Store".This translation entry is under the
play_store_rollout:section but incorrectly references "App Store". This will confuse users when they see pause notifications for Play Store rollouts.Apply this diff:
- paused_html: 'Paused the rollout of <span class="emphasize">%{version} (%{build_number})</span> to App Store at phase <span class="emphasize">%{current_stage} (%{rollout_percentage}%)</span>' + paused_html: 'Paused the rollout of <span class="emphasize">%{version} (%{build_number})</span> to Play Store at phase <span class="emphasize">%{current_stage} (%{rollout_percentage}%)</span>'app/models/production_release.rb (2)
148-148: Missing: Resume rollout when release becomes healthy.The current implementation only enqueues
HaltUnhealthyReleaseRolloutJobto halt unhealthy releases, but there's no corresponding logic to resume the rollout when a release becomes healthy again. When a rollout is halted, it stops user adoption. If the release becomes healthy, the rollout should be automatically resumed to restart adoption before incrementing.Consider enqueuing a resume job when the release is healthy:
if unhealthy? HaltUnhealthyReleaseRolloutJob.perform_async(id) else ResumeHealthyReleaseRolloutJob.perform_async(id) end
148-148: Missing: Markaction_triggeredflag when automatic action is taken.When
HaltUnhealthyReleaseRolloutJobis enqueued, the system should markReleaseHealthEvent#action_triggeredastrueto maintain an audit trail showing that a particular health event triggered an automatic action. This provides visibility into which health events caused automatic rollout controls.app/jobs/verify_automatic_rollout_job.rb (1)
4-11: Known issue: Race condition can cause double-increments.The 300-second verification window (line 8) can lead to double rollout increments when scheduled jobs are delayed, as previously identified. The verification should use a larger jitter window (e.g., 1 hour) for integrity checks rather than serving as another update trigger.
Based on past review feedback, consider:
- Making
IncreaseHealthyReleaseRolloutJobfully idempotent by checking the current rollout stage in addition to time- Increasing the verification window to ~1 hour to serve as an integrity check rather than a primary update mechanism
app/jobs/halt_unhealthy_release_rollout_job.rb (1)
12-15: The timing logic forautomatic_rollout_next_update_aton halt was previously flagged.A past review comment noted that updating
automatic_rollout_next_update_atduring a halt could delay the next rollout unnecessarily. If a release becomes healthy again soon after being halted, the next rollout would still be delayed by the full interval from the halt time rather than from the original schedule.app/models/store_submission.rb (1)
72-80: The logic for non-production releases was previously flagged as potentially incorrect.A past review comment indicated that automatic rollouts are only supported for production releases, so
auto_rollout?should returnfalsefor non-production releases. However, the current implementation returnstruewhen!parent_release.production?(lines 75-76).If automatic rollouts are indeed only for production, the method could be simplified:
def auto_rollout? - if parent_release.production? && staged_rollout? && conf.automatic_rollout? - true - elsif !parent_release.production? - true - else - false - end + return false unless parent_release.production? + staged_rollout? && conf.automatic_rollout? endapp/models/play_store_rollout.rb (1)
155-157: A past review comment suggested usingplay_store_submission.conf.automatic_rollout?instead of delegation.The current code uses
release_platform_run.automatic_rollout?to determine whether to re-enable automatic rollout on resume. A previous review suggested usingplay_store_submission.conf.automatic_rollout?directly instead of delegating throughrelease_platform_run.
🧹 Nitpick comments (7)
app/models/app_store_rollout.rb (1)
130-132: Document whydisable_automatic_rollout!is a no-op for App Store.If automatic rollout is not applicable to App Store (due to Apple's phased release being fully automated), please add a comment explaining this design decision.
def disable_automatic_rollout! - # noop for app store + # No-op: App Store phased releases are automatically managed by Apple + # and cannot be manually controlled via the automatic rollout system enddb/migrate/20250331140348_add_automatic_rollout_columns.rb (1)
6-6: Consider adding null constraint for consistency.Line 6 adds
automatic_rollouttosubmission_configswithdefault: falsebut withoutnull: false, unlike thestore_rollouts.automatic_rolloutcolumn on line 3. This creates a three-state boolean (true/false/null) which can lead to ambiguous logic.Apply this diff to add the null constraint:
- add_column :submission_configs, :automatic_rollout, :boolean, default: false + add_column :submission_configs, :automatic_rollout, :boolean, null: false, default: falseapp/jobs/increase_healthy_release_rollout_job.rb (1)
16-21: Consider: Timestamps updated regardless of increase success.Lines 16-19 update
automatic_rollout_updated_atandautomatic_rollout_next_update_ateven if the rollout increase on line 12 failed or was skipped (when the release is not healthy). This means the job reschedules itself (line 21) without confirming progress was made.If the intent is to keep retrying until successful, consider only updating timestamps and rescheduling when the increase action succeeds:
if release.healthy? && (rollout.started? || rollout.halted?) - Action.increase_the_store_rollout!(rollout) + result = Action.increase_the_store_rollout!(rollout) + return unless result.ok? end # This update is necessary so that our verify rollout job does not pick this up again rollout.update!( automatic_rollout_updated_at: Time.current, automatic_rollout_next_update_at: Time.current + PlayStoreRollout::AUTO_ROLLOUT_RUN_INTERVAL ) IncreaseHealthyReleaseRolloutJob.perform_in(PlayStoreRollout::AUTO_ROLLOUT_RUN_INTERVAL, play_store_rollout_id)This ensures the job only advances the schedule when actual progress is made.
spec/jobs/halt_unhealthy_release_rollout_job_spec.rb (1)
26-35: Consider adding assertions for timestamp updates after halting.The job updates
automatic_rollout_updated_atandautomatic_rollout_next_update_atafter halting. Consider adding assertions to verify these values are set correctly, especially given the past review discussion about whether updatingautomatic_rollout_next_update_aton halt is the correct behavior.spec/jobs/verify_automatic_rollout_job_spec.rb (1)
41-50: Consider adding a test case for the 300-second minimum interval guard.The job implementation has a check for
automatic_rollout_next_update_at - automatic_rollout_updated_at >= interval '300 second'. Consider adding a specific test case where the interval is less than 300 seconds to verify this guard prevents enqueuing.db/schema.rb (1)
920-923: Consider adding an index onautomatic_rollout_next_update_atfor query performance.The
VerifyAutomaticRolloutJobqueriesstore_rolloutsfiltering byautomatic_rollout_next_update_at. As the table grows, this query could benefit from an index:WHERE automatic_rollout_next_update_at < NOW() AND automatic_rollout_next_update_at - automatic_rollout_updated_at >= interval '300 second'spec/jobs/increase_healthy_release_rollout_job_spec.rb (1)
69-76: Missing test for halted + healthy scenario.Based on the job implementation, when a rollout is
haltedbut the release ishealthy?, it should triggerAction.increase_the_store_rollout!. This test only covers the halted state without mockinghealthy?, which meanshealthy?likely returnsfalseby default.Consider adding a test case:
context "when a staged play store rollout is halted and release is healthy" do let(:store_rollout) { create(:store_rollout, :play_store, :halted, is_staged_rollout: true, automatic_rollout: true, store_submission: play_store_submission) } before do allow_any_instance_of(ProductionRelease).to receive(:healthy?).and_return(true) end it "rolls out the release to next stage" do described_class.new.perform(store_rollout.id) expect(play_store_integration).to have_received(:rollout_release) end end
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
app/components/live_release/prod_release/rollout_component.rb(2 hunks)app/controllers/config/release_platforms_controller.rb(2 hunks)app/controllers/store_rollouts_controller.rb(1 hunks)app/jobs/halt_unhealthy_release_rollout_job.rb(1 hunks)app/jobs/increase_healthy_release_rollout_job.rb(1 hunks)app/jobs/verify_automatic_rollout_job.rb(1 hunks)app/libs/coordinators.rb(1 hunks)app/models/app_store_rollout.rb(2 hunks)app/models/config/submission.rb(3 hunks)app/models/play_store_rollout.rb(4 hunks)app/models/play_store_submission.rb(1 hunks)app/models/production_release.rb(1 hunks)app/models/release_platform_run.rb(1 hunks)app/models/store_rollout.rb(2 hunks)app/models/store_submission.rb(1 hunks)app/views/config/release_platforms/_form.html.erb(1 hunks)config/locales/passport/en.yml(1 hunks)config/schedule.yml(1 hunks)db/migrate/20250331140348_add_automatic_rollout_columns.rb(1 hunks)db/schema.rb(2 hunks)spec/factories/store_rollout.rb(1 hunks)spec/jobs/halt_unhealthy_release_rollout_job_spec.rb(1 hunks)spec/jobs/increase_healthy_release_rollout_job_spec.rb(1 hunks)spec/jobs/verify_automatic_rollout_job_spec.rb(1 hunks)spec/models/production_release_spec.rb(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (13)
app/models/production_release.rb (1)
app/models/google_firebase_integration.rb (1)
id(250-250)
app/jobs/verify_automatic_rollout_job.rb (1)
app/models/play_store_rollout.rb (1)
rollout(195-205)
app/models/app_store_rollout.rb (1)
app/models/play_store_rollout.rb (1)
disable_automatic_rollout!(144-146)
app/controllers/store_rollouts_controller.rb (1)
app/libs/coordinators.rb (1)
halt_the_store_rollout!(261-272)
spec/jobs/verify_automatic_rollout_job_spec.rb (1)
app/jobs/verify_automatic_rollout_job.rb (1)
perform(4-12)
app/models/play_store_submission.rb (2)
app/models/store_rollout.rb (1)
staged_rollout?(55-55)app/models/store_submission.rb (2)
staged_rollout?(68-70)auto_rollout?(72-80)
app/models/store_submission.rb (3)
app/models/production_release.rb (2)
production?(181-181)conf(179-179)app/models/store_rollout.rb (1)
staged_rollout?(55-55)app/models/app_store_rollout.rb (1)
automatic_rollout?(71-71)
app/jobs/increase_healthy_release_rollout_job.rb (4)
app/jobs/halt_unhealthy_release_rollout_job.rb (2)
queue_as(1-19)perform(4-18)app/models/play_store_rollout.rb (1)
rollout(195-205)app/models/app_store_rollout.rb (1)
automatic_rollout?(71-71)app/libs/coordinators.rb (1)
increase_the_store_rollout!(237-243)
app/models/release_platform_run.rb (5)
app/models/workflow_run.rb (1)
conf(350-350)app/models/pre_prod_release.rb (1)
conf(101-101)app/models/production_release.rb (1)
conf(179-179)app/models/store_submission.rb (1)
conf(175-175)app/components/all_builds_table_component.rb (1)
submissions(21-31)
app/libs/coordinators.rb (4)
app/models/pre_prod_release.rb (1)
actionable?(70-72)app/models/production_release.rb (1)
actionable?(102-105)app/models/app_store_rollout.rb (3)
halt_release!(113-128)automatic_rollout?(71-71)disable_automatic_rollout!(130-132)app/models/release_platform_run.rb (1)
automatic_rollout?(349-349)
spec/models/production_release_spec.rb (1)
app/models/production_release.rb (1)
fetch_health_data!(138-149)
spec/jobs/increase_healthy_release_rollout_job_spec.rb (2)
app/jobs/halt_unhealthy_release_rollout_job.rb (1)
perform(4-18)app/jobs/increase_healthy_release_rollout_job.rb (1)
perform(4-22)
app/models/play_store_rollout.rb (2)
app/models/concerns/lockable.rb (1)
with_lock(4-16)app/models/store_rollout.rb (3)
last_rollout_percentage(83-89)stamp_data(149-156)notification_params(101-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: lint
- GitHub Check: test
🔇 Additional comments (19)
app/models/play_store_submission.rb (1)
320-325: Persistingautomatic_rolloutlooks consistent, but double‑check how jobs filter on itWiring
automatic_rollout: auto_rollout?intocreate_play_store_rollout!keeps the DB flag aligned with the same predicate you already use to decide whether to callstart_release!, which is good.One nuance:
auto_rollout?returnstruefor any non‑production release, even whenstaged_rollout?isfalse, so non‑production rollouts will now be created withautomatic_rollout: trueandis_staged_rollout: false. That’s probably fine as long as the new background jobs that drive staged rollout progression (e.g., verify/increase/halt automatic rollouts) always scope tois_staged_rollout: trueand/or production runs when deciding which rollouts to advance.Please verify the job queries/Scopes on
PlayStoreRolloutso we don’t accidentally treat non‑production, non‑staged releases as candidates for staged progression just becauseautomatic_rolloutis true.config/schedule.yml (1)
9-12: LGTM! The 15-minute schedule addresses timezone concerns.The frequent polling (every 15 minutes) ensures that rollouts are verified regardless of when they were started across different timezones. The scheduler can check if 24 hours have passed since the last automatic rollout update and enqueue the appropriate job. Additionally, the halt functionality exists separately via
HaltUnhealthyReleaseRolloutJob(enqueued fromproduction_release.rb), addressing the concern about pausing unhealthy rollouts.spec/factories/store_rollout.rb (1)
6-6: LGTM!The factory correctly includes the new
automatic_rolloutfield with a sensible default value offalse.app/controllers/store_rollouts_controller.rb (1)
49-49: LGTM! Correctly signals manual halt to disable automatic rollout.Passing
manually: trueproperly indicates user-initiated halting, which triggers the coordinator to disable automatic rollout (as shown inapp/libs/coordinators.rblines 260-271). This prevents the automatic system from conflicting with manual intervention.app/views/config/release_platforms/_form.html.erb (1)
95-99: LGTM! Clear UI for automatic rollout configuration.The automatic rollout toggle is well-positioned within the Android rollout section with a descriptive info icon that clearly explains the feature's behavior. The UI properly communicates that Tramline will automatically increase rollouts daily based on release health rules.
app/libs/coordinators.rb (1)
261-272: LGTM! Clean separation of manual vs. automatic halt behavior.The addition of the
manuallyparameter effectively prevents automatic rollouts from resuming after a user manually halts them. The integration withdisable_automatic_rollout!is appropriate.app/models/config/submission.rb (1)
7-7: LGTM! Clean configuration field addition.The
automatic_rolloutfield is properly integrated into the JSON serialization/deserialization flow and aligns with the database schema.Also applies to: 53-54, 83-83
app/controllers/config/release_platforms_controller.rb (1)
85-87: LGTM! Proper parameter handling and cascading disable logic.The automatic_rollout parameter is correctly whitelisted and the cascading disable behavior (line 159) ensures consistency when staged rollouts are turned off.
Also applies to: 159-159
spec/models/production_release_spec.rb (1)
248-254: LGTM! Appropriate test update for new behavior.The test correctly verifies that
HaltUnhealthyReleaseRolloutJobis enqueued when health data is fetched, and the refactoring toreceive_messagesimproves readability.app/models/release_platform_run.rb (1)
349-349: LGTM! Clean predicate with safe navigation.The method safely traverses the configuration chain and provides a sensible default. The implementation addresses the past review comment by reading directly from the
confobject.app/jobs/increase_healthy_release_rollout_job.rb (1)
11-13: Verify: Result from increase action is not checked.The job calls
Action.increase_the_store_rollout!(rollout)on line 12 but does not check whether the action succeeded. If the increase fails (e.g., due to API errors), the job will still update timestamps and reschedule itself (lines 16-21), potentially leading to repeated failures without visibility.Consider checking the result and handling failures appropriately:
if release.healthy? && (rollout.started? || rollout.halted?) - Action.increase_the_store_rollout!(rollout) + result = Action.increase_the_store_rollout!(rollout) + return unless result.ok? endAlternatively, if the current behavior is intentional (to keep trying despite failures), please confirm this is the desired retry strategy.
spec/jobs/halt_unhealthy_release_rollout_job_spec.rb (1)
1-63: Good test coverage for the halt unhealthy release rollout job.The tests properly cover the key scenarios: automatic rollout enabled/disabled combined with healthy/unhealthy release states. The expectations correctly verify that
halt_releaseis only called when both conditions are met (automatic rollout enabled AND release unhealthy).app/models/store_rollout.rb (1)
53-53: Scope looks correct for identifying active automatic rollouts.The
automatic_rolloutsscope correctly filters for rollouts that are in progress (startedorhalted), are staged rollouts, and have automatic rollout enabled. This aligns with the job logic that processes these rollouts.spec/jobs/verify_automatic_rollout_job_spec.rb (1)
32-39: Test correctly verifies the eligibility criteria for automatic rollout processing.The test sets up a rollout where:
automatic_rollout_next_update_atis in the past (10.minutes.ago)- The interval between updated_at and next_update_at exceeds 300 seconds (24 hours vs 10 minutes)
This correctly triggers the
IncreaseHealthyReleaseRolloutJobenqueue as expected.spec/jobs/increase_healthy_release_rollout_job_spec.rb (1)
1-122: Comprehensive test coverage for the automatic rollout progression job.The tests properly cover the key scenarios including rollout state transitions, health checks, scheduling behavior, and guards for non-applicable rollouts. The use of
freeze_timefor timestamp assertions is appropriate.app/models/play_store_rollout.rb (3)
166-175: Thepause_release!method implementation looks correct.The method properly:
- Guards with
may_pause?- Disables automatic rollout
- Transitions state to paused
- Creates an event stamp
- Sends notification
One minor observation: the notification message "Automatic rollout has been paused" assumes the pause was due to automatic rollout, but this method could potentially be called for manual pauses too. Consider if the message should be more generic.
37-37: The 24-hour interval constant is well-defined.
AUTO_ROLLOUT_RUN_INTERVAL = 24.hoursclearly expresses the automatic rollout cadence and is used consistently across the related jobs.
43-55: State machine transitions for pause/resume are correctly defined.The additions properly model the pause/resume flow:
startcan now transition frompausedstate (in addition tocreatedandhalted)- New
pauseevent transitions fromstartedtopaused- New
resumeevent transitions frompausedtostarteddb/schema.rb (1)
1-13: Schema changes do not contain sensitive data; no anonymize.rake updates needed.The new columns added to
store_rollouts(automatic_rollout,automatic_rollout_updated_at,automatic_rollout_next_update_at) andsubmission_configs(automatic_rollout) are operational configuration flags and scheduling timestamps used by automated rollout jobs. These columns contain no sensitive user, personal, or business data—they track system-level rollout automation state only. The anonymize.rake file already correctly handles these tables with a whitelist strategy that excludes non-sensitive operational fields, which is the appropriate behavior.Likely an incorrect or invalid review comment.
| def perform(production_release_id) | ||
| production_release = ProductionRelease.find(production_release_id) | ||
|
|
||
| if production_release.unhealthy? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find ProductionRelease model
fd -t f "production_release\.rb|production_releases\.rb" -type fRepository: tramlinehq/tramline
Length of output: 233
🏁 Script executed:
#!/bin/bash
# Search for healthy? and unhealthy? method definitions in ProductionRelease
ast-grep --pattern 'def healthy?
$$$
'
ast-grep --pattern 'def unhealthy?
$$$
'Repository: tramlinehq/tramline
Length of output: 1701
🏁 Script executed:
#!/bin/bash
# Also check for IncreaseHealthyReleaseRolloutJob
fd -t f "increase_healthy.*job" -type fRepository: tramlinehq/tramline
Length of output: 233
🏁 Script executed:
#!/bin/bash
# Find all jobs to understand the context better
fd -t f ".*_job\.rb" app/jobs/ | head -20Repository: tramlinehq/tramline
Length of output: 861
🏁 Script executed:
#!/bin/bash
# Read the halt_unhealthy_release_rollout_job.rb file
cat -n app/jobs/halt_unhealthy_release_rollout_job.rbRepository: tramlinehq/tramline
Length of output: 832
🏁 Script executed:
#!/bin/bash
# Read the increase_healthy_release_rollout_job.rb file
cat -n app/jobs/increase_healthy_release_rollout_job.rbRepository: tramlinehq/tramline
Length of output: 1054
🏁 Script executed:
#!/bin/bash
# Check Release model for healthy? method
rg -n "def healthy\?" app/models/release.rb -A 10Repository: tramlinehq/tramline
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Check the relationship between Release and ProductionRelease
rg -n "production_release" app/models/release.rb -B 2 -A 2 | head -40Repository: tramlinehq/tramline
Length of output: 1057
🏁 Script executed:
#!/bin/bash
# Broader search for healthy? in the Release model file
cat -n app/models/release.rb | grep -C 5 "healthy"Repository: tramlinehq/tramline
Length of output: 1102
🏁 Script executed:
#!/bin/bash
# Search for any healthy? method definitions in the entire codebase
rg -n "^\s*def healthy\?" app/models/ -B 2Repository: tramlinehq/tramline
Length of output: 339
Define Release#healthy? method or use inverse predicate in IncreaseHealthyReleaseRolloutJob.
The Release model lacks a healthy? method, but IncreaseHealthyReleaseRolloutJob calls release.healthy? on line 11. The Release model only defines unhealthy? (which checks if any platform run is unhealthy). Add Release#healthy? as the inverse of unhealthy?, or refactor the job to use !release.unhealthy? for consistency with the predicate pattern used elsewhere.
🤖 Prompt for AI Agents
In app/jobs/halt_unhealthy_release_rollout_job.rb around line 7, the job and
IncreaseHealthyReleaseRolloutJob expect Release#healthy? to exist but the model
only implements unhealthy?; add a Release#healthy? instance method that returns
the inverse of unhealthy? (e.g., !unhealthy?) so both predicates are available
and consistent, or alternatively update the jobs to use !release.unhealthy?
everywhere; implement the method in the Release model to return !unhealthy? and
update tests if any rely on healthy?.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
spec/jobs/increase_healthy_release_rollout_job_spec.rb (1)
28-28: Verify thatProductionRelease#healthy?is implemented.The tests mock
ProductionRelease#healthy?(lines 28, 51, 73), and the job implementation callsrelease.healthy?(per relevant code snippet at app/jobs/increase_healthy_release_rollout_job.rb:11). However, based on past review comments and the provided code snippets showing onlyunhealthy?defined as!healthy?, there's a risk that thehealthy?method may not be implemented, which would cause aNoMethodErroror infinite recursion.Run the following to confirm
healthy?is defined:#!/bin/bash # Search for healthy? method definition in ProductionRelease ast-grep --pattern 'class ProductionRelease $$$ def healthy? $$$ end $$$ ' # Also check if it might be defined in a concern or parent class rg -n "def healthy\?" app/models/ -B 2 -A 5Also applies to: 51-51, 73-73
🧹 Nitpick comments (1)
app/jobs/halt_unhealthy_release_rollout_job.rb (1)
4-18: Consider adding error handling for missing records.The job will raise
ActiveRecord::RecordNotFoundif either the ProductionRelease or ReleaseHealthEvent is not found. While this may be acceptable for a high-priority job (failures will be visible), consider adding error handling if you want graceful degradation when records are deleted between enqueueing and execution.Optional defensive approach:
def perform(production_release_id, event_id) - production_release = ProductionRelease.find(production_release_id) - event = ReleaseHealthEvent.find(event_id) + production_release = ProductionRelease.find_by(id: production_release_id) + event = ReleaseHealthEvent.find_by(id: event_id) + return unless production_release && event if production_release.unhealthy?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
app/jobs/halt_unhealthy_release_rollout_job.rb(1 hunks)app/jobs/increase_healthy_release_rollout_job.rb(1 hunks)app/models/release_health_event.rb(2 hunks)app/models/store_submission.rb(1 hunks)config/locales/passport/en.yml(1 hunks)spec/factories/release_health_events.rb(1 hunks)spec/jobs/halt_unhealthy_release_rollout_job_spec.rb(1 hunks)spec/jobs/increase_healthy_release_rollout_job_spec.rb(1 hunks)spec/models/play_store_submission_spec.rb(1 hunks)spec/models/release_health_event_spec.rb(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- spec/jobs/halt_unhealthy_release_rollout_job_spec.rb
- app/jobs/increase_healthy_release_rollout_job.rb
🧰 Additional context used
🧬 Code graph analysis (3)
spec/models/play_store_submission_spec.rb (1)
app/models/store_submission.rb (2)
auto_rollout?(72-78)conf(173-173)
app/models/release_health_event.rb (3)
app/models/production_release.rb (1)
unhealthy?(154-156)app/models/release.rb (1)
unhealthy?(257-259)app/models/release_platform_run.rb (1)
unhealthy?(203-205)
spec/jobs/increase_healthy_release_rollout_job_spec.rb (1)
app/jobs/increase_healthy_release_rollout_job.rb (1)
perform(4-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: test
- GitHub Check: lint
🔇 Additional comments (8)
config/locales/passport/en.yml (1)
79-82: LGTM! Clear messaging for automatic rollout states.The addition of "automatic" to the resumed and paused messages improves clarity by distinguishing automatic rollout actions from manual ones. The terminology is consistent and correctly references Play Store.
app/jobs/halt_unhealthy_release_rollout_job.rb (1)
8-17: LGTM! Clean halt logic with appropriate checks.The conditional checks ensure the halt action is only triggered for Play Store rollouts with automatic rollout enabled, and the event is properly updated on success. The use of
result.ok?provides safe error handling for the halt action.spec/factories/release_health_events.rb (1)
3-3: LGTM! Factory correctly updated for new association.The change from
deployment_runtoproduction_releasealigns with the model's association changes.spec/models/release_health_event_spec.rb (1)
1-33: LGTM! Comprehensive test coverage for the callback.The tests properly verify that:
- Unhealthy events trigger
HaltUnhealthyReleaseRolloutJobwith correct arguments- Healthy events do not trigger the job
The test structure is clear and uses appropriate RSpec patterns.
spec/models/play_store_submission_spec.rb (1)
209-260: LGTM! Thorough test coverage for auto_rollout? logic.The tests comprehensively cover all branches of the
auto_rollout?method:
- Production releases with both flags enabled (true case)
- Production releases with either flag disabled (false cases)
- Non-production releases (always false)
The use of mocking to control configuration values is appropriate and keeps tests focused.
app/models/release_health_event.rb (1)
33-33: LGTM! Appropriate callback for job enqueueing.The use of
after_create_commitis correct for enqueueing background jobs, ensuring the job is only enqueued after the transaction commits. The guard clause properly restricts the trigger to unhealthy events only.Also applies to: 64-67
app/models/store_submission.rb (1)
72-78: LGTM! Correctly implements automatic rollout conditions.The refactored logic properly restricts automatic rollouts to production releases that have both staged rollout and automatic rollout enabled. The two-branch structure (if/else) correctly addresses the past review feedback.
spec/jobs/increase_healthy_release_rollout_job_spec.rb (1)
1-135: LGTM! Comprehensive test coverage for automatic rollout scenarios.The test file thoroughly covers:
- Different rollout states (created, started, halted, completed)
- Health status variations (healthy/unhealthy)
- Configuration flags (automatic_rollout enabled/disabled, staged/non-staged)
- Time-based scheduling verification (24-hour intervals)
- State transitions and API call expectations
The test structure is clear, well-organized, and uses appropriate mocking strategies.
Smart Staged Rollouts
Automatically rollout the release to play store when the release is healthy, as determined by release health rules.
Automatic rollout happens once release is made and will proceed in increments as defined in staged rollout once every day.
Config UI Screenshot
Closes: #425
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.