Skip to content

Conversation

@samrat
Copy link
Contributor

@samrat samrat commented Feb 26, 2025

Smart Staged Rollouts

Automatically rollout the release to play store when the release is healthy, as determined by release health rules.

Automatic rollout happens once release is made and will proceed in increments as defined in staged rollout once every day.

Config UI Screenshot

image

Closes: #425

Summary by CodeRabbit

Release Notes

  • New Features
    • Added automatic daily rollout capability for Play Store releases, enabling staged rollouts based on release health metrics and configured intervals.
    • Introduced pause and resume functionality for active rollouts.
    • Added configuration option to enable automatic rollout in release platform settings.
    • Automatic rollout halts when release health deteriorates, protecting users from unstable releases.

✏️ Tip: You can customize this high-level summary in your review settings.

@nileshgr nileshgr force-pushed the feat/smart-rollouts branch from 35437c6 to f753a9b Compare March 31, 2025 16:33
@nileshgr nileshgr self-assigned this Mar 31, 2025
Copy link
Member

@kitallis kitallis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one thing to make clear in the notice is when one doesn't have any rules configured, unhealthy returns false, which is the correct behavior, but we should let users know that the rollout has no way of halting automatically in that case and it's a bit of a runaway train.

I'll check back later once you're fully done with this.

@kitallis kitallis requested a deployment to feat/smart-rollouts - site-db PR #763 April 4, 2025 11:31 — with Render Abandoned
@nileshgr nileshgr marked this pull request as ready for review April 11, 2025 15:54
@nileshgr nileshgr requested a review from kitallis April 11, 2025 15:55
Copy link
Member

@kitallis kitallis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I need to do another pass, to ensure the behavior is correct.

@nileshgr nileshgr force-pushed the feat/smart-rollouts branch from a04e5b7 to 87fe9d9 Compare April 20, 2025 09:36
Copy link
Member

@kitallis kitallis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should pair on this once you're done with #794 because it requires some careful redoing.

def auto_rollout?
if parent_release.production? && staged_rollout? && conf.automatic_rollout?
true
elsif !parent_release.production?
Copy link
Member

@kitallis kitallis Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this elsif is incorrect, when it's non-production, it should always be false since we don't support automatic rollouts anywhere other than prod. There's probably only two-cases here, the truthy one otherwise the falsey one.

Action.halt_the_store_rollout!(store_rollout)
store_rollout.update!(
automatic_rollout_updated_at: Time.current,
automatic_rollout_next_update_at: Time.current + PlayStoreRollout::AUTO_ROLLOUT_RUN_INTERVAL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

problem: I think this is incorrect, because it will reschedule the next rollout at the point of the halt. Within a 24 hour window, a halt could happen very late, which means the next rollout can potentially get delayed if the release becomes healthy.

for example:

day1 / 11pm: rollout stage 1
day1 / 6pm: release unhealthy: halted

with the above logic the next rollout will happen on day 2 / 6pm. instead this should happen:

day1 / 11pm: rollout stage 1
day1 / 6pm: release unhealthy: halted
day1 / 7pm: release healthy
day2 / 11pm: rollout stage 2

I think when hallting updating automatic_rollout_next_update_at is not needed

Copy link
Member

@kitallis kitallis Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

problem This job can cause double/multi increments to the rollout. Scenario:

  1. cascading job from IncreaseHealthyReleaseRolloutJob scheduled for day2/11am
  2. the job gets delayed by 1 minute (because of queue buildup)
  3. Verify kicks in because automatic_rollout_updated_at was at day1/11:00am and automatic_rollout_next_update_at is at day2/11:00am (so diff is > 300 seconds)
  4. it triggers the increase rollout job
  5. the delayed job from point 2 also kicks in
  6. increments ends up happening twice in 2 minutes

there are multiple problems here:

  1. IncreaseHealthyReleaseRolloutJob is not fully idempotent because it only looks at time, I think it also needs to look at the "stage" of the rollout
  2. the verification check of 300 seconds should not look at the diff between next and last, it should probably just do "delta" verification which is: was I supposed to run at t0, but I haven't run for t0 + <large jitter like 1 hour> because its purpose should be integrity check rather than another way of updating rollout

def self.halt_the_store_rollout!(rollout)
def self.halt_the_store_rollout!(rollout, manually: false)
return Res.new { raise "release is not actionable" } unless rollout.actionable?
return Res.new { raise "rollout is not started" } unless rollout.started?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

problem: our state machine allows moving from halted back to started (so we can increase rollout) but this guard clause will prevent it from actually happening and blow up

we need to remove this I think

resumed_html: 'Resumed the rollout of <span class="emphasize">%{version} (%{build_number})</span> to Play Store <span class="emphasize">%{track}</span> track at <span class="emphasize">%{rollout_percentage}%</span>'
completed_html: 'Completed the rollout of <span class="emphasize">%{version} (%{build_number})</span> to Play Store <span class="emphasize">%{track}</span> track at <span class="emphasize">%{rollout_percentage}%</span>'
fully_released_html: 'Fully released <span class="emphasize">%{version} (%{build_number})</span> to Play Store <span class="emphasize">%{track}</span> track from <span class="emphasize">%{rollout_percentage}%</span>'
paused_html: 'Paused the rollout of <span class="emphasize">%{version} (%{build_number})</span> to App Store at phase <span class="emphasize">%{current_stage} (%{rollout_percentage}%)</span>'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

problem There's a copy-paste error here it says "App Store" instead of "Play Store"

result = rollout(last_rollout_percentage, retry_on_review_fail: true)
if result.ok?
start!
if release_platform_run.automatic_rollout?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

problem: I think this why you probably added the long dig to find the config in release_platform_run but it's not necessary.

we can just do

play_store_submission.conf.automatic_rollout?

and then delegate can be chucked

delegate :automatic_rollout?, to: :release_platform_run

release_data = monitoring_provider.find_release(platform, version_name, build_number, store_rollout.created_at)
return if release_data.blank?
release_health_metrics.create!(fetched_at: Time.current, **release_data)
HaltUnhealthyReleaseRolloutJob.perform_async(id)
Copy link
Member

@kitallis kitallis Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: ReleaseHealthEvent has this thing

after_create_commit :notify_health_rule_triggered

which notifies, I think similarly we can inline halting in ReleaseHealthEvent or move it to a Signal instead.

but more importantly, we should also mark this debug-flag we'd added called ReleaseHealthEvent#action_triggered which we should mark as true once we kickoff the job, so that there's a log of the fact that a particular health event caused an action to happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

additionally, we need to do the reverse as well. when a release health event makes the release healthy, we should "resume" the release from halted

currently we just wait for the next rollout to kick in (which is good) but halting "stops" the adoption of the app, so when it's healthy it should restart the adoption of the app as well (before it increments) if possible.

@kitallis kitallis changed the title Add job to increase rollout if release is healthy feat: smart staged rollouts for Android Jul 22, 2025
@coderabbitai
Copy link

coderabbitai bot commented Nov 28, 2025

Walkthrough

The pull request implements an automatic rollout feature for app releases. It adds database columns to track automatic rollout state and timing on store_rollouts and submission_configs tables. Three new background jobs are introduced: VerifyAutomaticRolloutJob runs every 15 minutes to identify rollouts eligible for updates, IncreaseHealthyReleaseRolloutJob advances rollout stages when releases are healthy, and HaltUnhealthyReleaseRolloutJob halts rollouts when releases are unhealthy. PlayStoreRollout gains pause/resume states, a pause_release! method, and automatic rollout control methods. Controllers are updated to permit automatic rollout parameters, UI forms add configuration options, and models are enhanced with automatic rollout logic. ReleaseHealthEvent now triggers the halt job on unhealthy event creation.

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler as Scheduler<br/>(every 15min)
    participant VAJob as VerifyAutomatic<br/>RolloutJob
    participant IHRJob as IncreaseHealthy<br/>ReleaseRolloutJob
    participant PSR as PlayStoreRollout<br/>(Automatic)
    participant PR as ProductionRelease
    participant RHE as ReleaseHealthEvent
    participant HUJob as HaltUnhealthy<br/>ReleaseRolloutJob
    
    rect rgb(245, 245, 250)
        note over Scheduler: Scheduled every 15 minutes
        Scheduler->>VAJob: perform()
    end
    
    rect rgb(240, 250, 240)
        note over VAJob: Find rollouts: automatic_rollout=true,<br/>status in [started, halted],<br/>ready for update (timing check)
        VAJob->>PSR: Query automatic_rollouts<br/>with timing conditions
        VAJob->>IHRJob: enqueue(play_store_rollout_id)
    end
    
    rect rgb(250, 245, 240)
        note over IHRJob: Process enrolled rollout<br/>and health status
        IHRJob->>PSR: Load rollout
        IHRJob->>PR: Check if healthy?
        
        alt Healthy Release
            IHRJob->>PSR: Resume if halted
            IHRJob->>PSR: Increase stage
            IHRJob->>PSR: Update automatic_rollout<br/>timing fields
            IHRJob->>IHRJob: Schedule next update<br/>(24 hours)
        else Unhealthy Release
            note over IHRJob: Unhealthy: skip increase,<br/>still reschedule
            IHRJob->>IHRJob: Schedule next update<br/>(24 hours)
        end
    end
    
    rect rgb(250, 235, 235)
        note over RHE: Unhealthy event created
        RHE->>HUJob: trigger_halt_on_unhealthy<br/>(after_create_commit)
        HUJob->>PSR: Check if automatic?
        alt Automatic Rollout Enabled
            HUJob->>PSR: halt_the_store_rollout!<br/>(manually: true)
            HUJob->>PSR: disable_automatic_rollout!
            HUJob->>RHE: Update action_triggered=true
        end
    end
Loading

Poem

🐰 In the warren of releases, we hop with care,
Automatic rollouts through the digital air,
When health blooms bright, we advance the stage,
When trouble appears, we pause the page,
Twenty-four hours, then we check once more,
A rhythm of safety from shore to shore!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description check ❓ Inconclusive The PR description is incomplete. It is missing the 'Scenarios tested' section from the required template, and while it includes a closing issue reference, the 'This addresses' section is minimal. Add the 'Scenarios tested' checklist section and expand 'This addresses' to detail all covered functionality, including health-based automation, daily increments, and configuration UI.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'feat: smart staged rollouts for Android' clearly summarizes the main feature being added—automatic staged rollout management for Android releases.
Linked Issues check ✅ Passed The PR implementation meets the core coding requirements from issue #425: automatic staged rollout management triggered by release health signals, with daily increment scheduling and integration with existing features.
Out of Scope Changes check ✅ Passed All changes are focused on implementing automatic staged rollouts for Android. No unrelated modifications or out-of-scope changes were introduced.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/smart-rollouts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
app/components/live_release/prod_release/rollout_component.rb (1)

33-37: Duplicate delegation: automatic_rollout? is delegated to both store_rollout and release_platform_run.

Line 33 delegates automatic_rollout? to store_rollout, and line 37 delegates the same method to release_platform_run. The second delegation will override the first, which may not be the intended behavior.

Please clarify which delegation is correct. If both sources need to be consulted, consider:

  • Removing the duplicate and keeping only one delegation
  • Or renaming one of the methods to avoid the conflict
  • Or implementing a custom method that checks both sources if needed
# If only store_rollout delegation is needed:
delegate :automatic_rollout?, to: :store_rollout

# OR if only release_platform_run delegation is needed:
delegate :automatic_rollout?, to: :release_platform_run

# OR if both need to be checked, implement a custom method:
def automatic_rollout?
  store_rollout.automatic_rollout? && release_platform_run.automatic_rollout?
end
app/models/app_store_rollout.rb (1)

71-71: Inconsistent automatic rollout behavior: hardcoded true conflicts with no-op disable_automatic_rollout!

Line 71 hardcodes automatic_rollout? to always return true, but the disable_automatic_rollout! method (lines 130-132) is a no-op. This creates an inconsistency where automatic rollout can never be disabled for App Store rollouts, even though the method exists.

Additionally, the schema shows an automatic_rollout column (line 6) that should likely be used instead of hardcoding the value.

Consider one of these approaches:

Option 1: Use the database column (recommended if App Store supports automatic rollout):

-  def automatic_rollout? = true
+  def automatic_rollout? = automatic_rollout

And implement disable_automatic_rollout!:

  def disable_automatic_rollout!
-    # noop for app store
+    update!(automatic_rollout: false)
  end

Option 2: Remove the feature for App Store (if App Store doesn't support it):

-  def automatic_rollout? = true
+  def automatic_rollout? = false

This makes it clear that App Store rollouts don't support automatic control.

♻️ Duplicate comments (7)
config/locales/passport/en.yml (1)

82-82: Fix the copy-paste error: should say "Play Store" not "App Store".

This translation entry is under the play_store_rollout: section but incorrectly references "App Store". This will confuse users when they see pause notifications for Play Store rollouts.

Apply this diff:

-      paused_html: 'Paused the rollout of <span class="emphasize">%{version} (%{build_number})</span> to App Store at phase <span class="emphasize">%{current_stage} (%{rollout_percentage}%)</span>'
+      paused_html: 'Paused the rollout of <span class="emphasize">%{version} (%{build_number})</span> to Play Store at phase <span class="emphasize">%{current_stage} (%{rollout_percentage}%)</span>'
app/models/production_release.rb (2)

148-148: Missing: Resume rollout when release becomes healthy.

The current implementation only enqueues HaltUnhealthyReleaseRolloutJob to halt unhealthy releases, but there's no corresponding logic to resume the rollout when a release becomes healthy again. When a rollout is halted, it stops user adoption. If the release becomes healthy, the rollout should be automatically resumed to restart adoption before incrementing.

Consider enqueuing a resume job when the release is healthy:

if unhealthy?
  HaltUnhealthyReleaseRolloutJob.perform_async(id)
else
  ResumeHealthyReleaseRolloutJob.perform_async(id)
end

148-148: Missing: Mark action_triggered flag when automatic action is taken.

When HaltUnhealthyReleaseRolloutJob is enqueued, the system should mark ReleaseHealthEvent#action_triggered as true to maintain an audit trail showing that a particular health event triggered an automatic action. This provides visibility into which health events caused automatic rollout controls.

app/jobs/verify_automatic_rollout_job.rb (1)

4-11: Known issue: Race condition can cause double-increments.

The 300-second verification window (line 8) can lead to double rollout increments when scheduled jobs are delayed, as previously identified. The verification should use a larger jitter window (e.g., 1 hour) for integrity checks rather than serving as another update trigger.

Based on past review feedback, consider:

  1. Making IncreaseHealthyReleaseRolloutJob fully idempotent by checking the current rollout stage in addition to time
  2. Increasing the verification window to ~1 hour to serve as an integrity check rather than a primary update mechanism
app/jobs/halt_unhealthy_release_rollout_job.rb (1)

12-15: The timing logic for automatic_rollout_next_update_at on halt was previously flagged.

A past review comment noted that updating automatic_rollout_next_update_at during a halt could delay the next rollout unnecessarily. If a release becomes healthy again soon after being halted, the next rollout would still be delayed by the full interval from the halt time rather than from the original schedule.

app/models/store_submission.rb (1)

72-80: The logic for non-production releases was previously flagged as potentially incorrect.

A past review comment indicated that automatic rollouts are only supported for production releases, so auto_rollout? should return false for non-production releases. However, the current implementation returns true when !parent_release.production? (lines 75-76).

If automatic rollouts are indeed only for production, the method could be simplified:

  def auto_rollout?
-   if parent_release.production? && staged_rollout? && conf.automatic_rollout?
-     true
-   elsif !parent_release.production?
-     true
-   else
-     false
-   end
+   return false unless parent_release.production?
+   staged_rollout? && conf.automatic_rollout?
  end
app/models/play_store_rollout.rb (1)

155-157: A past review comment suggested using play_store_submission.conf.automatic_rollout? instead of delegation.

The current code uses release_platform_run.automatic_rollout? to determine whether to re-enable automatic rollout on resume. A previous review suggested using play_store_submission.conf.automatic_rollout? directly instead of delegating through release_platform_run.

🧹 Nitpick comments (7)
app/models/app_store_rollout.rb (1)

130-132: Document why disable_automatic_rollout! is a no-op for App Store.

If automatic rollout is not applicable to App Store (due to Apple's phased release being fully automated), please add a comment explaining this design decision.

  def disable_automatic_rollout!
-    # noop for app store
+    # No-op: App Store phased releases are automatically managed by Apple
+    # and cannot be manually controlled via the automatic rollout system
  end
db/migrate/20250331140348_add_automatic_rollout_columns.rb (1)

6-6: Consider adding null constraint for consistency.

Line 6 adds automatic_rollout to submission_configs with default: false but without null: false, unlike the store_rollouts.automatic_rollout column on line 3. This creates a three-state boolean (true/false/null) which can lead to ambiguous logic.

Apply this diff to add the null constraint:

-    add_column :submission_configs, :automatic_rollout, :boolean, default: false
+    add_column :submission_configs, :automatic_rollout, :boolean, null: false, default: false
app/jobs/increase_healthy_release_rollout_job.rb (1)

16-21: Consider: Timestamps updated regardless of increase success.

Lines 16-19 update automatic_rollout_updated_at and automatic_rollout_next_update_at even if the rollout increase on line 12 failed or was skipped (when the release is not healthy). This means the job reschedules itself (line 21) without confirming progress was made.

If the intent is to keep retrying until successful, consider only updating timestamps and rescheduling when the increase action succeeds:

     if release.healthy? && (rollout.started? || rollout.halted?)
-      Action.increase_the_store_rollout!(rollout)
+      result = Action.increase_the_store_rollout!(rollout)
+      return unless result.ok?
     end

     # This update is necessary so that our verify rollout job does not pick this up again
     rollout.update!(
       automatic_rollout_updated_at: Time.current,
       automatic_rollout_next_update_at: Time.current + PlayStoreRollout::AUTO_ROLLOUT_RUN_INTERVAL
     )

     IncreaseHealthyReleaseRolloutJob.perform_in(PlayStoreRollout::AUTO_ROLLOUT_RUN_INTERVAL, play_store_rollout_id)

This ensures the job only advances the schedule when actual progress is made.

spec/jobs/halt_unhealthy_release_rollout_job_spec.rb (1)

26-35: Consider adding assertions for timestamp updates after halting.

The job updates automatic_rollout_updated_at and automatic_rollout_next_update_at after halting. Consider adding assertions to verify these values are set correctly, especially given the past review discussion about whether updating automatic_rollout_next_update_at on halt is the correct behavior.

spec/jobs/verify_automatic_rollout_job_spec.rb (1)

41-50: Consider adding a test case for the 300-second minimum interval guard.

The job implementation has a check for automatic_rollout_next_update_at - automatic_rollout_updated_at >= interval '300 second'. Consider adding a specific test case where the interval is less than 300 seconds to verify this guard prevents enqueuing.

db/schema.rb (1)

920-923: Consider adding an index on automatic_rollout_next_update_at for query performance.

The VerifyAutomaticRolloutJob queries store_rollouts filtering by automatic_rollout_next_update_at. As the table grows, this query could benefit from an index:

WHERE automatic_rollout_next_update_at < NOW()
  AND automatic_rollout_next_update_at - automatic_rollout_updated_at >= interval '300 second'
spec/jobs/increase_healthy_release_rollout_job_spec.rb (1)

69-76: Missing test for halted + healthy scenario.

Based on the job implementation, when a rollout is halted but the release is healthy?, it should trigger Action.increase_the_store_rollout!. This test only covers the halted state without mocking healthy?, which means healthy? likely returns false by default.

Consider adding a test case:

context "when a staged play store rollout is halted and release is healthy" do
  let(:store_rollout) { create(:store_rollout, :play_store, :halted, is_staged_rollout: true, automatic_rollout: true, store_submission: play_store_submission) }

  before do
    allow_any_instance_of(ProductionRelease).to receive(:healthy?).and_return(true)
  end

  it "rolls out the release to next stage" do
    described_class.new.perform(store_rollout.id)
    expect(play_store_integration).to have_received(:rollout_release)
  end
end
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between daedb84 and 6b62771.

📒 Files selected for processing (25)
  • app/components/live_release/prod_release/rollout_component.rb (2 hunks)
  • app/controllers/config/release_platforms_controller.rb (2 hunks)
  • app/controllers/store_rollouts_controller.rb (1 hunks)
  • app/jobs/halt_unhealthy_release_rollout_job.rb (1 hunks)
  • app/jobs/increase_healthy_release_rollout_job.rb (1 hunks)
  • app/jobs/verify_automatic_rollout_job.rb (1 hunks)
  • app/libs/coordinators.rb (1 hunks)
  • app/models/app_store_rollout.rb (2 hunks)
  • app/models/config/submission.rb (3 hunks)
  • app/models/play_store_rollout.rb (4 hunks)
  • app/models/play_store_submission.rb (1 hunks)
  • app/models/production_release.rb (1 hunks)
  • app/models/release_platform_run.rb (1 hunks)
  • app/models/store_rollout.rb (2 hunks)
  • app/models/store_submission.rb (1 hunks)
  • app/views/config/release_platforms/_form.html.erb (1 hunks)
  • config/locales/passport/en.yml (1 hunks)
  • config/schedule.yml (1 hunks)
  • db/migrate/20250331140348_add_automatic_rollout_columns.rb (1 hunks)
  • db/schema.rb (2 hunks)
  • spec/factories/store_rollout.rb (1 hunks)
  • spec/jobs/halt_unhealthy_release_rollout_job_spec.rb (1 hunks)
  • spec/jobs/increase_healthy_release_rollout_job_spec.rb (1 hunks)
  • spec/jobs/verify_automatic_rollout_job_spec.rb (1 hunks)
  • spec/models/production_release_spec.rb (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (13)
app/models/production_release.rb (1)
app/models/google_firebase_integration.rb (1)
  • id (250-250)
app/jobs/verify_automatic_rollout_job.rb (1)
app/models/play_store_rollout.rb (1)
  • rollout (195-205)
app/models/app_store_rollout.rb (1)
app/models/play_store_rollout.rb (1)
  • disable_automatic_rollout! (144-146)
app/controllers/store_rollouts_controller.rb (1)
app/libs/coordinators.rb (1)
  • halt_the_store_rollout! (261-272)
spec/jobs/verify_automatic_rollout_job_spec.rb (1)
app/jobs/verify_automatic_rollout_job.rb (1)
  • perform (4-12)
app/models/play_store_submission.rb (2)
app/models/store_rollout.rb (1)
  • staged_rollout? (55-55)
app/models/store_submission.rb (2)
  • staged_rollout? (68-70)
  • auto_rollout? (72-80)
app/models/store_submission.rb (3)
app/models/production_release.rb (2)
  • production? (181-181)
  • conf (179-179)
app/models/store_rollout.rb (1)
  • staged_rollout? (55-55)
app/models/app_store_rollout.rb (1)
  • automatic_rollout? (71-71)
app/jobs/increase_healthy_release_rollout_job.rb (4)
app/jobs/halt_unhealthy_release_rollout_job.rb (2)
  • queue_as (1-19)
  • perform (4-18)
app/models/play_store_rollout.rb (1)
  • rollout (195-205)
app/models/app_store_rollout.rb (1)
  • automatic_rollout? (71-71)
app/libs/coordinators.rb (1)
  • increase_the_store_rollout! (237-243)
app/models/release_platform_run.rb (5)
app/models/workflow_run.rb (1)
  • conf (350-350)
app/models/pre_prod_release.rb (1)
  • conf (101-101)
app/models/production_release.rb (1)
  • conf (179-179)
app/models/store_submission.rb (1)
  • conf (175-175)
app/components/all_builds_table_component.rb (1)
  • submissions (21-31)
app/libs/coordinators.rb (4)
app/models/pre_prod_release.rb (1)
  • actionable? (70-72)
app/models/production_release.rb (1)
  • actionable? (102-105)
app/models/app_store_rollout.rb (3)
  • halt_release! (113-128)
  • automatic_rollout? (71-71)
  • disable_automatic_rollout! (130-132)
app/models/release_platform_run.rb (1)
  • automatic_rollout? (349-349)
spec/models/production_release_spec.rb (1)
app/models/production_release.rb (1)
  • fetch_health_data! (138-149)
spec/jobs/increase_healthy_release_rollout_job_spec.rb (2)
app/jobs/halt_unhealthy_release_rollout_job.rb (1)
  • perform (4-18)
app/jobs/increase_healthy_release_rollout_job.rb (1)
  • perform (4-22)
app/models/play_store_rollout.rb (2)
app/models/concerns/lockable.rb (1)
  • with_lock (4-16)
app/models/store_rollout.rb (3)
  • last_rollout_percentage (83-89)
  • stamp_data (149-156)
  • notification_params (101-103)
🪛 GitHub Actions: Lint
db/schema.rb

[warning] 1-1: Schema linting warnings: db/schema.rb changed in multiple commits without corresponding updates in lib/tasks/anonymize.rake afterward (e.g., commits 277b0f8, 4051a5c, ffb12bf, cb413fe, 7ff678c).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: lint
  • GitHub Check: test
🔇 Additional comments (19)
app/models/play_store_submission.rb (1)

320-325: Persisting automatic_rollout looks consistent, but double‑check how jobs filter on it

Wiring automatic_rollout: auto_rollout? into create_play_store_rollout! keeps the DB flag aligned with the same predicate you already use to decide whether to call start_release!, which is good.

One nuance: auto_rollout? returns true for any non‑production release, even when staged_rollout? is false, so non‑production rollouts will now be created with automatic_rollout: true and is_staged_rollout: false. That’s probably fine as long as the new background jobs that drive staged rollout progression (e.g., verify/increase/halt automatic rollouts) always scope to is_staged_rollout: true and/or production runs when deciding which rollouts to advance.

Please verify the job queries/Scopes on PlayStoreRollout so we don’t accidentally treat non‑production, non‑staged releases as candidates for staged progression just because automatic_rollout is true.

config/schedule.yml (1)

9-12: LGTM! The 15-minute schedule addresses timezone concerns.

The frequent polling (every 15 minutes) ensures that rollouts are verified regardless of when they were started across different timezones. The scheduler can check if 24 hours have passed since the last automatic rollout update and enqueue the appropriate job. Additionally, the halt functionality exists separately via HaltUnhealthyReleaseRolloutJob (enqueued from production_release.rb), addressing the concern about pausing unhealthy rollouts.

spec/factories/store_rollout.rb (1)

6-6: LGTM!

The factory correctly includes the new automatic_rollout field with a sensible default value of false.

app/controllers/store_rollouts_controller.rb (1)

49-49: LGTM! Correctly signals manual halt to disable automatic rollout.

Passing manually: true properly indicates user-initiated halting, which triggers the coordinator to disable automatic rollout (as shown in app/libs/coordinators.rb lines 260-271). This prevents the automatic system from conflicting with manual intervention.

app/views/config/release_platforms/_form.html.erb (1)

95-99: LGTM! Clear UI for automatic rollout configuration.

The automatic rollout toggle is well-positioned within the Android rollout section with a descriptive info icon that clearly explains the feature's behavior. The UI properly communicates that Tramline will automatically increase rollouts daily based on release health rules.

app/libs/coordinators.rb (1)

261-272: LGTM! Clean separation of manual vs. automatic halt behavior.

The addition of the manually parameter effectively prevents automatic rollouts from resuming after a user manually halts them. The integration with disable_automatic_rollout! is appropriate.

app/models/config/submission.rb (1)

7-7: LGTM! Clean configuration field addition.

The automatic_rollout field is properly integrated into the JSON serialization/deserialization flow and aligns with the database schema.

Also applies to: 53-54, 83-83

app/controllers/config/release_platforms_controller.rb (1)

85-87: LGTM! Proper parameter handling and cascading disable logic.

The automatic_rollout parameter is correctly whitelisted and the cascading disable behavior (line 159) ensures consistency when staged rollouts are turned off.

Also applies to: 159-159

spec/models/production_release_spec.rb (1)

248-254: LGTM! Appropriate test update for new behavior.

The test correctly verifies that HaltUnhealthyReleaseRolloutJob is enqueued when health data is fetched, and the refactoring to receive_messages improves readability.

app/models/release_platform_run.rb (1)

349-349: LGTM! Clean predicate with safe navigation.

The method safely traverses the configuration chain and provides a sensible default. The implementation addresses the past review comment by reading directly from the conf object.

app/jobs/increase_healthy_release_rollout_job.rb (1)

11-13: Verify: Result from increase action is not checked.

The job calls Action.increase_the_store_rollout!(rollout) on line 12 but does not check whether the action succeeded. If the increase fails (e.g., due to API errors), the job will still update timestamps and reschedule itself (lines 16-21), potentially leading to repeated failures without visibility.

Consider checking the result and handling failures appropriately:

     if release.healthy? && (rollout.started? || rollout.halted?)
-      Action.increase_the_store_rollout!(rollout)
+      result = Action.increase_the_store_rollout!(rollout)
+      return unless result.ok?
     end

Alternatively, if the current behavior is intentional (to keep trying despite failures), please confirm this is the desired retry strategy.

spec/jobs/halt_unhealthy_release_rollout_job_spec.rb (1)

1-63: Good test coverage for the halt unhealthy release rollout job.

The tests properly cover the key scenarios: automatic rollout enabled/disabled combined with healthy/unhealthy release states. The expectations correctly verify that halt_release is only called when both conditions are met (automatic rollout enabled AND release unhealthy).

app/models/store_rollout.rb (1)

53-53: Scope looks correct for identifying active automatic rollouts.

The automatic_rollouts scope correctly filters for rollouts that are in progress (started or halted), are staged rollouts, and have automatic rollout enabled. This aligns with the job logic that processes these rollouts.

spec/jobs/verify_automatic_rollout_job_spec.rb (1)

32-39: Test correctly verifies the eligibility criteria for automatic rollout processing.

The test sets up a rollout where:

  • automatic_rollout_next_update_at is in the past (10.minutes.ago)
  • The interval between updated_at and next_update_at exceeds 300 seconds (24 hours vs 10 minutes)

This correctly triggers the IncreaseHealthyReleaseRolloutJob enqueue as expected.

spec/jobs/increase_healthy_release_rollout_job_spec.rb (1)

1-122: Comprehensive test coverage for the automatic rollout progression job.

The tests properly cover the key scenarios including rollout state transitions, health checks, scheduling behavior, and guards for non-applicable rollouts. The use of freeze_time for timestamp assertions is appropriate.

app/models/play_store_rollout.rb (3)

166-175: The pause_release! method implementation looks correct.

The method properly:

  1. Guards with may_pause?
  2. Disables automatic rollout
  3. Transitions state to paused
  4. Creates an event stamp
  5. Sends notification

One minor observation: the notification message "Automatic rollout has been paused" assumes the pause was due to automatic rollout, but this method could potentially be called for manual pauses too. Consider if the message should be more generic.


37-37: The 24-hour interval constant is well-defined.

AUTO_ROLLOUT_RUN_INTERVAL = 24.hours clearly expresses the automatic rollout cadence and is used consistently across the related jobs.


43-55: State machine transitions for pause/resume are correctly defined.

The additions properly model the pause/resume flow:

  • start can now transition from paused state (in addition to created and halted)
  • New pause event transitions from started to paused
  • New resume event transitions from paused to started
db/schema.rb (1)

1-13: Schema changes do not contain sensitive data; no anonymize.rake updates needed.

The new columns added to store_rollouts (automatic_rollout, automatic_rollout_updated_at, automatic_rollout_next_update_at) and submission_configs (automatic_rollout) are operational configuration flags and scheduling timestamps used by automated rollout jobs. These columns contain no sensitive user, personal, or business data—they track system-level rollout automation state only. The anonymize.rake file already correctly handles these tables with a whitelist strategy that excludes non-sensitive operational fields, which is the appropriate behavior.

Likely an incorrect or invalid review comment.

def perform(production_release_id)
production_release = ProductionRelease.find(production_release_id)

if production_release.unhealthy?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find ProductionRelease model
fd -t f "production_release\.rb|production_releases\.rb" -type f

Repository: tramlinehq/tramline

Length of output: 233


🏁 Script executed:

#!/bin/bash
# Search for healthy? and unhealthy? method definitions in ProductionRelease
ast-grep --pattern 'def healthy?
  $$$
'

ast-grep --pattern 'def unhealthy?
  $$$
'

Repository: tramlinehq/tramline

Length of output: 1701


🏁 Script executed:

#!/bin/bash
# Also check for IncreaseHealthyReleaseRolloutJob
fd -t f "increase_healthy.*job" -type f

Repository: tramlinehq/tramline

Length of output: 233


🏁 Script executed:

#!/bin/bash
# Find all jobs to understand the context better
fd -t f ".*_job\.rb" app/jobs/ | head -20

Repository: tramlinehq/tramline

Length of output: 861


🏁 Script executed:

#!/bin/bash
# Read the halt_unhealthy_release_rollout_job.rb file
cat -n app/jobs/halt_unhealthy_release_rollout_job.rb

Repository: tramlinehq/tramline

Length of output: 832


🏁 Script executed:

#!/bin/bash
# Read the increase_healthy_release_rollout_job.rb file
cat -n app/jobs/increase_healthy_release_rollout_job.rb

Repository: tramlinehq/tramline

Length of output: 1054


🏁 Script executed:

#!/bin/bash
# Check Release model for healthy? method
rg -n "def healthy\?" app/models/release.rb -A 10

Repository: tramlinehq/tramline

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Check the relationship between Release and ProductionRelease
rg -n "production_release" app/models/release.rb -B 2 -A 2 | head -40

Repository: tramlinehq/tramline

Length of output: 1057


🏁 Script executed:

#!/bin/bash
# Broader search for healthy? in the Release model file
cat -n app/models/release.rb | grep -C 5 "healthy"

Repository: tramlinehq/tramline

Length of output: 1102


🏁 Script executed:

#!/bin/bash
# Search for any healthy? method definitions in the entire codebase
rg -n "^\s*def healthy\?" app/models/ -B 2

Repository: tramlinehq/tramline

Length of output: 339


Define Release#healthy? method or use inverse predicate in IncreaseHealthyReleaseRolloutJob.

The Release model lacks a healthy? method, but IncreaseHealthyReleaseRolloutJob calls release.healthy? on line 11. The Release model only defines unhealthy? (which checks if any platform run is unhealthy). Add Release#healthy? as the inverse of unhealthy?, or refactor the job to use !release.unhealthy? for consistency with the predicate pattern used elsewhere.

🤖 Prompt for AI Agents
In app/jobs/halt_unhealthy_release_rollout_job.rb around line 7, the job and
IncreaseHealthyReleaseRolloutJob expect Release#healthy? to exist but the model
only implements unhealthy?; add a Release#healthy? instance method that returns
the inverse of unhealthy? (e.g., !unhealthy?) so both predicates are available
and consistent, or alternatively update the jobs to use !release.unhealthy?
everywhere; implement the method in the Release model to return !unhealthy? and
update tests if any rely on healthy?.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
spec/jobs/increase_healthy_release_rollout_job_spec.rb (1)

28-28: Verify that ProductionRelease#healthy? is implemented.

The tests mock ProductionRelease#healthy? (lines 28, 51, 73), and the job implementation calls release.healthy? (per relevant code snippet at app/jobs/increase_healthy_release_rollout_job.rb:11). However, based on past review comments and the provided code snippets showing only unhealthy? defined as !healthy?, there's a risk that the healthy? method may not be implemented, which would cause a NoMethodError or infinite recursion.

Run the following to confirm healthy? is defined:

#!/bin/bash
# Search for healthy? method definition in ProductionRelease
ast-grep --pattern 'class ProductionRelease
  $$$
  def healthy?
    $$$
  end
  $$$
'

# Also check if it might be defined in a concern or parent class
rg -n "def healthy\?" app/models/ -B 2 -A 5

Also applies to: 51-51, 73-73

🧹 Nitpick comments (1)
app/jobs/halt_unhealthy_release_rollout_job.rb (1)

4-18: Consider adding error handling for missing records.

The job will raise ActiveRecord::RecordNotFound if either the ProductionRelease or ReleaseHealthEvent is not found. While this may be acceptable for a high-priority job (failures will be visible), consider adding error handling if you want graceful degradation when records are deleted between enqueueing and execution.

Optional defensive approach:

 def perform(production_release_id, event_id)
-  production_release = ProductionRelease.find(production_release_id)
-  event = ReleaseHealthEvent.find(event_id)
+  production_release = ProductionRelease.find_by(id: production_release_id)
+  event = ReleaseHealthEvent.find_by(id: event_id)
+  return unless production_release && event

   if production_release.unhealthy?
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b62771 and d649922.

📒 Files selected for processing (10)
  • app/jobs/halt_unhealthy_release_rollout_job.rb (1 hunks)
  • app/jobs/increase_healthy_release_rollout_job.rb (1 hunks)
  • app/models/release_health_event.rb (2 hunks)
  • app/models/store_submission.rb (1 hunks)
  • config/locales/passport/en.yml (1 hunks)
  • spec/factories/release_health_events.rb (1 hunks)
  • spec/jobs/halt_unhealthy_release_rollout_job_spec.rb (1 hunks)
  • spec/jobs/increase_healthy_release_rollout_job_spec.rb (1 hunks)
  • spec/models/play_store_submission_spec.rb (1 hunks)
  • spec/models/release_health_event_spec.rb (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • spec/jobs/halt_unhealthy_release_rollout_job_spec.rb
  • app/jobs/increase_healthy_release_rollout_job.rb
🧰 Additional context used
🧬 Code graph analysis (3)
spec/models/play_store_submission_spec.rb (1)
app/models/store_submission.rb (2)
  • auto_rollout? (72-78)
  • conf (173-173)
app/models/release_health_event.rb (3)
app/models/production_release.rb (1)
  • unhealthy? (154-156)
app/models/release.rb (1)
  • unhealthy? (257-259)
app/models/release_platform_run.rb (1)
  • unhealthy? (203-205)
spec/jobs/increase_healthy_release_rollout_job_spec.rb (1)
app/jobs/increase_healthy_release_rollout_job.rb (1)
  • perform (4-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test
  • GitHub Check: lint
🔇 Additional comments (8)
config/locales/passport/en.yml (1)

79-82: LGTM! Clear messaging for automatic rollout states.

The addition of "automatic" to the resumed and paused messages improves clarity by distinguishing automatic rollout actions from manual ones. The terminology is consistent and correctly references Play Store.

app/jobs/halt_unhealthy_release_rollout_job.rb (1)

8-17: LGTM! Clean halt logic with appropriate checks.

The conditional checks ensure the halt action is only triggered for Play Store rollouts with automatic rollout enabled, and the event is properly updated on success. The use of result.ok? provides safe error handling for the halt action.

spec/factories/release_health_events.rb (1)

3-3: LGTM! Factory correctly updated for new association.

The change from deployment_run to production_release aligns with the model's association changes.

spec/models/release_health_event_spec.rb (1)

1-33: LGTM! Comprehensive test coverage for the callback.

The tests properly verify that:

  • Unhealthy events trigger HaltUnhealthyReleaseRolloutJob with correct arguments
  • Healthy events do not trigger the job

The test structure is clear and uses appropriate RSpec patterns.

spec/models/play_store_submission_spec.rb (1)

209-260: LGTM! Thorough test coverage for auto_rollout? logic.

The tests comprehensively cover all branches of the auto_rollout? method:

  • Production releases with both flags enabled (true case)
  • Production releases with either flag disabled (false cases)
  • Non-production releases (always false)

The use of mocking to control configuration values is appropriate and keeps tests focused.

app/models/release_health_event.rb (1)

33-33: LGTM! Appropriate callback for job enqueueing.

The use of after_create_commit is correct for enqueueing background jobs, ensuring the job is only enqueued after the transaction commits. The guard clause properly restricts the trigger to unhealthy events only.

Also applies to: 64-67

app/models/store_submission.rb (1)

72-78: LGTM! Correctly implements automatic rollout conditions.

The refactored logic properly restricts automatic rollouts to production releases that have both staged rollout and automatic rollout enabled. The two-branch structure (if/else) correctly addresses the past review feedback.

spec/jobs/increase_healthy_release_rollout_job_spec.rb (1)

1-135: LGTM! Comprehensive test coverage for automatic rollout scenarios.

The test file thoroughly covers:

  • Different rollout states (created, started, halted, completed)
  • Health status variations (healthy/unhealthy)
  • Configuration flags (automatic_rollout enabled/disabled, staged/non-staged)
  • Time-based scheduling verification (24-hour intervals)
  • State transitions and API call expectations

The test structure is clear, well-organized, and uses appropriate mocking strategies.

@kitallis kitallis requested a review from nid90 December 2, 2025 13:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants