Windows: Retry configuration profiles 3 times and log failures in host activity #42981

@getvictor

Goal

User story
As an IT admin managing Windows hosts,
I want Windows configuration profiles to retry up to 3 times before being marked as failed, and to see failed attempts logged in host activity records
so that I can have confidence profiles will be applied reliably and have visibility into failures when they occur.

Changes

Product

  • UI changes: (1) Windows profiles should remain in "pending" status during retries instead of immediately showing "failed." (2) The Host Details > Activity tab should display entries when a Windows configuration profile fails to install. Each activity entry should include the profile name and relevant error details. This brings Windows to parity with Apple profile behavior (see Apple (macOS, iOS, iPadOS) configuration profiles: Retry 3 times #42327).
  • CLI (fleetctl) usage changes: No changes
  • YAML changes: No changes
  • REST API changes: No changes
  • Fleet's agent (fleetd) changes: No changes
  • Fleet server configuration changes: No changes
  • Exposed, public API endpoint changes: No changes
  • fleetdm.com changes: No changes
  • GitOps mode UI changes: No changes
  • GitOps generation changes: No changes
  • Activity changes: A new activity type should be added for Windows profile installation failures (e.g., "failed_windows_profile"). Each failed attempt should appear on the Host Details activity feed, including during retries, so IT admins have full visibility into the retry process.
  • Permissions changes: No changes
  • Changes to paid features or tiers: Fleet Premium
  • My device and fleetdm.com/better changes: No changes
  • Usage statistics: No changes
  • Other reference documentation changes: No changes
  • First draft of test plan added
  • Once shipped, requester has been notified
  • Once shipped, dogfooding issue has been filed

Engineering

  • Test plan is finalized
  • Contributor API changes: No changes
  • Feature guide changes: No changes
  • Database schema migrations: (1) If MaxProfileRetries is shared between Apple and Windows, split it into MaxAppleProfileRetries and MaxWindowsProfileRetries so each platform can be tuned independently. Set MaxWindowsProfileRetries to 3. (2) Verify the retries column on host_mdm_windows_profiles supports values up to 3. (3) A migration may be required to add an activity type for Windows profile failures if one does not already exist.
  • Load testing: No changes
  • Pre-QA load test: No changes
  • Load testing/osquery-perf improvements: No changes
  • This is a premium only feature: Yes
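
The constant split described under "Database schema migrations" could look roughly like the sketch below. Only the names MaxAppleProfileRetries and MaxWindowsProfileRetries and the Windows value of 3 come from this issue. The Apple value is assumed from the title of #42327, and the `maxProfileRetries` helper and platform strings are hypothetical, not taken from the Fleet codebase.

```go
package main

import "fmt"

const (
	// Assumed from the title of #42327 ("Retry 3 times"); not confirmed here.
	MaxAppleProfileRetries = 3
	// Value requested by this story.
	MaxWindowsProfileRetries = 3
)

// maxProfileRetries is a hypothetical helper that returns the retry limit
// for a platform, so each platform can be tuned independently.
func maxProfileRetries(platform string) int {
	switch platform {
	case "windows":
		return MaxWindowsProfileRetries
	case "darwin", "ios", "ipados":
		return MaxAppleProfileRetries
	default:
		// Platforms without profile retries in this sketch.
		return 0
	}
}

func main() {
	for _, p := range []string{"windows", "darwin"} {
		fmt.Printf("%s: %d retries\n", p, maxProfileRetries(p))
	}
}
```

Keeping the two limits as separate constants means a later change to the Windows limit does not alter Apple behavior.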

ℹ️ Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

Risk assessment

  • Risk level: Low
  • The retry change only updates a constant, matching Apple's existing behavior. The activity logging adds new records without modifying existing profile deployment logic.

Test plan

Make sure to go through the list and consider all events that might be related to this story, so we catch edge cases earlier.

Core flow -- retries

  • Deploy a Windows configuration profile to a host that will fail (e.g., a malformed profile or a profile targeting an unsupported CSP)
  • Verify that the profile stays in "pending" status during retries (not immediately marked "failed")
  • Verify that Fleet retries the profile installation up to 3 times (1 initial attempt + 3 retries = 4 total attempts)
  • Verify that the profile is marked "failed" only after all 4 attempts are exhausted
  • Verify that manually resending a profile resets the retry counter
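
The retry expectations in this list can be sketched as a small state transition. This is a minimal illustration under stated assumptions, not Fleet's implementation: `onInstallFailure`, `resend`, and the status type are hypothetical names, and the only values taken from this issue are the limit of 3 retries and the "pending" and "failed" statuses.

```go
package main

import "fmt"

// maxWindowsProfileRetries mirrors the limit requested in this story.
const maxWindowsProfileRetries = 3

type profileStatus string

const (
	statusPending profileStatus = "pending"
	statusFailed  profileStatus = "failed"
)

// onInstallFailure takes the number of retries already used and returns
// the status to store plus the updated retry count (hypothetical helper).
func onInstallFailure(retries int) (profileStatus, int) {
	if retries < maxWindowsProfileRetries {
		// Retries remain: stay pending and consume one retry.
		return statusPending, retries + 1
	}
	// All retries are used up: the profile is marked failed.
	return statusFailed, retries
}

// resend models a manual resend, which resets the retry counter.
func resend() (profileStatus, int) {
	return statusPending, 0
}

func main() {
	retries := 0
	for attempt := 1; ; attempt++ {
		var status profileStatus
		status, retries = onInstallFailure(retries)
		fmt.Printf("attempt %d failed: status=%s retries=%d\n", attempt, status, retries)
		if status == statusFailed {
			break
		}
	}
}
```

With a limit of 3, the loop in `main` reports "pending" after attempts 1 through 3 and "failed" after attempt 4, which matches the 1 initial attempt + 3 retries described above.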

Core flow -- activity logging

  • Verify that each failed attempt (including retries) creates a host activity entry on the Host Details page
  • Verify that the activity entry includes the profile name and relevant error details
  • Verify that a successful profile installation after prior failures does not produce a failure activity entry
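
The logging rule in this list (one entry per failed attempt, none for a success) can be sketched as follows. The activity type string is the example given in this issue. The `activity` struct, its field names, `installError`, and `recordAttempt` are hypothetical and only illustrate the expected behavior.

```go
package main

import "fmt"

// failedWindowsProfile is the example activity type named in this issue.
const failedWindowsProfile = "failed_windows_profile"

// activity is a hypothetical shape for a host activity entry.
type activity struct {
	Type        string
	ProfileName string
	Detail      string
}

// installError is a minimal error type used only for this sketch.
type installError string

func (e installError) Error() string { return string(e) }

// recordAttempt appends a failure activity when err is non-nil.
// Successful attempts leave the feed unchanged.
func recordAttempt(feed []activity, profileName string, err error) []activity {
	if err == nil {
		return feed
	}
	return append(feed, activity{
		Type:        failedWindowsProfile,
		ProfileName: profileName,
		Detail:      err.Error(),
	})
}

func main() {
	var feed []activity
	// Two failed attempts followed by a successful retry.
	feed = recordAttempt(feed, "Example profile", installError("unsupported CSP"))
	feed = recordAttempt(feed, "Example profile", installError("unsupported CSP"))
	feed = recordAttempt(feed, "Example profile", nil)
	for _, a := range feed {
		fmt.Printf("%s: %s (%s)\n", a.Type, a.ProfileName, a.Detail)
	}
}
```

In this run the feed ends with two failure entries, one per failed attempt, and nothing for the final successful install.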

UI

  • Verify that the Host Details > Activity tab displays Windows profile failure entries with the correct information
  • Verify expected UI states (loading, empty, error states if applicable)

API

  • Test that the activities API endpoint returns Windows profile failure activities for the host
  • Verify error handling for invalid inputs where applicable

Permissions

  • Verify role restrictions are applied correctly for global roles
  • Verify role restrictions are applied correctly for fleet-level roles

Edge cases

  • A profile that fails on the first attempt but succeeds on a retry should only log failure activities for the failed attempt(s), not the successful one
  • Multiple profiles failing simultaneously on the same host should each produce their own activity entries and retry independently
  • A host that is re-enrolled should not duplicate old failure activity entries
  • Genuinely broken profiles stay in "pending" longer than before -- verify the added delay is acceptable (similar to Apple: ~2 extra osquery detail cycles for the verification path, ~1 minute for the MDM command failure path)

Supplemental testing

Testing notes

Confirmation

  1. Engineer: Added comment to user story confirming successful completion of test plan (include any special setup, test data, or configuration used during development/testing if applicable).
  2. QA: Added comment to user story confirming successful completion of test plan.
