Policy does not trigger automatic software install if host was removed from software scope then re-added (using labels) #25071

jmwatts · 2024-12-31T19:59:11Z

Fleet version: v4.62.0

Web browser and operating system: Chrome 131.0.6778.205 on macOS

💥 Actual behavior

The policy runs and fails, but no software install is triggered: no pending install for the software title is queued in the host's Upcoming activity.

🧑‍💻 Steps to reproduce

1. Add a label to an existing macOS host (e.g. Testing scope)
2. Go to Software >> Add software
3. Choose either Fleet-maintained or Custom package
4. Choose "Automatic" for "Install" method
5. Choose "Custom" for "Target" and select "Include any"
6. Select the label from step 1
7. Click Add software
8. On the host detail page, click "Refetch" and then view "Upcoming" activity for the host
9. Once the vitals have been refetched and you see a pending install for your software title queued in the Upcoming activity section, QUICKLY change the scope for the software title to "Exclude any" with the label from step 1 (you need to do this before the installation starts on the host device)
10. Confirm the pending install action has been removed from the Upcoming activity and the software is not showing as a completed install in "Past" activity
11. Change the scope for the software title back to "Include any" with the label from step 1, and save
12. Refetch vitals so the policy runs again. You can also trigger the policy from Policies >> Run.

🕯️ Expected results

Now that the host is back within the scope of the software target via label, the failed policy should trigger a software install on the host.


jmwatts · 2025-01-02T19:35:35Z

Note:
While testing other auto-install workflows, I discovered that if you remove the software after it's installed but before the policy has run again and passed, an install command is never re-issued. That workflow looks like this:

1. Install software triggered by policy
2. Software is installed on host
3. Policy is still marked as failed from previous refetch
4. Do NOT refetch
5. Delete software from host (as end user)
6. Refetch

When you do those steps, the software won't be reinstalled. I think this is similar: the policy runs once and fails, identifies that the software is not installed, and issues the install command. Because we cancel that command as soon as we change the scope, the install command isn't re-issued when the policy fails again.

I see that our doc "Automatic software install in Fleet" says:
"Fleet will send install requests to the hosts on the first policy failure (first "No" result for the host) or if a policy goes from "Yes" to "No". On this iteration it will not send an install request if a policy is already failing and continues to fail ("No" -> "No"). See the following flowchart for details."
I don't see a ticket for another iteration to make this work for automatic software installs. Is there one somewhere? @noahtalerman
I feel there should definitely be one, because this is a state where software should be installed on a host but won't be, depending on timing and, in this case, a change to scope.
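To make the quoted rule concrete, here is a minimal sketch of the transition check it describes. The type and function names (PolicyResult, shouldTriggerInstall) are hypothetical and are not taken from the Fleet codebase; the sketch only encodes the three cases named in the doc.

```go
package main

import "fmt"

// PolicyResult is a hypothetical tri-state for a host's last recorded result
// on a policy. Unknown means no result is stored for the host yet.
type PolicyResult int

const (
	Unknown PolicyResult = iota // no prior result recorded
	Pass                        // "Yes"
	Fail                        // "No"
)

// shouldTriggerInstall encodes the documented rule: send an install request
// on a first failure (Unknown -> Fail) or on a "Yes" -> "No" flip, but not
// when the policy was already failing ("No" -> "No").
func shouldTriggerInstall(prev, curr PolicyResult) bool {
	return curr == Fail && prev != Fail
}

func main() {
	fmt.Println(shouldTriggerInstall(Unknown, Fail)) // true: first failure
	fmt.Println(shouldTriggerInstall(Pass, Fail))    // true: "Yes" -> "No"
	fmt.Println(shouldTriggerInstall(Fail, Fail))    // false: "No" -> "No"
}
```

Under this rule, a host whose stored result is already "No" can never trigger a new install, which is the state both workflows above end up in.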

iansltx · 2025-01-03T17:27:16Z

Single-attempt installs were what was originally spec'd, and I don't think we want to change that behavior. However, as future work (out of scope for this ticket) we could, and likely should, give orbit enough information to know which policy an install was for, so orbit can rerun that policy's query and report the result after a successful install. We can't just use install success as a proxy for a policy pass, because the query may be looking for multiple things. The same probably applies to policy-initiated script runs, though that can be separate, likely lower-priority work. The idea is to remove the one-hour window in which someone could uninstall successfully installed, policy-initiated software and then never get another install attempt until the installer or policy is modified.

With that out of the way, the current issue is that bringing hosts into scope for an automation via labels doesn't reset the policy status for those hosts, and it should. "We brought the host in scope for a policy-automated install by changing labels for the installer" is comparable to "we brought the host in scope for a policy-automated install by adding an install automation to the policy". The only difference is that it applies to hosts in that team that are in (or outside) that label, rather than to every host in that team.

We don't need to clear stats when a host goes out of scope via labels, as no action is required there. We do when the host comes back in scope.
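As a rough illustration of that asymmetry, the sketch below evaluates a label scope for one host and flags only the out-of-scope to in-scope transition. All names here (Scope, ScopeMode, inScope, needsPolicyReset) are hypothetical, and the sketch assumes a custom target with at least one label, using the "Include any" and "Exclude any" modes from the reproduction steps.

```go
package main

import "fmt"

// ScopeMode is a hypothetical enum for the two custom target modes.
type ScopeMode int

const (
	IncludeAny ScopeMode = iota // host must have at least one of the labels
	ExcludeAny                  // host must have none of the labels
)

// Scope is a hypothetical representation of an installer's label scope.
type Scope struct {
	Mode   ScopeMode
	Labels []string
}

// inScope reports whether a host with the given labels is targeted by s.
func inScope(hostLabels map[string]bool, s Scope) bool {
	hasAny := false
	for _, l := range s.Labels {
		if hostLabels[l] {
			hasAny = true
			break
		}
	}
	if s.Mode == IncludeAny {
		return hasAny
	}
	return !hasAny
}

// needsPolicyReset is true only when the host moves from out of scope to in
// scope. Leaving scope requires no action, so it returns false in that case.
func needsPolicyReset(hostLabels map[string]bool, prev, next Scope) bool {
	return !inScope(hostLabels, prev) && inScope(hostLabels, next)
}

func main() {
	host := map[string]bool{"Testing scope": true}
	include := Scope{Mode: IncludeAny, Labels: []string{"Testing scope"}}
	exclude := Scope{Mode: ExcludeAny, Labels: []string{"Testing scope"}}

	fmt.Println(needsPolicyReset(host, include, exclude)) // false: host left scope
	fmt.Println(needsPolicyReset(host, exclude, include)) // true: host came back
}
```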

ProcessInstallerSideEffects and cleanupPolicy are the places where this logic currently exists for installer changes that affect the whole team, so that would be the starting point here. Not sure whether we have fine-grained enough data structures to handle this operation correctly only for hosts affected by label changes though.

iansltx · 2025-01-03T18:47:44Z

Per discussions earlier, what we need to do here is, on installer label scope changes:

1. Calculate which hosts are newly covered by the revised scope (we want to diff so we aren't clearing policy statuses unnecessarily)
2. Check which policies include those hosts AND have the installer as an automation
3. Clear policy stats for the policies from (2)
4. Clear host policy status for every combination of hosts from (1) and policies from (2); for consistency with the behavior when adding or modifying an installer automation overall, we should clear both successful and failed policy statuses

The stats/aggregations cron will repopulate policy stats, and host policy check-ins will repopulate status information, with the opportunity for hosts to go from blank to failed, triggering the automation again, solving this bug.
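The four steps above can be sketched against an in-memory stand-in for the policy tables. This is only an illustration under stated assumptions: the store type, its fields, and resetForScopeChange are hypothetical and do not reflect Fleet's datastore or the existing ProcessInstallerSideEffects and cleanupPolicy code, and step 2 is assumed to have been resolved already into the policyIDs argument.

```go
package main

import (
	"fmt"
	"sort"
)

// pair identifies one host's status row for one policy.
type pair struct{ HostID, PolicyID uint }

// store is a hypothetical in-memory stand-in for the policy tables.
type store struct {
	hostPolicyStatus map[pair]bool // true = passing, false = failing
	policyStats      map[uint]int  // cached failing-host count per policy
}

// newlyCovered returns, sorted, the host IDs that are in scope after the
// change but were not in scope before it (step 1).
func newlyCovered(before, after map[uint]bool) []uint {
	var out []uint
	for id, in := range after {
		if in && !before[id] {
			out = append(out, id)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// resetForScopeChange clears cached stats for the given policies (step 3)
// and deletes host policy statuses, passing or failing, for every newly
// covered host and policy combination (step 4). It returns the number of
// status rows removed.
func (s *store) resetForScopeChange(before, after map[uint]bool, policyIDs []uint) int {
	hosts := newlyCovered(before, after)
	if len(hosts) == 0 {
		return 0 // nothing entered scope, so leave everything untouched
	}
	cleared := 0
	for _, pid := range policyIDs {
		delete(s.policyStats, pid)
		for _, hid := range hosts {
			key := pair{HostID: hid, PolicyID: pid}
			if _, ok := s.hostPolicyStatus[key]; ok {
				delete(s.hostPolicyStatus, key)
				cleared++
			}
		}
	}
	return cleared
}

func main() {
	s := &store{
		hostPolicyStatus: map[pair]bool{
			{HostID: 1, PolicyID: 10}: false,
			{HostID: 2, PolicyID: 10}: false,
		},
		policyStats: map[uint]int{10: 2},
	}
	before := map[uint]bool{1: true}         // host 1 was already in scope
	after := map[uint]bool{1: true, 2: true} // host 2 is newly covered
	fmt.Println(s.resetForScopeChange(before, after, []uint{10}))
	fmt.Println(len(s.hostPolicyStatus), len(s.policyStats))
}
```

In the example, only host 2's status row is removed, so its next failing check-in counts as a first failure, while host 1 keeps its existing status.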

jmwatts added the bug, :release, #g-mdm, ~unreleased bug, and :incoming labels Dec 31, 2024

jmwatts added this to the 4.62.0-tentative milestone Dec 31, 2024

jmwatts mentioned this issue Dec 31, 2024

Scope Fleet-maintained apps and custom packages via labels #22813

Open

63 tasks

mostlikelee added the #g-software label and removed the #g-mdm and :incoming labels Jan 3, 2025

mostlikelee assigned jahzielv Jan 3, 2025

iansltx mentioned this issue Jan 3, 2025

Should policy runs for hosts that are not within scope of associated software automation? #25066

Open
